Data Cleaning & Preparation API

Revenue is lost when harvested data is wasted due to cleaning and preparation bottlenecks.

An estimated 60 to 80% of analytics time is spent preparing data, while harvested data has grown an average of 23% per year since 2021.

A solution to these problems is high-velocity software that increases revenue by minimizing cleaning and preparation time.

A 50% or greater reduction in the time needed to clean and prepare data frees capacity to process additional harvested data in the future.

Intensity greatly reduces the time needed to clean and prepare data

The data wrangling market was valued at $3.48 billion in 2025 and is expected to reach $5.93 billion by 2030.

Roughly 163 zettabytes of unstructured data exist today, and roughly 90,000 big data companies operate worldwide.

Comparison of Intensity to the Competition

Each tool is compared on who it serves best, starting price, implementation time, whether code is required, and available references.

Intensity
  • Best For: Data scientists, programmers
  • Starting Price: $5,195/user/year
  • Implementation Time: Same day
  • Code Required: Yes
  • References: API
    Alpha
    /**
     * \brief Load source and processing script, write target, deallocate source and script memory, caller has target file.
     *
     * Buffers pSourcePath & pProcessingScript, calls Delta, writes pTargetPath,
     * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
     *
     * \param pSourcePath path of the source file to load
     * \param pTargetPath path of the target file to write
     * \param pProcessingScript path of the processing script to load
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Alpha( char *pSourcePath, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
                                        
    Bravo
    /**
     * \brief Load processing script, write target, deallocate script memory, caller has target file and deallocates pSourceBuffer.
     *
     * Buffers pProcessingScript, calls Delta, writes pTargetPath,
     * deallocates pProcessingScriptBuffer, does not return pSourceBuffer.
     *
     * \param pSourceBuffer buffer holding the source data
     * \param pTargetPath path of the target file to write
     * \param pProcessingScript path of the processing script to load
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Bravo( uint8_t *pSourceBuffer, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
                                        
    Charlie
    /**
     * \brief Load processing script, deallocate script, returns target (pSourceBuffer), caller writes target
     *        and deallocates pSourceBuffer.
     *
     * Buffers pProcessingScript, calls Delta, deallocates pProcessingScriptBuffer, returns altered pSourceBuffer.
     *
     * \param pSourceBuffer buffer holding the source data; altered in place and returned
     * \param pProcessingScript path of the processing script to load
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Charlie( uint8_t *pSourceBuffer, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
                                        
    Delta
    /**
     * \brief Returns target (pSourceBuffer), caller has target (pSourceBuffer) and deallocates pSourceBuffer
     *        and pProcessingScriptBuffer.
     *
     * Delta is the core function called by all of the others.
     *
     * \param pSourceBuffer buffer holding the source data; altered in place
     * \param pProcessingScriptBuffer buffer holding the processing script
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \param pMaxSourceBuffer size of pSourceBuffer when malloc'd
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Delta( uint8_t *pSourceBuffer, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer, uint64_t pMaxSourceBuffer );
                                        
    Echo
    /**
     * \brief Processing script buffer is already populated: load source, write target, deallocate source and script memory, caller has target file.
     *
     * Buffers pSourcePath, calls Delta, writes pTargetPath,
     * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
     *
     * \param pSourcePath path of the source file to load
     * \param pTargetPath path of the target file to write
     * \param pProcessingScriptBuffer buffer holding the processing script
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Echo( char *pSourcePath, char *pTargetPath, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer );
                                        
Scripting
  1. Beautify
    • BeautifyXML [1 of 2]
      • Format code, indented with tabs.
      • Syntax: BeautifyXML
    • BeautifyXML [2 of 2]
      • Format code, indented with tabs, and then remove the space between <End> and </End> tags and between <A*> and </A> tags.
      • Syntax: BeautifyXML|FIX_END|
  2. Change
    • ChangeTag
      • Locate tag1 and then tag2 in sequence, then replace the chosen field (tag1 if 1 is passed, otherwise tag2) with replacement.
      • Syntax: ChangeTag|1 for tag1|tag1|tag2|replacement
    • ChangeWrappedString
      • Find signature, scan for opening and closing double quotes, and replace the text between the quotes with replacement.
      • Syntax: ChangeWrappedString|signature|replacement
  3. Clean
    • CleanXML
      • Remove tabs, carriage returns, line feeds, and hidden characters when writing the final output. Note that StripXML takes priority over FormatXML: StripXML is used even if both are declared in the processing script.
      • Syntax: CleanXML
  4. Conceal
    • ConcealBlankTags
      • Hide passed tag sequence if there is nothing between the tags.
      • Syntax: ConcealBlankTags|tag_open|tag_close|
    • ConcealSpecialTags
      • Hide tags and the data between them, handling tags that contain carriage returns/line feeds between elements.
      • Syntax: ConcealSpecialTags|tag_open|tag_contains|tag_close|insert_to_start_of_buffer
  5. Confirm
    • ConfirmField
      • Validate that a field exists, is set as a name => value pair in PHP array format, and has verified leading and trailing characters, ensuring it is in the correct format to be loaded and processed within another PHP script.
      • Syntax: ConfirmField|String|
  6. Correct
    • CorrectQP
      • Repair quoted-printable 7-bit email encoding.
      • Syntax: CorrectQP
  7. Eliminate
    • EliminateBinary
      • Force deletion of the passed binary data, where the second of the three strings is one, two, or three 8-bit binary values represented in hexadecimal. Each hex value must take the form '0xFF', where '0x' is the hex prefix and 'FF' can be any value from '00' to 'FF'. Up to three hex values can be passed in the form '0xFF0xFF0xFF'.
      • Syntax: EliminateBinary|tag_open|hex_binary|tag_close|
    • EliminateBytes
      • Delete all occurrences of 8-bit binary values represented in hexadecimal format. Each hex value must take the form of ‘XX’ where ‘XX’ is any hex value from ‘00’ to ‘FF’.
      • Syntax: EliminateBytes|pairs_of_hex_values|
    • EliminateContent
      • Delete data that starts with from and ends with to, where to is retained if 0 is passed.
      • Syntax: EliminateContent|from|to|0|
    • EliminateContentAll
      • Delete all data that starts with from and ends with to, where to is retained if 0 is passed.
      • Syntax: EliminateContentAll|from|to|0|
    • EliminateField
      • Delete FieldNumber (range 0-many) after Begin and before End by counting occurrences of FieldDelimiter.
      • Syntax: EliminateField|Begin|End|FieldNumber|FieldDelimiter|
    • EliminateFirstLine
      • Eliminate first line if it matches ToFind
      • Syntax: EliminateFirstLine|ToFind|
    • EliminateFirstToEnd
      • Locate Open, scan forward to First, eliminate until the first occurrence of End.
      • Syntax: EliminateFirstToEnd|Open|First|End|
    • EliminateForward
      • Eliminate all data that starts with Begin followed by Next and ends with a line feed (0x0A).
      • Syntax: EliminateForward|Begin|Next|
    • EliminateFromTo
      • Eliminate all occurrences of Begin to End where End must occur after Begin.
      • Syntax: EliminateFromTo|Begin|End|
    • EliminateLFs
      • Removes line terminators.
      • Syntax: EliminateLFs
    • EliminateLines
      • Eliminate every Line-Feed (0x0A) terminated line that contains Begin.
      • Syntax: EliminateLines|Begin|
    • EliminateOnLine
      • Eliminate all data on the same line that starts with Begin, ends with End.
      • Syntax: EliminateOnLine|Begin|End|
    • EliminatePattern
      • Eliminate all occurrences of Pattern with a trailing Signature.
      • Syntax: EliminatePattern|Pattern|Signature|
    • EliminateSpan
      • Eliminate all data that starts with Begin and ends with End.
      • Syntax: EliminateSpan|Begin|End|
    • EliminateString
      • Delete all occurrences of passed string from buffer.
      • Syntax: EliminateString|string_to_delete|
    • EliminateTag
      • Delete data that starts with tag_open and, if no end is passed, ends with '>'.
      • Syntax: EliminateTag|tag_open|
    • EliminateTag2
      • Locate exact match to tag_open, scan for exact match to tag_next_to_delete, and then delete tag_next_to_delete. Note that this is potentially dangerous: tag_open and tag_next_to_delete could be separated in context, resulting in invalid data deletion. It is also of limited use in that it can leave behind orphaned closing tags whose opening tags were deleted.
      • Syntax: EliminateTag2|tag_open|tag_next_to_delete|
  8. Extract
    • ExtractText
      • Determine if source file is a supported document type.
        • Currently:
          Microsoft Office Word .docx
          Excel .xlsx
          PowerPoint .pptx
      • If it is a supported document type, the text is extracted and made available for use by additional commands in the processing script.
      • Syntax: ExtractText
  9. Preserve
    • PreserveMemory
      • Preserve the file memory buffer to a file. This command is useful when creating a new processing script because it can write the file buffer at any stage of processing.
      • Syntax: PreserveMemory
  10. Provisional
    • ProvisionalUpdate
      • If tag_find not found then insert it before tag_after.
      • Syntax: ProvisionalUpdate|tag_find|tag_after|
  11. Put
    • PutBetweenTags
      • Locate exact match to tag_open, scan for exact match to tag_next, and then insert tag_to_insert_between_open_and_next between tag_open and tag_next.
      • Syntax: PutBetweenTags|tag_open|tag_next|tag_to_insert_between_open_and_next|
    • PutBinaryPostfix
      • Append the passed string with Adobe InDesign-specific binary line feed data. See the notes in PutBinaryPrefix.
      • Syntax: PutBinaryPostfix|tag|
    • PutBinaryPrefix
      • Prepend the passed string with Adobe InDesign-specific binary line feed data. Note that the binary data is embedded to force InDesign to drop line feeds after tag closure. This is only needed when the InDesign tag formatting does not specifically call for a line feed to be dropped after a tag closes. It is best to avoid PutBinaryPrefix and PutBinaryPostfix by handling all line feeds through tag formatting within InDesign.
      • Syntax: PutBinaryPrefix|tag|
    • PutField
      • Put Insert at FieldNumber (range 0-n) after Begin and before End counting occurrences of FieldDelimiter.
      • Syntax: PutField|Begin|End|Insert|FieldNumber|FieldDelimiter|
    • PutPostfix
      • Insert a string at the end of the file memory buffer.
      • Syntax: PutPostfix|String|
    • PutPostfixLine
      • Insert Add before each line feed (0x0A).
      • Syntax: PutPostfixLine|Add|
    • PutPrefix
      • Insert a string at the start of the file memory buffer.
      • Syntax: PutPrefix|String|
    • PutPrefixField
      • Prefix FieldNumber (range 0-n) with Prefix that starts with DelimiterBegin and ends with DelimiterEnd, replacing delimiters with ReplaceBegin and ReplaceEnd on each line.
      • Syntax: PutPrefixField|Prefix|DelimiterBegin|DelimiterEnd|ReplaceBegin|ReplaceEnd|LineMax|
    • PutPrefixLine
      • Prefix each line in the buffer with a concatenation of Prefix + Delimiter.
      • Syntax: PutPrefixLine|Prefix|Delimiter|
    • PutString
      • Put a concatenation of Field + Delimiter + Filename + Delimiter before each instance of Tag.
      • Syntax: PutString|Tag|Field|Delimiter|Filename|
  12. Reduce
    • ReduceLineTerminators
      • Reduce all extraneous line feeds.
      • Syntax: ReduceLineTerminators
    • ReduceSpaces
      • Reduce all extraneous spaces.
      • Syntax: ReduceSpaces
  13. Remove
    • RemoveBetween
      • Remove data between Start and End.
      • Syntax: RemoveBetween|Start|End|
    • RemoveWithout
      • If Find is not found in the memory buffer then replace all memory buffer content with Replace.
      • Syntax: RemoveWithout|Find|Replace|
    • RemoveWrapper
      • Purge <tag_open> and </tag_open> if located on tag_level and followed by tag_after at tag_level+1.
      • Syntax: RemoveWrapper|tag_level|tag_open|tag_after|
  14. Set
    • SetClosingTag
      • Locates tag_open and tag_close when they are both positioned at the same tag level, and then replaces tag_close with new_tag_close.
      • Syntax: SetClosingTag|tag_open|tag_close|new_tag_close|
    • SetFieldDelimiter
      • Set the field delimiter, passed within curly brackets.
      • Syntax: SetFieldDelimiter{delchar}
  15. Swap
    • SwapAtNestedLevel
      • Substitute tag_open located at tag_level with new_tag_open, and then replace the matching closing tag with new_tag_close. Note that if before is passed, it must exist before tag_open for the changes to be made.
      • Syntax: SwapAtNestedLevel|tag_level|before|tag_open|new_tag_open|new_tag_close|
    • SwapNested
      • Complex search and substitution for data nested from two to three tag levels.
      • Syntax: SwapNested|sig_tag_root|sig_nested_1_tag|sig_nested_2_tag|sig_tag_close|replace_open|replace_close|
    • SwapNext
      • Change the first occurrence of from to to, searching from the start of the file buffer.
      • Syntax: SwapNext|from|to|
    • SwapOutward
      • Search for primary opening and closing tags. If found, search backward and forward for secondary tags. If found, perform substitution. Why? Because some tags are so generic that SwapNested fails.
      • Syntax: SwapOutward|tag_open|tag_close|previous_tag_open|previous_tag_close|replace_open|replace_close|0=Do not extract text, 1=Extract Text|
    • SwapStrings
      • Change all occurrences of from to to.
      • Syntax: SwapStrings|from|to|
    • SwapTags
      • Swap sig_tag_open and sig_tag_close with replace_open and replace_close, keeping the data between.
      • Syntax: SwapTags|sig_tag_open|sig_tag_close|replace_open|replace_close|
  16. Transfer
    • TransferBlock
      • Locate <tag_open> and </tag_open> located at tag_level_from.
        • If before is populated then determine if it precedes <tag_open> one level before.
        • Do not make any changes if it does not.
        • Extract the data between <tag_open> and </tag_open>,
          hide <tag_open>data</tag_open>,
          move down till Tag Level == tag_level_to
          then start a new block using the passed parameters
          making sure to include the extracted data.
        • Note: tag_open does not have to have a leading '<'.
      • Syntax: TransferBlock|tag_level_from|tag_level_to|before|tag_open|replace_open|replace_close|
  17. Transform
    • TransformLFs
      • Change line feed (0x0A) and carriage return (0x0D) ASCII values to "^LF^" and "^CR^".
      • Syntax: TransformLFs
Alteryx Designer
  • Best For: Complex workflows, analytics
  • Starting Price: $5,195/user/year to start; hidden fees mean most users pay $10,000 to $20,000/year
  • Implementation Time: 2-4 weeks
  • Code Required: Minimal
  • References: General, In Depth

Alteryx Designer Cloud (was Trifacta)
  • Best For: Visual data prep, AI suggestions
  • Starting Price: $10,000+/year
  • Implementation Time: 2-4 weeks
  • Code Required: No
  • References: General

Apache Spark
  • Best For: Big data processing
  • Starting Price: Free (open source)
  • Implementation Time: 3-6 weeks
  • Code Required: Yes
  • References: General

Dataiku
  • Best For: Enterprise ML workflows
  • Starting Price: $50,000+/year (starts around $48,000/year; enterprise plans run well into six figures)
  • Implementation Time: 4-8 weeks
  • Code Required: Minimal
  • References: General, In Depth

Datameer
  • Best For: Cloud data platforms
  • Starting Price: $25,000+/year
  • Implementation Time: 2-4 weeks
  • Code Required: No
  • References: General

Informatica Data Quality
  • Best For: Enterprise, compliance
  • Starting Price: $200,000+/year
    • Small implementations (2-5 users): $80,000-$150,000/year
    • Mid-size deployments (10-20 users): $200,000-$500,000/year
    • Enterprise licenses (50+ users): $750,000-$2,000,000+/year
  • Implementation Time: 3-6 months
  • Code Required: Minimal
  • References: General, In Depth

KNIME
  • Best For: Data science workflows
  • Starting Price:
    • KNIME Analytics Platform: Free (open source)
    • KNIME Business Hub: ~$1,188 to $3,588/year
    • KNIME Server: Custom quote
  • Implementation Time: 2-3 weeks
  • Code Required: Minimal
  • References: General, In Depth

Mammoth Analytics
  • Best For: Business analysts, no-code wrangling
  • Starting Price: $16/month
  • Implementation Time: 1-3 days
  • Code Required: No
  • References: General

Microsoft Power Query
  • Best For: Excel/Power BI users
  • Starting Price: Included with Office
  • Implementation Time: 1 week
  • Code Required: Minimal
  • References: General

OpenRefine
  • Best For: Small datasets, budget-conscious
  • Starting Price: Free (open source)
  • Implementation Time: Same day
  • Code Required: No
  • References: General

Python pandas
  • Best For: Data scientists, programmers
  • Starting Price: Free (open source)
  • Note: The API is complex and highly granular; Python code must be developed to use several subpackages.
  • Implementation Time: 1-2 weeks
  • Code Required: Yes
  • References: General, API

R tidyverse
  • Best For: Statisticians, researchers
  • Starting Price: Free (open source)
  • Implementation Time: 1-2 weeks
  • Code Required: Yes
  • References: General, Packages

SQL (various platforms)
  • Best For: Database-heavy workflows
  • Starting Price: Varies
  • Implementation Time: Varies
  • Code Required: Yes
  • References: General

Tableau Prep
  • Best For: Tableau users, visual flows
  • Starting Price: $900/user/year
  • Hidden Fees (real cost): $504/user/year for Tableau Explorer, $180/user/year for Tableau Viewer. A mid-sized analytics team pays: 5 Creators: $54,000/year; 10 Explorers: $5,040/year; 25 Viewers: $4,500/year; Total: $63,540/year
  • Implementation Time: 1-2 weeks
  • Code Required: No
  • References: In Depth

Talend Data Fabric
  • Best For: Mid-market, integration needs
  • Starting Price: $50,000-200,000+/year
    • Open Studio: $0
    • Cloud Starter: $12,000-30,000
    • Cloud Premium: $50,000-100,000
    • Data Fabric Enterprise: $150,000-500,000+
    • Professional services: $50,000-200,000 for complex implementations
    • Training and certification: $5,000-15,000 per developer
    • Infrastructure costs: cloud computing resources for data processing
    • Maintenance overhead: dedicated ETL developers and administrators
    • Integration complexity: custom connector development for unique systems
  • Implementation Time: 4-6 weeks
  • Code Required: Minimal
  • References: General, In Depth

Intensity has been used in various evolutionary stages to automate transformation of:

Reach out for further information

Richard Evers, CEO/Founder, revers@midnightblue.ca
Waterloo, ON, Canada