Data Cleaning & Preparation API

Revenue is lost when harvested data is wasted due to cleaning and preparation bottlenecks.

An estimated 60 to 80% of analytics time is spent preparing data, while harvested data has grown an average of 23% per year since 2021.

A solution to these problems is high-velocity software that increases revenue by minimizing cleaning and preparation time.

A 50% or greater reduction in the time needed to clean and prepare data frees capacity to process additional harvested data in the future.

Intensity greatly reduces the time needed to clean and prepare data

The data wrangling market was valued at $3.48 billion in 2025 and is expected to reach $5.93 billion by 2030.

Roughly 163 zettabytes of unstructured data exist today, and roughly 90,000 big data companies operate worldwide.

Comparison of Intensity to the Competition

Each tool is compared on who it serves best, starting price, implementation time, whether code is required, and available references.

Intensity
  • Best For: Data scientists, programmers
  • Starting Price: $5,195/user/year
  • Implementation Time: Same day
  • Code Required: Yes
  • References: API
    Alpha
    /**
     * \brief Load source and processing script, write target, deallocate source and script memory, caller has target file.
     *
     * Buffers pSourcePath & pProcessingScript, calls Delta, writes pTargetPath,
     * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
     *
     * \param pSourcePath path of the source file to load
     * \param pTargetPath path of the target file to write
     * \param pProcessingScript path of the processing script to load
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Alpha( char *pSourcePath, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
                                        
    Bravo
    /**
     * \brief Load processing script, write target, deallocate script memory, caller has target file and deallocates pSourceBuffer.
     *
     * Buffers pProcessingScript, calls Delta, writes pTargetPath,
     * deallocates pProcessingScriptBuffer, does not return pSourceBuffer.
     *
     * \param pSourceBuffer buffer holding the source data
     * \param pTargetPath path of the target file to write
     * \param pProcessingScript path of the processing script to load
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Bravo( uint8_t *pSourceBuffer, char *pTargetPath, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
                                        
    Charlie
    /**
     * \brief Load processing script, deallocate script, returns target (pSourceBuffer), caller writes target
     *        and deallocates pSourceBuffer.
     *
     * Buffers pProcessingScript, calls Delta, deallocates pProcessingScriptBuffer, returns altered pSourceBuffer.
     *
     * \param pSourceBuffer buffer holding the source data; altered in place and returned
     * \param pProcessingScript path of the processing script to load
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Charlie( uint8_t *pSourceBuffer, char *pProcessingScript, uint32_t *pSizeSourceBuffer );
                                        
    Delta
    /**
     * \brief Returns target (pSourceBuffer), caller has target (pSourceBuffer) and deallocates pSourceBuffer
     *        and pProcessingScriptBuffer.
     *
     * Delta is the core function called by all of the others.
     *
     * \param pSourceBuffer buffer holding the source data; altered in place
     * \param pProcessingScriptBuffer buffer holding the processing script
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \param pMaxSourceBuffer size of pSourceBuffer when malloc'd
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Delta( uint8_t *pSourceBuffer, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer, uint64_t pMaxSourceBuffer );
                                        
    Echo
    /**
     * \brief Processing script buffer is already populated: load source, write target, deallocate source and script memory, caller has target file.
     *
     * Buffers pSourcePath, calls Delta, writes pTargetPath,
     * deallocates pSourceBuffer & pProcessingScriptBuffer, does not return pSourceBuffer.
     *
     * \param pSourcePath path of the source file to load
     * \param pTargetPath path of the target file to write
     * \param pProcessingScriptBuffer buffer holding the processing script
     * \param pSizeSourceBuffer pointer to the size of the pSourceBuffer
     * \return FAILURE_OKAY on success, otherwise a FAILURE_* error code
     */
    int32_t Echo( char *pSourcePath, char *pTargetPath, uint8_t *pProcessingScriptBuffer, uint32_t *pSizeSourceBuffer );
                                        
Scripting
  1. Beautify
    • BeautifyXML [1 of 2]
      • Format code, indented with tabs.
      • Syntax: BeautifyXML
    • BeautifyXML [2 of 2]
      • Format code, indented with tabs, and then remove the space between <End> and </End> tags and between <A*> and </A> tags.
      • Syntax: BeautifyXML|FIX_END|
  2. Change
    • ChangeTag
      • Locate tag1 and then tag2 in sequence, then replace the chosen field (tag1 if 1 is passed, otherwise tag2) with replacement.
      • Syntax: ChangeTag|1 for tag1|tag1|tag2|replacement
    • ChangeWrappedString
      • Find signature, scan for opening and closing double quotes, and replace the text between the quotes with replacement.
      • Syntax: ChangeWrappedString|signature|replacement
  3. Clean
    • CleanXML
      • Remove tabs, carriage returns, line feeds, and hidden characters when writing the final output. Note that StripXML takes priority over FormatXML: StripXML is used even if both are declared in the processing script.
      • Syntax: CleanXML
  4. Conceal
    • ConcealBlankTags
      • Hide passed tag sequence if there is nothing between the tags.
      • Syntax: ConcealBlankTags|tag_open|tag_close|
    • ConcealSpecialTags
      • Hide tags and the data between them, handling tags that contain carriage returns/line feeds between elements.
      • Syntax: ConcealSpecialTags|tag_open|tag_contains|tag_close|insert_to_start_of_buffer
  5. Confirm
    • ConfirmField
      • Validate that a field exists, is set as a name => value pair in PHP array format, and has verified leading and trailing characters, ensuring it is in the correct format to be loaded and processed within another PHP script.
      • Syntax: ConfirmField|String|
  6. Correct
    • CorrectQP
      • Repair quoted-printable 7-bit email encoding.
      • Syntax: CorrectQP
  7. Eliminate
    • EliminateBinary
      • Force deletion of the passed binary data, where the second of the three strings is one, two, or three 8-bit binary values represented in hexadecimal. Each hex value must take the form '0xFF', where '0x' is the hex prefix and 'FF' can be any value from '00' to 'FF'. Up to three hex values can be passed in the form '0xFF0xFF0xFF'.
      • Syntax: EliminateBinary|tag_open|hex_binary|tag_close|
    • EliminateBytes
      • Delete all occurrences of 8-bit binary values represented in hexadecimal format. Each hex value must take the form of ‘XX’ where ‘XX’ is any hex value from ‘00’ to ‘FF’.
      • Syntax: EliminateBytes|pairs_of_hex_values|
    • EliminateContent
      • Delete data that starts with from and ends with to, where to is retained if 0 is passed.
      • Syntax: EliminateContent|from|to|0|
    • EliminateContentAll
      • Delete all data that starts with from and ends with to, where to is retained if 0 is passed.
      • Syntax: EliminateContentAll|from|to|0|
    • EliminateField
      • Delete FieldNumber (range 0-many) after Begin and before End by counting occurrences of FieldDelimiter.
      • Syntax: EliminateField|Begin|End|FieldNumber|FieldDelimiter|
    • EliminateFirstLine
      • Eliminate first line if it matches ToFind
      • Syntax: EliminateFirstLine|ToFind|
    • EliminateFirstToEnd
      • Locate Open, scan forward to First, eliminate until the first occurrence of End.
      • Syntax: EliminateFirstToEnd|Open|First|End|
    • EliminateForward
      • Eliminate all data that starts with Begin followed by Next and ends with a line feed (0x0A).
      • Syntax: EliminateForward|Begin|Next|
    • EliminateFromTo
      • Eliminate all occurrences of Begin to End where End must occur after Begin.
      • Syntax: EliminateFromTo|Begin|End|
    • EliminateLFs
      • Removes line terminators.
      • Syntax: EliminateLFs
    • EliminateLines
      • Eliminate every Line-Feed (0x0A) terminated line that contains Begin.
      • Syntax: EliminateLines|Begin|
    • EliminateOnLine
      • Eliminate all data on the same line that starts with Begin, ends with End.
      • Syntax: EliminateOnLine|Begin|End|
    • EliminatePattern
      • Eliminate all occurrences of Pattern with a trailing Signature.
      • Syntax: EliminatePattern|Pattern|Signature|
    • EliminateSpan
      • Eliminate all data that starts with Begin and ends with End.
      • Syntax: EliminateSpan|Begin|End|
    • EliminateString
      • Delete all occurrences of passed string from buffer.
      • Syntax: EliminateString|string_to_delete|
    • EliminateTag
      • Delete data that starts with tag_open and, if no end is passed, ends with '>'.
      • Syntax: EliminateTag|tag_open|
    • EliminateTag2
      • Locate exact match to tag_open, scan for exact match to tag_next_to_delete, and then delete tag_next_to_delete. Note that this is potentially dangerous: tag_open and tag_next_to_delete could be separated in context, resulting in invalid data deletion. It is also of limited use in that it can leave behind orphaned closing tags whose opening tags were deleted.
      • Syntax: EliminateTag2|tag_open|tag_next_to_delete|
  8. Extract
    • ExtractText
      • Determine if source file is a supported document type.
        • Currently:
          Microsoft Office Word .docx
          Excel .xlsx
          PowerPoint .pptx
      • If it is a supported document type, the text is extracted and made available for use by additional commands in the processing script.
      • Syntax: ExtractText
  9. Preserve
    • PreserveMemory
      • Preserve the file memory buffer to a file. This command is useful when creating a new processing script because it can write the file buffer at any stage of processing.
      • Syntax: PreserveMemory
  10. Provisional
    • ProvisionalUpdate
      • If tag_find not found then insert it before tag_after.
      • Syntax: ProvisionalUpdate|tag_find|tag_after|
  11. Put
    • PutBetweenTags
      • Locate exact match to tag_open, scan for exact match to tag_next, and then insert tag_to_insert_between_open_and_next between tag_open and tag_next.
      • Syntax: PutBetweenTags|tag_open|tag_next|tag_to_insert_between_open_and_next|
    • PutBinaryPostfix
      • Append the passed string with Adobe InDesign-specific binary line feed data. See the notes in PutBinaryPrefix.
      • Syntax: PutBinaryPostfix|tag|
    • PutBinaryPrefix
      • Prepend the passed string with Adobe InDesign-specific binary line feed data. Note that the binary data is embedded to force InDesign to drop line feeds after tag closure. This is only needed when the InDesign tag formatting does not specifically call for a line feed to be dropped after a tag closes. It is best to avoid PutBinaryPrefix and PutBinaryPostfix by handling all line feeds through tag formatting within InDesign.
      • Syntax: PutBinaryPrefix|tag|
    • PutField
      • Put Insert at FieldNumber (range 0-n) after Begin and before End counting occurrences of FieldDelimiter.
      • Syntax: PutField|Begin|End|Insert|FieldNumber|FieldDelimiter|
    • PutPostfix
      • Insert a string at the end of the file memory buffer.
      • Syntax: PutPostfix|String|
    • PutPostfixLine
      • Insert Add before each line feed (0x0A).
      • Syntax: PutPostfixLine|Add|
    • PutPrefix
      • Insert a string at the start of the file memory buffer.
      • Syntax: PutPrefix|String|
    • PutPrefixField
      • Prefix FieldNumber (range 0-n) with Prefix that starts with DelimiterBegin and ends with DelimiterEnd, replacing delimiters with ReplaceBegin and ReplaceEnd on each line.
      • Syntax: PutPrefixField|Prefix|DelimiterBegin|DelimiterEnd|ReplaceBegin|ReplaceEnd|LineMax|
    • PutPrefixLine
      • Prefix each line in the buffer with a concatenation of Prefix + Delimiter.
      • Syntax: PutPrefixLine|Prefix|Delimiter|
    • PutString
      • Put a concatenation of Field + Delimiter + Filename + Delimiter before each instance of Tag.
      • Syntax: PutString|Tag|Field|Delimiter|Filename|
  12. Reduce
    • ReduceLineTerminators
      • Reduce all extraneous line feeds.
      • Syntax: ReduceLineTerminators
    • ReduceSpaces
      • Reduce all extraneous spaces.
      • Syntax: ReduceSpaces
  13. Remove
    • RemoveBetween
      • Remove data between Start and End.
      • Syntax: RemoveBetween|Start|End|
    • RemoveWithout
      • If Find is not found in the memory buffer then replace all memory buffer content with Replace.
      • Syntax: RemoveWithout|Find|Replace|
    • RemoveWrapper
      • Purge <tag_open> and </tag_open> if located on tag_level and followed by tag_after at tag_level+1.
      • Syntax: RemoveWrapper|tag_level|tag_open|tag_after|
  14. Set
    • SetClosingTag
      • Locates tag_open and tag_close when they are both positioned at the same tag level, and then replaces tag_close with new_tag_close.
      • Syntax: SetClosingTag|tag_open|tag_close|new_tag_close|
    • SetFieldDelimiter
      • Set the field delimiter, passed within curly brackets.
      • Syntax: SetFieldDelimiter{delchar}
  15. Swap
    • SwapAtNestedLevel
      • Substitute tag_open located at tag_level with new_tag_open, and then replace the matching closing tag with new_tag_close. Note that if before is passed, it must exist before tag_open for the changes to be made.
      • Syntax: SwapAtNestedLevel|tag_level|before|tag_open|new_tag_open|new_tag_close|
    • SwapNested
      • Complex search and substitution for data nested from two to three tag levels.
      • Syntax: SwapNested|sig_tag_root|sig_nested_1_tag|sig_nested_2_tag|sig_tag_close|replace_open|replace_close|
    • SwapNext
      • Change the first occurrence of from to to, searching from the start of the file buffer.
      • Syntax: SwapNext|from|to|
    • SwapOutward
      • Search for primary opening and closing tags. If found, search backward and forward for secondary tags. If found, perform substitution. Why? Because some tags are so generic that SwapNested fails.
      • Syntax: SwapOutward|tag_open|tag_close|previous_tag_open|previous_tag_close|replace_open|replace_close|0=Do not extract text, 1=Extract Text|
    • SwapStrings
      • Change all occurrences of from to to.
      • Syntax: SwapStrings|from|to|
    • SwapTags
      • Swap sig_tag_open and sig_tag_close with replace_open and replace_close, keeping the data between.
      • Syntax: SwapTags|sig_tag_open|sig_tag_close|replace_open|replace_close|
  16. Transfer
    • TransferBlock
      • Locate <tag_open> and </tag_open> located at tag_level_from.
        • If before is populated then determine if it precedes <tag_open> one level before.
        • Do not make any changes if it does not.
        • Extract the data between <tag_open> and </tag_open>,
          hide <tag_open>data</tag_open>,
          move down till Tag Level == tag_level_to
          then start a new block using the passed parameters
          making sure to include the extracted data.
        • Note: tag_open does not have to have a leading '<'.
      • Syntax: TransferBlock|tag_level_from|tag_level_to|before|tag_open|replace_open|replace_close|
  17. Transform
    • TransformLFs
      • Change line feed (0x0A) and carriage return (0x0D) ASCII values to "^LF^" and "^CR^".
      • Syntax: TransformLFs
Alteryx Designer
  • Best For: Complex workflows, analytics
  • Starting Price: $5,195/user/year to start; hidden fees mean most users pay $10,000 to $20,000/year
  • Implementation Time: 2-4 weeks
  • Code Required: Minimal
  • References: General, In Depth

Alteryx Designer Cloud (was Trifacta)
  • Best For: Visual data prep, AI suggestions
  • Starting Price: $10,000+/year
  • Implementation Time: 2-4 weeks
  • Code Required: No
  • References: General

Apache Spark
  • Best For: Big data processing
  • Starting Price: Free (open source)
  • Implementation Time: 3-6 weeks
  • Code Required: Yes
  • References: General

Dataiku
  • Best For: Enterprise ML workflows
  • Starting Price: $50,000+/year (starts around $48,000/year; enterprise plans run well into six figures)
  • Implementation Time: 4-8 weeks
  • Code Required: Minimal
  • References: General, In Depth

Datameer
  • Best For: Cloud data platforms
  • Starting Price: $25,000+/year
  • Implementation Time: 2-4 weeks
  • Code Required: No
  • References: General

Informatica Data Quality
  • Best For: Enterprise, compliance
  • Starting Price: $200,000+/year
    • Small implementations (2-5 users): $80,000-$150,000/year
    • Mid-size deployments (10-20 users): $200,000-$500,000/year
    • Enterprise licenses (50+ users): $750,000-$2,000,000+/year
  • Implementation Time: 3-6 months
  • Code Required: Minimal
  • References: General, In Depth

KNIME
  • Best For: Data science workflows
  • Starting Price:
    • KNIME Analytics Platform: Free (open source)
    • KNIME Business Hub: ~$1,188 to $3,588/year
    • KNIME Server: Custom quote
  • Implementation Time: 2-3 weeks
  • Code Required: Minimal
  • References: General, In Depth

Mammoth Analytics
  • Best For: Business analysts, no-code wrangling
  • Starting Price: $16/month
  • Implementation Time: 1-3 days
  • Code Required: No
  • References: General

Microsoft Power Query
  • Best For: Excel/Power BI users
  • Starting Price: Included with Office
  • Implementation Time: 1 week
  • Code Required: Minimal
  • References: General

OpenRefine
  • Best For: Small datasets, budget-conscious
  • Starting Price: Free (open source)
  • Implementation Time: Same day
  • Code Required: No
  • References: General

Python pandas
  • Best For: Data scientists, programmers
  • Starting Price: Free (open source)
  • Note: The API is complex and highly granular; Python code must be developed to use several subpackages.
  • Implementation Time: 1-2 weeks
  • Code Required: Yes
  • References: General, API

R tidyverse
  • Best For: Statisticians, researchers
  • Starting Price: Free (open source)
  • Implementation Time: 1-2 weeks
  • Code Required: Yes
  • References: General, Packages

SQL (various platforms)
  • Best For: Database-heavy workflows
  • Starting Price: Varies
  • Implementation Time: Varies
  • Code Required: Yes
  • References: General

Tableau Prep
  • Best For: Tableau users, visual flows
  • Starting Price: $900/user/year
  • Hidden Fees (real cost): $504/user/year for Tableau Explorer, $180/user/year for Tableau Viewer. A mid-sized analytics team pays: 5 Creators: $54,000/year; 10 Explorers: $5,040/year; 25 Viewers: $4,500/year; Total: $63,540/year
  • Implementation Time: 1-2 weeks
  • Code Required: No
  • References: In Depth

Talend Data Fabric
  • Best For: Mid-market, integration needs
  • Starting Price: $50,000-200,000+/year
    • Open Studio: $0
    • Cloud Starter: $12,000-30,000
    • Cloud Premium: $50,000-100,000
    • Data Fabric Enterprise: $150,000-500,000+
    • Professional services: $50,000-200,000 for complex implementations
    • Training and certification: $5,000-15,000 per developer
    • Infrastructure costs: cloud computing resources for data processing
    • Maintenance overhead: dedicated ETL developers and administrators
    • Integration complexity: custom connector development for unique systems
  • Implementation Time: 4-6 weeks
  • Code Required: Minimal
  • References: General, In Depth

Intensity has been used in various evolutionary stages to automate transformation of:

Reach out for further information

Richard Evers, CEO/Founder, revers@midnightblue.ca
Waterloo, ON, Canada