askvity

What is the objective of the CLEAN function?


The primary objective of the CLEAN function is to remove specific nonprinting characters from text strings. This function is particularly useful for cleaning data imported from other applications or sources, where text may contain characters that are not visible but can affect layout, formatting, or processing.

Removing Nonprinting Characters

The CLEAN function targets characters that are not intended for display. Specifically:

  • It removes the first 32 nonprinting characters in the 7-bit ASCII code. These correspond to values 0 through 31.
  • The Unicode character set contains additional nonprinting characters with values 127, 129, 141, 143, 144, and 157. By itself, CLEAN does not remove these; they have to be handled separately, for example by replacing them with SUBSTITUTE before applying CLEAN.

The characters in the 0 through 31 range include line feeds, carriage returns, tabs, and other control characters that were historically used to format output or control devices rather than to display text.
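
The handling of values 0 through 31 can be illustrated outside a spreadsheet. The following is a minimal Python sketch, not the spreadsheet's actual implementation: the function name `clean` is hypothetical, and it assumes the behavior is simply "drop every character whose code is 0 through 31".

```python
def clean(text: str) -> str:
    """Hypothetical emulation: drop characters with codes 0 through 31."""
    # ord() returns the character's code point; keep only codes above 31.
    return "".join(ch for ch in text if ord(ch) > 31)


# A line feed (10), a tab (9), and a bell character (7) are all in the 0-31 range.
raw = "Line one\nLine two\tend\x07"
print(clean(raw))  # Line oneLine twoend
```

Note that the characters are deleted rather than replaced with spaces, so words separated only by a line break end up joined.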

Practical Application

By removing these particular characters, the CLEAN function helps ensure that text data is more consistent and free from hidden elements that might cause issues in spreadsheets, databases, or other text processing applications. It makes the text cleaner and more suitable for further analysis or display.

For example, text copied from a webpage or a document might contain these hidden characters. Applying CLEAN helps to remove them, resulting in a plain text string that is easier to work with.
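
As a rough illustration of cleaning pasted text, the sketch below covers both the values 0 through 31 and the additional values 127, 129, 141, 143, 144, and 157 in a single pass. It is written in Python under stated assumptions: `strip_nonprinting` and `EXTRA_NONPRINTING` are hypothetical names chosen for this example, and the sample string is invented.

```python
# Additional nonprinting values listed in this article (assumed complete for this sketch).
EXTRA_NONPRINTING = {127, 129, 141, 143, 144, 157}


def strip_nonprinting(text: str) -> str:
    """Drop codes 0 through 31 plus the additional values listed above."""
    return "".join(
        ch for ch in text
        if ord(ch) > 31 and ord(ch) not in EXTRA_NONPRINTING
    )


# Invented sample: a vertical tab (11), a delete character (127), and code 157.
pasted = "Total:\x0b 42\x7f units\x9d"
print(strip_nonprinting(pasted))  # Total: 42 units
```

Ordinary spaces (value 32) are left untouched, so visible word spacing is preserved.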
