Cleaning Up Data
A common spreadsheet task is to clean up inconsistent data. For example, you might need to ensure that all phone numbers are in a common format, email addresses are in lowercase, dates are formatted consistently, and so on.
My PUP add-in contains a few useful tools for data clean-up. Probably the most useful one is Text Tools.
Text Tools has the following options that work with the selected cells:
- Change the case of the text (UPPER CASE, lower case, Proper Case, Sentence case., and tOGGLE cASE).
- Add new characters to the text (at the beginning, at the end, or at a specified position).
- Remove characters from the text by position (from the beginning, from the end, or beginning at a specified position).
- Remove spaces from the text (all extra spaces, leading spaces, trailing spaces, or all spaces).
- Remove characters of a particular type from the text (non-printing characters, alpha characters, non-numeric characters, non-numeric and non-alpha characters, or numeric characters).
Of all the PUP features, I use Text Tools most often.
* * *
What other options for data clean-up?
Rob Adams sent me a link to an on-line tool that some might find helpful: Magic/Replace. You can upload a file (XLS, CSV, or TSV) or simply copy and paste your data. Then, using a row-and-column interface ,adjust one row -- and the other rows change automatically. When your data is cleaned up, save the changes and you'll be notified by email when the file is ready to be downloaded.
I tried it by pasting a 280-row range from and Excel file. One of the columns had city and state names in uppercase (for example, TUCSON, AZ). I wanted to make the city proper case and keep the state abbreviation upper case. I couldn't get it to work. It changed some of the data, but not all of the data. I eventually figured out that the "example row" (which I modified) had a slash character in one of the city names. When I deleted the slash, changing the case seemed to work as it should. I submitted the changes, and I was notified about one minute later that the file was ready for download. The file looked pretty good, except that some of the cells were preceded by a space character. So I fixed it with PUP in about 5 seconds.
Some people might find Magic/Replace useful, but I won't be using it again. One problem is the lack of documentation. You can watch a video demo, but that's it. Also, the changes you can make seem to be fairly simple and it assumes that your data is fairly consistent to start with. For example, if one row has a telephone number in the zip code column, you can't change it. The inability to preview all of your data -- and the lack of undo -- made it kind of a frustrating experience.
- Reader Comments -
Following are comments in response to this item.
The most recent comment is at the bottom.
- By Debra Dalgleish. Comment posted 03 December, 2008 3:10pmI tried it too, using their sample data. It did some things well, but I'll stick with PUP too.
I'm sure they're nice people but I'm not going to upload any personal or business data to a non-secured site, run by strangers. There aren't any privacy assurances that I can see. - By rob. Comment posted 04 December, 2008 11:04pmI didn't try it, but it looked interesting for simple things. I'm glad to hear that a well-written affordable commercial product performs better. BTW, I didn't mean the submission to be a dig at PUP or anything. Just thought of you when I saw it.
Spreadsheet Page Blog
Welcome to the Spreadsheet Page Blog. This is where you find the latest news on my books, add-ins, and other Excel-related topics. Comments are welcome.

