Messy data is one of the most frustrating things to deal with in Excel. Before you can analyse anything, you often have to spend hours fixing inconsistent formatting, removing duplicates, splitting columns that should have been separate, and working out what to do with blank cells. AI tools — particularly ChatGPT — can significantly speed this up, especially for the parts where you don’t know which formula to use. This guide covers both the built-in Excel tools and how AI can help with the tasks they can’t handle.
What Messy Data Looks Like
Data cleaning problems fall into a few common categories. Knowing what you’re dealing with helps you choose the right tool for each one.
- Inconsistent capitalisation — “smith”, “Smith”, “SMITH” all appearing in the same column
- Extra spaces — leading spaces before a name, trailing spaces at the end, or double spaces in the middle
- Mixed date formats — some cells say “01/04/2024”, others say “April 1, 2024” or “1-Apr-24”
- Duplicate entries — the same customer or product appearing multiple times
- Missing values — blank cells where there should be data
- Merged cells — common in downloaded reports, they break sorting and formulas
- Data that should be in separate columns — full names in one column instead of first/last name, addresses all in one field
- Numbers stored as text — cells containing “1,234” as text rather than the number 1234
Step 1: Use Excel’s Own Cleaning Tools First
Before reaching for AI, Excel has built-in functions that handle the most common problems quickly.
TRIM — remove extra spaces
TRIM removes leading spaces, trailing spaces, and reduces multiple spaces between words to a single space. It’s the first thing to apply to any imported text data.
=TRIM(A2)
Apply this in a helper column, then copy and paste as values back over the original.
CLEAN — remove non-printing characters
Data from external systems often contains hidden characters that aren’t visible but cause problems. CLEAN strips these out.
=CLEAN(A2)
For best results, combine both: =TRIM(CLEAN(A2))
PROPER, UPPER, LOWER — fix capitalisation
=PROPER(A2)— converts to Title Case (first letter of each word capitalised)=UPPER(A2)— converts everything to UPPERCASE=LOWER(A2)— converts everything to lowercase
PROPER is the most commonly useful for names and addresses.
Text to Columns — split combined data
If a column contains data that should be in two or more columns — such as “John Smith” needing to become separate first and last name columns — use Text to Columns:
- Select the column.
- Go to Data > Text to Columns.
- Choose Delimited (if data is split by a space, comma, etc.) or Fixed Width.
- Set the delimiter and click Finish.
Remove Duplicates
- Click anywhere in your data.
- Go to Data > Remove Duplicates.
- Choose which columns to check for duplicates — you can check all columns (exact row match) or specific columns (e.g., just the email address).
- Click OK. Excel will report how many duplicates were removed.
Step 2: Use Flash Fill for Pattern-Based Cleaning
Flash Fill is Excel’s pattern recognition tool. Start typing the cleaned version of a value in the next column, and Excel will detect the pattern and fill in the rest automatically.
For example, if column A has “SMITH, JOHN” and you want column B to show “John Smith”, type “John Smith” next to the first row, then press Ctrl+E. Excel will apply the same transformation to the rest of the column.
Flash Fill works well for reformatting phone numbers, extracting parts of text strings, reversing name order, and simple pattern transformations. It doesn’t always get complex patterns right on the first try — if it goes wrong, press Ctrl+Z and try giving it two or three examples first.
Step 3: Use ChatGPT to Write Cleaning Formulas You Can’t Figure Out Yourself
When the built-in tools aren’t enough, ChatGPT is the fastest way to get the formula you need. Describe your data and the problem, and ChatGPT will write the formula with an explanation.
Example prompts for data cleaning
“Column A contains UK phone numbers in multiple formats: some start with 07, some with +447, some have spaces, some have dashes. Write an Excel formula that standardises all of them to the format 07XXX XXXXXX.”
“Column A has dates in mixed formats — some say 01/04/2024, others say April 1 2024, and some say 1-Apr-24. How do I standardise these in Excel?”
“I have a column of company names with inconsistent formatting — some have Ltd, some have Limited, some have Ltd., some have nothing. Write a formula that standardises all variations to just ‘Ltd’ at the end.”
“Column A has email addresses. Write a formula to check whether each one is in a valid email format (contains @ and a dot after it) and return TRUE or FALSE.”
Step 4: Ask ChatGPT for a Cleaning Plan for Your Specific Dataset
If you have a particularly messy file and don’t know where to start, describe it to ChatGPT and ask for a step-by-step cleaning plan. This is especially useful when the data has come from an unusual source or has multiple overlapping problems.
Example prompt: “I’ve exported a customer list from our old CRM. It has 8 columns: customer ID, company name, first name, last name, email, phone, address (all in one field), and account status. The data is messy — company names have inconsistent capitalisation and spacing, phone numbers are in different formats, addresses need splitting into street/town/postcode, and some rows have blank email fields. Give me a step-by-step plan for cleaning this in Excel, using formulas and built-in tools.”
ChatGPT will return a prioritised checklist — typically covering which columns to tackle first, which tools or formulas to use for each, and how to handle the blank cells.
Step 5: Use Power Query for Large-Scale Transformation
For large datasets or cleaning tasks you need to repeat regularly, Power Query is the right tool. It lets you build a repeatable cleaning pipeline — apply the same steps to new data every time you refresh.
How to access Power Query
- Go to Data > Get Data > From File > From Workbook (or from whatever source your data comes from).
- The Power Query Editor opens — this is where you define your cleaning steps.
- Use the Transform tab to trim whitespace, change case, split columns, replace values, and more.
- Click Close and Load when done.
Every step you apply is recorded and can be re-run on fresh data with one click. This is far more efficient than manually cleaning the same types of problems every month.
Practical Example: Cleaning a List of Company Names
Suppose you have a column of company names from different sources and they look like this:
- acme limited
- ACME Ltd.
- Acme Ltd
- Acme limited
- ACME LIMITED
To standardise these to “Acme Ltd”:
- First, apply
=TRIM(PROPER(A2))to a helper column — this gives you “Acme Limited”, “Acme Ltd.”, “Acme Ltd”. - Then use nested SUBSTITUTE to standardise the endings:
=SUBSTITUTE(SUBSTITUTE(TRIM(PROPER(A2)),"Limited","Ltd"),"Ltd.","Ltd")
This removes trailing full stops from “Ltd.” and replaces “Limited” with “Ltd” — leaving you with a consistent “Acme Ltd” every time.
If the pattern is more complex than this, that’s exactly the kind of problem to paste into ChatGPT: describe the variations you’re seeing and ask for a formula that handles all of them.
Summary: Recommended Cleaning Workflow
- Run TRIM and CLEAN on all text columns.
- Standardise capitalisation with PROPER/UPPER/LOWER.
- Use Text to Columns to split combined fields.
- Remove Duplicates.
- Use Flash Fill for pattern-based reformatting.
- Use ChatGPT to write formulas for problems the built-in tools can’t handle.
- Use Power Query if you need to repeat the process on new data regularly.
Data cleaning is rarely exciting, but doing it properly before analysis saves far more time than it costs. The combination of Excel’s built-in tools and ChatGPT for the tricky bits handles the vast majority of real-world messy data problems.