Forum Stats

  • 3,678,230 Users
  • 7,787 Discussions


How to clean your data without any Regex knowledge

Leads are one of the most valuable things for you company.

A big part of all the effort you put in your Eloqua instance is to create leads, pass them to Sales and generate value!

But.. What if you notice that a part of your leads never even reaches sales?

This can be caused by data restrictions on CRM-side.

And then you face the challenge that you must clean and transform the data in a way that your CRM system understands.

At our company, we are connected to Salesforce. We are working on becoming more agile but for now: the Salesforce team is in another team, even in another department.

In the weekly meetings they are updating us (the marketing automation team) on what’s new on Salesforce side.

One time they decided: we will not accept any weird characters any longer. And we had to find a way to cope with that.
Regular expressions were the best way to solve this. But: what if you have limited time and no regex knowledge? Here’s what we did.

The first step is: which fields do we need to clean?
And do not focus on what is needed now, but also with a future perspective in mind.
At our side, what was needed now was: no weird characters in first name and last name and first name could not be empty.

Additionally, we wanted to clean common data issues that were made in this fields.

Like propercasing first and last name. This looks better in our personalisation.

The field Gender was standardized with a list where the options Male, Female and Unspecified are available.

But due to list uploads we have a lot of other versions of this truth that come in.

So we analysed our data and checked which values were coming. Based on that we listed all the wrong values.

The exact same problem appeared with the language field. In our list uploads we receive a lot of abbreviations, the language in another language,…

The Mobile phone number field was difficult. Because people fill it in with the country code, without the country code, with spaces in between the numbers, with dots, with slashes, ..

And in our field, we just want the numbers without the country code and without the spaces slashes or dots.

Because we are in B2B, the Customer number is mandatory to send our leads to Salesforce. But what do we do if we do not have a customer number?

Step 2: the contact data washing machine app is your best friend.

Build a program that contains: data washing machines, update rules, lookup tables,..

First & last name

For all the weird characters (we had over 100 in total) we used the replace X by Y. Except for the _ we replaced it by a space. By weird characters we mean the following: 0123456789:;<=>?@[\]^_`{|}~&amp;#127;€&#129;‚ƒ„…†‡ˆ‰‹&#141;&#143;&#144;‘’“”•–—˜™›&#157;¡¢£¤¥¦§¨©ª«¬­®¯°±²³´¶·¸¹º»¼½¾¿Ð×÷!"#$%&()*+/

And then we added the action to propercase the first and last name.

Gender and language

Here we analysed all the values we had in the gender field. And we set up a washing machine with the action replace X by Y. This for all the different variations. For example: replace ms by Female. Or replace NL by Nederlands. If the field is empty, it gets filled in by the default value. For gender that’s Unspecified, for language that’s English. This is done by an update rule.

Mobile phone number

Before starting this clean-up, we looked at the integration section and examined which phone numbers the failed leads contained. Important: our mobile country code and mobile phone number are in 2 separate fields.

Based on that we found as common mistakes, we changed this:

  • Replace . by (empty)
  • Replace / by (empty)
  • If people filled out the Mobile country code, but did not started their phone number with an 0. We did compose 0{mobile phone number} so it would be valid. Because the Belgian mobile country code is 0032 or +32 and then you add the rest of your number 412345678. But when you give your phone number separately its 0412345678.
  • Then we checked if the mobile phone number contains 0032/+32. Then we would put that value in the mobile country code field. And we would replace the +32 or 0032 by an 0.

Customer number

When contacts fill out a form, they have to select their company by a drop down list that pops up after you type the first 3 letters of your company.

In the back-end we match this company name to their VAT number. And then we match the VAT number to the company number in a program.,

So our next step is to check: do we have the company number?

If no, do we have the company name and the VAT number?

If yes, then check via a Lookup table what the right company number for this contact is.
If we do not have any of the required info about the company of that lead, we link them to a dummy company with a dummy company number. Without this, the CRM system would reject the lead because it has no company number.

Step 3: check if it works

When activating this program, check the output. And keep modifying and improving your program until you get the clean data you want.

These tips and tricks can be learned in the datacleaning course. And thanks to his we can send more leads to the CRM successfully.

But also: our data is much cleaner which is great for personalisation.



  • andrepazleal
    andrepazleal Posts: 43 Red Ribbon

    So much effort to explain and not even a comment! Thanks a lot for sharing Celine!

    Let me ask you something can you share the rules used specially in the mobile phone, I have a lot of problems with masking numbers like 1-(75)99992323 or (55)8883-3433, a bunch of possibilities, I would like to remove any non digit:


    The output (what I need removing, NON digit and blanks):


    But sadly that it's not applied into the character, do you know what might be happening?

    Thanks  again!