The Importance of Data Quality in Modern Lead Generation

White Paper

A few years ago, only one thing counted in the contact data business: large amounts of data at the lowest possible price. However, as the amount of false data increases, the process of data validation is constantly becoming more important. This whitepaper discusses the reasons for data validation as well as strategies to improve the quality of your data.

Below is an excerpt of "The Importance of Data Quality in Modern Lead Generation".

The telephone numbers, email and postal addresses of potential customers are among the data sets most in demand in a growing international business. A few years ago, only one thing counted in the contact data business: large amounts of data at the lowest possible price.

New legislation, a change in consumer behaviour and the exploitation of the direct marketing sector itself – particularly in the UK – have brought about a lasting change in the lead generation business.

As the number of false data sets increases and the tricks of data counterfeiters become more sophisticated, validation processes are in turn becoming more elaborate.

Lead generators, affiliates and publishers suffer from black sheep, which are often hard to identify. There is an ongoing game of cat and mouse in the online world!

An iPad, a brand new Audi A4 or a high-priced shopping voucher in exchange for receiving a few more advertising emails in your inbox: this, put simply, is what online lead generation is all about.

What sounds simple is, in reality, becoming more complex. Even just a few years ago, the contact data of potential customers could (more or less) be generated, used, exchanged and sold with no restrictions.

New laws, changed consumer habits and the historical misuse of consumer data have led to an increased demand for valid and carefully targeted data which has to be as accurate as possible.

These factors will bring about lasting change to the lead business in the next few years. Dealing with the increasing amount of false and falsified data will thus become a decisive factor in successful lead generation.

False & Falsified Data

Despite increasing numbers of false and falsified entries, the proportion of genuine entries in most countries is still very high. According to eGentic research, approximately 60% of the leads generated in the UK meet quality requirements (accurate postal and email addresses, active telephone numbers, etc.), with Germany and especially the Scandinavian countries coming in significantly higher at over 70%, and the Mediterranean region scoring slightly lower.

Of the 30-40% false data, almost half is the result of unintentional errors. The number one source of error is mistakes made in data entry, for example numbers entered the wrong way round or spelling and typing mistakes. Within Europe, there appear to be distinct cultural differences: in general, Northern Europeans enter their data more carefully and correctly, whereas in Southern Europe data is entered more casually and quickly, resulting in more errors.

The remaining half of the false data is intentionally manipulated. To a small degree, this originates from consumers who consciously falsify their data to avoid being contacted.

The majority of consciously falsified data stems, however, from several providers and/or publishers of the large affiliate networks who are responsible for the publication and distribution of campaigns for lead generation. These providers are mostly remunerated for each lead generated and more leads automatically means more profit.

The temptation to include false data is therefore very high. Filtering out the few disreputable partners from the large number of partners needed today for successful lead generation is like looking for a needle in a haystack.

Lead generation for direct marketing exists in an area of marketing in which manipulation is, unfortunately, relatively easy. It is easy to prepare a data set that either appears to be real, or actually is real but is used without permission.

The worst-case scenario occurs when real consumer data is marketed to without the individual’s permission. Fortunately, this rarely happens with reputable providers, largely thanks to conventional “double opt-in” procedures. More frequently, falsified data (data which appears to be genuine, but for which no real person exists) is used in campaigns; it is difficult, but not impossible, to identify.

Modern Filters and Forensic Instinct

In the lead generation process, there are numerous technical means to check the validity of data. An almost forensic instinct is also needed, which makes human intervention an essential addition to automated detection methods.

Today, up to 60 different filters are applied to the real-time lead generation process. In addition to purely technical filtering against “mass subscriptions” (i.e. mass entries which originate from a single IP address or IP address block), filtering against so-called “bad word” lists ranks among the best-known techniques for detecting false data. The term “bad word” does not refer to the characteristics of words or names, but rather to the probability that the data set is wrong or false. Names such as “Mickey Mouse”, “Sponge Bob” and “John Doe” appear by the dozen, as do historical figures such as “Julius Caesar” and “Albert Einstein” and public figures like “Kate Middleton”, “David Beckham” or “David Cameron”. Filtering out these data sets is simple; the real challenge is keeping the lists up to date, because current events and even the names of popular reality show participants quickly find their way into the data generation process.
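As a rough illustration of the two filters just described, the following Python sketch flags entries whose name is on a “bad word” list or which share an IP address with too many other entries. It is a minimal sketch under stated assumptions, not the filter set used by any provider: the list contents, the threshold MAX_PER_IP, the lead fields "name" and "ip", and the function names are all hypothetical, and filtering by IP address block is left out for brevity.

```python
from collections import Counter

# Hypothetical "bad word" list; a real list would be far longer and updated often.
BAD_NAMES = {"mickey mouse", "sponge bob", "john doe", "julius caesar", "albert einstein"}

# Hypothetical threshold: entries tolerated per IP address before all are flagged.
MAX_PER_IP = 3


def normalise(name):
    """Lower-case a name and collapse whitespace so list lookups are consistent."""
    return " ".join(name.lower().split())


def flag_leads(leads):
    """Return the indexes of leads that fail either filter.

    Each lead is a dict with the (assumed) keys "name" and "ip".
    """
    per_ip = Counter(lead["ip"] for lead in leads)
    flagged = []
    for i, lead in enumerate(leads):
        is_bad_name = normalise(lead["name"]) in BAD_NAMES
        is_mass_entry = per_ip[lead["ip"]] > MAX_PER_IP
        if is_bad_name or is_mass_entry:
            flagged.append(i)
    return flagged
```

In this sketch a flagged entry is only marked, not deleted, so that it can still be passed on to further checks.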

So-called “short addresses” are also filtered out, i.e. personal or place names with only two or three letters. More complex systems even check the ratio of consonants to vowels, so that, depending on the language, conspicuous features can be recognised quickly and automatically. The data sets sorted out in this manner are not simply deleted but are subjected to a further test. The Croatian island “Krk”, for example, contains no vowel and consists of only three letters, but is nevertheless a real place. Such exceptions are entered into so-called “white lists”, and the data validated in this manner is fed back into the data pool. Thus the manual component of the entire process is actually the decisive factor, even with whitelisting.
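The short-name check, the consonant-to-vowel check and the white list can be sketched together in a few lines of Python. This is only an illustration under assumptions: the vowel set, the length limit, the ratio threshold and the function name needs_review are hypothetical, and a real system would tune them per language.

```python
VOWELS = set("aeiouy")          # assumption: a simple Latin-alphabet vowel set
WHITELIST = {"krk"}             # known-genuine exceptions, maintained by hand
MIN_LENGTH = 4                  # names of two or three letters count as "short"
MAX_CONSONANTS_PER_VOWEL = 4.0  # hypothetical threshold, language dependent


def needs_review(name):
    """Return True if a personal or place name should go to a further test."""
    letters = [c for c in name.lower() if c.isalpha()]
    if "".join(letters) in WHITELIST:
        return False  # already validated by hand
    if len(letters) < MIN_LENGTH:
        return True   # "short address"
    vowels = sum(1 for c in letters if c in VOWELS)
    if vowels == 0:
        return True   # no vowel at all is conspicuous
    consonants = len(letters) - vowels
    return consonants / vowels > MAX_CONSONANTS_PER_VOWEL
```

With these assumed settings, needs_review("Krk") returns False because of the white list, while a genuine short name such as “Li” is still routed to review rather than rejected, which is where the manual component comes in.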

The recognition and/or tracing of certain patterns, on the other hand, is considerably more complex than just filtering according to lists or IP addresses. Leads from a single traffic source may exhibit certain anomalies, for example consistent patterns in the use of numbers, capital letters, dots, birthdates, etc. Modern filter systems recognise even more complex patterns. Manual checking is important here as well: what the brain can achieve cannot easily be translated into automated algorithms. Trained and experienced native-speaker employees, however, can recognise whether a data series is valid or false with a very high degree of success.

Let’s take a practical example. One letter, dot, five letters (e.g. s.smith): numerous email addresses look just like this. However, when such entries always arrive at the same time intervals, and are followed by two letters, dot, six letters, then three letters, dot, five letters, etc., and were all generated by the same publisher, then it pays to start looking more closely. As the methods of data counterfeiters become more elaborate, the means of detection need to improve to combat them. In particular, combining technical methods with human intervention produces an optimal mix, with modern, quick pinging processes helping just as much as real-time validation processes.
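The email example can be turned into a simple heuristic: describe each address by the “shape” of the part before the @ sign, and check whether entries from one publisher arrive at near-constant intervals. The Python sketch below is one possible reading of that idea, not a description of any real detection system; the function names, the tolerance and the minimum of four entries are assumptions made for illustration.

```python
from statistics import pstdev


def local_part_shape(email):
    """Describe the part before '@' as the lengths of its dot-separated pieces."""
    local = email.split("@", 1)[0]
    return tuple(len(piece) for piece in local.split("."))


def looks_machine_timed(timestamps, tolerance_seconds=1.0):
    """True if consecutive arrivals are spaced almost identically."""
    if len(timestamps) < 4:
        return False  # too few entries to judge (assumed minimum)
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return pstdev(gaps) <= tolerance_seconds


def suspicious_publisher(entries, tolerance_seconds=1.0):
    """entries: list of (email, arrival_time_in_seconds) from one publisher."""
    shapes = [local_part_shape(email) for email, _ in entries]
    all_two_part = all(len(shape) == 2 for shape in shapes)
    times = [t for _, t in entries]
    return all_two_part and looks_machine_timed(times, tolerance_seconds)
```

A positive result here would only be a prompt for a trained employee to look more closely at the publisher, in line with the mix of technical and manual checks described above.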

The Road to Quality

The number of internet users continues to rise rapidly, whilst the proportion of internet access via mobile devices increases faster still. The potential to acquire new customers via online lead generation is rising at the same rate – but there are risks attached.

The UK and the USA typify markets in which lead generation is highly developed. For the most part, it is precisely these markets that have been ruined by extremely high levels of poor data, and it will still take years to turn them around. The attitude persists that the fewer generated data sets convert, the more data is required. Where lead generators have not historically invested actively in quality and sustainability, restoring confidence will be a difficult process.

Fortunately, there is a visible trend towards demand for better-quality data, simply to minimise loss due to scrub and to improve conversion rates. In particular, companies that want to be active in a sector for the long term, and in a sustainable manner, see the need to invest.

This also applies to the providers and affiliate networks in the lead business who are also aiming for growth and sustainable business.

For this reason, in both the UK and Europe, for example, traffic providers are much more interested in isolating individual, dubious publishers than in other countries and territories. Also, more providers are placing greater emphasis on contractually defined restrictions on how frequently the data may be contacted or utilised (particularly a contact from a telesales person or from a call centre).

The trend towards quality and sustainability in the lead generation process will grow as providers and companies learn that this investment pays off for both parties.

A rethink started long ago in the major markets. The optimal mix of external and internal filters, algorithms and comparison lists, along with human intervention, will be the decisive success factor for all providers, in either the short or the long term.
