
Data Quality Glossary: the terms you need to know to master your subject


Address Sanitizing

Postal verification, also known as Address Sanitizing, involves cleaning and standardizing contact databases to ensure that all postal addresses are reliable and compliant with current standards. Sending a parcel is more expensive and time-consuming than sending an email, so teams must check that postal addresses are correct, up to date and in the right format.

On average, undelivered mail (UDL) makes up 7-10% of all packages sent, at an average cost of about €1.80 per UDL. Every undelivered gift, customized invitation, coupon or product impacts brand image, revenue and customer satisfaction.

Address Scrubbing

Address Scrubbing is a process that automatically corrects postal address entries in databases, either by converting them to the expected format or by adding information from internal or external sources. Postal address errors and omissions are more costly than email address inaccuracies, since every failed mailing carries printing and postage costs.
Companies that do regular mailings typically have an address cleansing solution on the back end to identify and correct errors before addresses are integrated into a shipping or billing process, for example.

API

An application programming interface (API) is a software interface that acts as an intermediary between two software programs or services to exchange data and functionality.

Data cleaning solutions using API technology allow the user to access all services through pre-formatted requests without affecting the code.

In some cases, the Data Cleaning API aggregates multiple services from different providers to provide the user with a unified, comprehensive and easy-to-use version.
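As a minimal sketch, the call below shows what a request to such an API typically looks like in Python; the endpoint URL, authentication scheme and response fields are placeholders, not any particular provider’s actual interface.

```python
import requests

# Hypothetical endpoint and API key: the real URL, parameters and response
# schema depend entirely on the provider being used.
API_URL = "https://api.example-datacleaning.com/v1/verify-email"
API_KEY = "your-api-key"

def verify_email(address: str) -> dict:
    """Send one pre-formatted request to the cleaning service."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"email": address},
        timeout=10,
    )
    response.raise_for_status()
    # Illustrative response shape: {"email": ..., "valid": true, "disposable": false}
    return response.json()

print(verify_email("jane.doe@example.com"))
```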

Back-end address verification

Back-end address verification is the process of automatically correcting large volumes of postal data in real time, before any action is taken, to prevent billing or shipping errors.

Back-end address verification solutions correct data entry errors, flag incorrect addresses that cannot be corrected automatically, and standardize postal addresses according to local or international regulations.
The goal is to reduce the number of undelivered parcels (UDPs), which account for 7-10% of all parcels sent, with an average cost of approximately €1.80 per UDP.
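As an illustration of the kind of correction such a solution automates, here is a toy standardization pass in Python; the abbreviation table is an assumption for the example, not an official postal standard.

```python
# Toy standardization pass: normalize casing and expand common street-type
# abbreviations before an address enters a shipping or billing process.
# The abbreviation table is illustrative, not an official postal standard.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "bd": "boulevard"}

def standardize(address: str) -> str:
    words = address.lower().replace(",", " ").split()
    words = [ABBREVIATIONS.get(w.rstrip("."), w.rstrip(".")) for w in words]
    return " ".join(words).title()

print(standardize("12 main St., Springfield"))  # "12 Main Street Springfield"
```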

Batch Processing

Batch processing refers to the processing of data in bulk or batches, as opposed to unitary or record-by-record processing. Batch processing is very popular with companies that collect massive amounts of data that they do not control, such as in the context of LeadGen campaigns or prospecting databases purchased from specialized providers.

Batch data cleaning is an automatic (or semi-automatic) process that is generally used in a curative mode. It offers clear advantages (time savings, cost reduction, error prevention) and has many applications (monthly invoicing, bank data, image libraries, prospecting, etc.).
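A minimal sketch of a batch cleaning job in Python with pandas, processing a large file in fixed-size chunks; the file name, column name and chunk size are assumptions for the example.

```python
import pandas as pd

# Illustrative batch job: read a large prospect file in fixed-size chunks,
# apply the same cleaning step to each batch, and append the results.
CHUNK_SIZE = 10_000

with pd.read_csv("prospects.csv", chunksize=CHUNK_SIZE) as reader:
    for i, batch in enumerate(reader):
        batch["email"] = batch["email"].str.strip().str.lower()  # normalize
        batch = batch.dropna(subset=["email"])                   # drop incomplete rows
        batch.to_csv("prospects_clean.csv", mode="a", header=(i == 0), index=False)
```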

Big Data

Big Data, sometimes called Mega Data or massive data, refers to a very large amount of data that requires parallel processing across several machines. Big Data’s volume, velocity, variety and diversity of sources make it a major challenge across the entire value chain, from capture to visualization, storage, search, sharing and analysis.

Companies that manage to make this megadata intelligible develop deep market insights and refine their segmentation, targeting, advertising efforts and offerings.

Contact Database

Contact databases, often implemented as relational databases, are collections of contact information about a company’s current or past customers and prospects.

They are central to direct marketing, sales prospecting and customer success management efforts, and are an important asset to protect and enhance. Because they are usually handled by several agents and/or software programs, contact databases can contain duplicates, inaccuracies, incomplete fields, format problems, etc.

Another major problem with contact databases is obsolescence, which HubSpot estimates at 22.5% per year.

Once again, implementing a back-end solution helps guarantee that the database is reliable enough to trigger sales and marketing processes.

Customer Centric

Customer Centricity is a global approach that places the customer at the center of the company’s activity to provide a positive, customized and satisfying experience. The goal is to build customer loyalty and/or turn customers into brand ambassadors. According to a Bain & Company study, a 5% improvement in customer retention can result in a 25% to 95% increase in profits.

Customer centricity seems to have replaced the term “customer orientation”, which was very common in the 2000s and 2010s. If a company wants to activate the lever of customer centricity, it must invest in the “customer knowledge” building block, based on reliable data and advanced analytics.

Data Cleaning

Data Cleaning is the process of correcting or removing incorrect, corrupted, poorly formatted, duplicated or incomplete data from a database. In some cases, data cleaning may also involve deleting correct but unnecessary data to reduce “noise” in the decision-making process or simply to reduce the size of files.

There are “best practices” in this area, but data cleaning is usually a customized process, built according to the characteristics of the data set to process (collection method, common errors, option to use a dictionary, etc.). Data cleaning (removal of erroneous or incomplete data) should not be confused with data transformation (mapping of a raw format to a storage or analysis format).
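The sketch below illustrates a few typical cleaning steps in Python with pandas on a toy contact table; the column names and rules are assumptions for the example, since real cleaning pipelines are tailored to each data set.

```python
import pandas as pd

# Toy contact table with classic defects: bad formats, a duplicate,
# an invalid entry and a missing value.
df = pd.DataFrame({
    "name":  ["Jane Doe", "jane doe", "John Roe", None],
    "email": [" JANE@EXAMPLE.COM", "jane@example.com", "not-an-email", "roe@example.com"],
})

df["email"] = df["email"].str.strip().str.lower()   # standardize the format
df = df[df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False)]  # drop corrupt entries
df = df.dropna(subset=["name"])                     # drop incomplete rows
df = df.drop_duplicates(subset=["email"])           # remove duplicates
print(df)
```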

Data Enhancement

Data Enrichment, also called Data Enhancement, is the process of improving a database using other sources, usually by adding new information to existing entries.

For example, an entry with a prospect’s first and last name and phone number can be enhanced with his or her email address from another internal database or a third-party source such as LinkedIn, subject to compliance with applicable regulations.

Data Enrichment generally concerns demographic data (age, job title, social class, etc.) and geographic data.
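A minimal enrichment sketch in Python with pandas, joining a CRM extract against a second source to fill in missing emails; the sources, keys and columns are assumptions for the example.

```python
import pandas as pd

# Two illustrative sources: a CRM extract with no email column, and a
# reference table (internal or third-party) that has one.
crm = pd.DataFrame({
    "first_name": ["Jane"], "last_name": ["Doe"], "phone": ["+33612345678"],
})
reference = pd.DataFrame({
    "first_name": ["Jane"], "last_name": ["Doe"], "email": ["jane.doe@example.com"],
})

# Left join: every CRM entry is kept and enriched where a match exists.
enriched = crm.merge(reference, on=["first_name", "last_name"], how="left")
print(enriched)
```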

Data Governance

Data Governance refers to the overall framework and procedures that an organization uses to comply with legal requirements and internal policies for collecting, storing, sharing, analyzing and using business data to maximize business value.

Data Management

Data Management refers to all the procedures, techniques, practices and tools used to collect, validate, store, protect and process corporate data to streamline decision making and comply with applicable regulations.

Data management requires mastery of programming languages and frameworks (SQL, Python, R, Hadoop, XML, Perl…), analysis and Business Intelligence tools, Cloud platforms and, in some cases, Machine Learning techniques.

Data Monitoring

Data Monitoring is a proactive and continuous process that involves examining and monitoring the company’s data assets to make sure that the data is of high quality and reliable for its intended use. Data Monitoring relies on a framework that outlines the expected quality criteria of completeness, consistency, accuracy and uniqueness.

Data Profiling

Data Profiling involves assessing data integrity through a complete statistical analysis of data characteristics such as the number of errors, duplicate percentages, minimum and maximum values, etc. Data Profiling is typically used for data migration, integration and cleansing.

Data profiling also aims to better understand the structure, content, and interrelationships between data as well as the different uses that can be made of the company’s data assets.
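A minimal profiling sketch in Python with pandas, computing a few of the statistics mentioned above; the input file and its columns are assumptions for the example.

```python
import pandas as pd

df = pd.read_csv("contacts.csv")  # assumed input file

# Quick statistical profile: volume, completeness, uniqueness and ranges.
profile = {
    "rows": len(df),
    "missing_per_column": df.isna().sum().to_dict(),
    "duplicate_rate": df.duplicated().mean(),
    "numeric_ranges": df.select_dtypes("number").agg(["min", "max"]).to_dict(),
}
print(profile)
```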

Data Quality

Data Quality is an indicator of the state of a data set according to objective criteria such as accuracy, consistency, reliability, uniqueness or completeness.

Driven by digital transformation, 65% of companies are expected to complete their transition from an intuition-based model to a fully Data-Driven process by 2026 (Gartner). They will need to develop data operationalization frameworks to improve data quality and streamline decision making, or risk “falling two years behind in competitiveness,” says Gartner.

Data deduplication

Data deduplication, sometimes called Dedupe, is a data processing technique that involves factoring identical data sequences to save space and prevent redundancy in subsequent actions (sending an email or letter twice to the same recipient, statistical errors, etc.).

When it is done manually, data deduplication is a repetitive, time-consuming and inefficient task. The best deduplication tools can handle massive data deduplication according to customizable criteria.
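A minimal exact-match deduplication sketch in plain Python; the normalization key is an assumption for the example (real tools support customizable and fuzzy criteria).

```python
# Records are reduced to a normalized key; only the first occurrence
# of each key is kept.
records = [
    {"name": "Jane Doe", "email": "JANE@EXAMPLE.COM"},
    {"name": "jane doe", "email": "jane@example.com "},
    {"name": "John Roe", "email": "roe@example.com"},
]

seen = set()
deduped = []
for record in records:
    key = (record["name"].strip().lower(), record["email"].strip().lower())
    if key not in seen:
        seen.add(key)
        deduped.append(record)

print(deduped)  # the second Jane Doe entry has been dropped
```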

Disposable Email

As the name implies, a disposable email is an email address created on the fly for temporary or even one-time use. An estimated one out of two email addresses given in exchange for a service as part of an inbound marketing strategy (white paper, webinar registration…) is fake or disposable.

They are ephemeral and pollute databases, impacting the reach and ROI of your email campaigns. Worse: in extreme cases, a mailing list full of disposable emails can lead your ISP to blacklist your domain name.

Email domain name verification software

This solution checks the validity of email addresses by identifying incorrect, non-existent or disposable domain names in order to improve the deliverability of email campaigns and/or to ensure smooth communication with leads and customers.
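As a sketch of one such check, the Python snippet below uses the third-party dnspython package to verify that a domain publishes MX records, i.e. that it can receive mail at all; detecting disposable domains would additionally require a blocklist, which is not shown here.

```python
import dns.exception
import dns.resolver  # third-party package: dnspython

def domain_accepts_mail(domain: str) -> bool:
    """Return True if the domain publishes at least one MX record."""
    try:
        answers = dns.resolver.resolve(domain, "MX")
        return len(answers) > 0
    except dns.exception.DNSException:
        # Non-existent domain, no MX records, or lookup failure.
        return False

print(domain_accepts_mail("gmail.com"))                   # True
print(domain_accepts_mail("no-such-domain-xyz.invalid"))  # False
```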

Maturity Model

The Data Maturity Model is a framework that allows companies to assess the maturity of their data management strategy. During a data governance audit, the Maturity Model will help visualize and/or score the company’s data management processes.

For example, a data maturity model can be used to visualize elements such as whether a centralized data repository exists, the relevance of governance rules, the collaboration between different entities, the company’s capacity to generate collective intelligence from its data capital, etc.

Data Merge/Purge

Data Merge/Purge is a common function in data management solutions. It involves merging entries from different sources and removing the redundancies that may result from this operation. The Merge/Purge function can be used to create new records by combining the information contained in each “original” record.

For example, two entries may share the same name and address, but the first may contain the phone number and the second the contact’s email address. The Merge/purge function will generate a record that includes all this data.
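A minimal merge/purge sketch in plain Python for the two-record case described above; the field names and the “first non-empty value wins” rule are assumptions for the example.

```python
# Two records for the same contact are merged field by field (the first
# non-empty value wins), then the redundant entry is purged.
a = {"name": "Jane Doe", "address": "1 Main St", "phone": "+33612345678", "email": None}
b = {"name": "Jane Doe", "address": "1 Main St", "phone": None, "email": "jane@example.com"}

merged = {key: a.get(key) or b.get(key) for key in a.keys() | b.keys()}
print(merged)  # one record holding both the phone number and the email
```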

Data obsolescence

Data obsolescence refers to the process through which data loses its reliability and usage value over time, depending on how recently it was collected or updated.

In a sales and marketing context, the speed of data obsolescence has accelerated in recent years for demographic and cultural reasons, particularly with the high turnover affecting many positions and industries. HubSpot expects relational databases to “degrade” by 22.5% each year.

Phone Verification

Phone Verification is a technique that involves running phone numbers through a series of algorithms to check their existence, validate or normalize their format according to local and/or international rules, and generate additional information before they are entered into a database.

Phone verification services make databases more reliable at the source, reducing unreachability rates, improving the prospecting pitch and optimizing the productivity of sales and customer service agents, who no longer have to spend time on data cleaning.
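As a sketch of this kind of validation and normalization, the snippet below uses the third-party Python package phonenumbers (a port of Google’s libphonenumber); the input number and default region are illustrative.

```python
import phonenumbers  # third-party package: phonenumbers

raw = "06 12 34 56 78"  # a French mobile number as typed by an agent

parsed = phonenumbers.parse(raw, "FR")       # parse against a default region
print(phonenumbers.is_valid_number(parsed))  # True: a valid FR number
print(phonenumbers.format_number(            # normalize to the E.164 standard
    parsed, phonenumbers.PhoneNumberFormat.E164))  # "+33612345678"
```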

Data segmentation

Data segmentation refers to the process of dividing a data set into groups according to pre-selected parameters so it can be better used for marketing, sales, HR, etc. For example, company data can be segmented by revenue, headcount, industry, location, etc.

Data segmentation allows for customized marketing at scale, deeper understanding of the market, and easier data analysis to identify new opportunities.
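A minimal segmentation sketch in Python with pandas, binning companies into revenue bands; the thresholds, labels and figures are assumptions for the example.

```python
import pandas as pd

companies = pd.DataFrame({
    "name": ["Acme", "Globex", "Initech"],
    "revenue_meur": [3, 45, 800],  # annual revenue in millions of euros
})

# Segment by revenue band; real criteria are business-specific.
bins = [0, 10, 100, float("inf")]
labels = ["SMB", "Mid-Market", "Enterprise"]
companies["segment"] = pd.cut(companies["revenue_meur"], bins=bins, labels=labels)
print(companies)
```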

Web-based email validation service

This tool is used to quickly validate (on the fly) email addresses before they are entered into a database, using a series of algorithms to assess compliance with addressing standards and custom criteria.
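A minimal sketch of an on-the-fly check of this kind in Python; the simplified syntax pattern and the blocked-domain list are assumptions for the example (real services chain several checks: syntax, domain, mailbox).

```python
import re

# Simplified syntax test plus one custom criterion, run before an
# address is allowed into the database.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]{2,}$")
BLOCKED_DOMAINS = {"mailinator.com"}  # illustrative custom rule

def is_acceptable(address: str) -> bool:
    if not EMAIL_PATTERN.match(address):
        return False
    domain = address.rsplit("@", 1)[1].lower()
    return domain not in BLOCKED_DOMAINS

print(is_acceptable("jane.doe@example.com"))      # True
print(is_acceptable("throwaway@mailinator.com"))  # False
```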

Single Customer View

The Single Customer View is a consolidated, coherent and holistic representation of all the customer data a company has. The Single Customer View is a true performance catalyst, which can only be achieved with a relevant Data Quality Management policy and a perfect alignment between the different departments of the company.

According to an Experian study, 68% of companies want to implement a Single Customer View to better understand the target audience’s expectations and to feed the decision-making process.