Data Quality Glossary: the terms you need to know to master your subject

In this glossary, Data Enso looks at the terms you need to master to get the Data Quality ball rolling this autumn!

Address Sanitizing

Postal verification, sometimes called Address Sanitizing, involves cleaning and standardizing contact databases to ensure that all the postal addresses they contain are reliable and comply with current standards. Sending a parcel is both more costly and more time-consuming than sending an email, so teams need to ensure that postal addresses are correct, up to date and in the right format. On average, undelivered items (UDEs) account for between 7 and 10% of all parcels sent, at an average cost of around €1.80 per UDE. Every undelivered gift, personalized invitation, discount voucher or product has an impact on brand image, sales and customer satisfaction.
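
As a minimal illustration, a standardization pass might trim whitespace, normalize casing and expand common street-type abbreviations. A hedged Python sketch, where the abbreviation table is a placeholder for the example, not an official postal standard:

```python
# Minimal address-standardization sketch; the abbreviation table is
# illustrative, not an official postal standard.
STREET_ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "blvd": "Boulevard"}

def standardize_address(raw: str) -> str:
    """Trim, normalize casing and expand common street-type abbreviations."""
    normalized = []
    for token in raw.strip().split():
        key = token.lower().rstrip(".")
        normalized.append(STREET_ABBREVIATIONS.get(key, token.title()))
    return " ".join(normalized)

print(standardize_address("  12 main st. "))  # -> "12 Main Street"
```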

Address Scrubbing

Address cleaning, or Address Scrubbing, is a process that automatically corrects postal address entries in databases, either to bring them into the expected format or to enrich them from internal or external sources. Errors and omissions in postal addresses are more costly than inaccuracies in e-mail addresses, for obvious reasons. As a general rule, companies with regular mailings integrate an address cleansing solution in back-end mode to identify and correct errors before addresses are fed into a dispatch or invoicing process, for example.

API

An API (Application Programming Interface) is a software interface that acts as an intermediary between two software programs or services, enabling them to exchange data and functionality. In Data Cleaning, API-based solutions let users access all services via pre-formatted requests, without touching the code. In some cases, a Data Cleaning API aggregates several services from different providers to offer a unified, comprehensive and easy-to-use solution.
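
In practice, calling such a service usually amounts to one pre-formatted HTTP request per record. A minimal sketch, in which the endpoint URL, payload fields and response shape are hypothetical assumptions rather than any real provider's contract:

```python
import requests

# Hypothetical Data Cleaning endpoint; URL, payload and response fields
# are illustrative assumptions, not a real provider's API.
API_URL = "https://api.example.com/v1/clean/email"

def clean_email(address: str, api_key: str) -> dict:
    """Submit one record to the (hypothetical) cleaning service."""
    response = requests.post(
        API_URL,
        json={"email": address},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"email": "...", "valid": true}
```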

Back-end address verification

Back-end address verification (or back-end postal verification) refers to the process of automatically correcting large volumes of postal data before any action is taken, to prevent billing or shipping errors. Back-end address verification solutions correct data entry errors, flag incorrect addresses that cannot be corrected automatically, and standardize postal addresses according to local or international regulations. The objective: limit undelivered items (UDEs), which represent between 7 and 10% of all parcels sent, with an average cost of around €1.80 per UDE.

Batch Processing

Batch processing refers to the processing of data in bulk, or batches, as opposed to unitary or "record-by-record" processing. Batch processing is particularly popular with companies that collect massive amounts of data whose input they have no control over, notably in the context of LeadGen campaigns or prospecting databases purchased from specialized service providers. Batch Data Cleaning is an automatic (or semi-automatic) process that is generally used in corrective mode. The advantages are considerable (time savings, lower production costs, error prevention) and the applications numerous (monthly invoicing, bank data, image libraries, prospecting, etc.).
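
A minimal sketch of the idea, assuming a placeholder clean_record step and an arbitrary batch size:

```python
# Minimal sketch of batch (bulk) processing, as opposed to record-by-record
# handling; clean_record stands in for any unitary correction step.
def clean_record(record: dict) -> dict:
    record["email"] = record.get("email", "").strip().lower()
    return record

def clean_batch(records: list[dict], batch_size: int = 1000) -> list[dict]:
    """Walk the data set in fixed-size batches rather than row by row."""
    cleaned = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        cleaned.extend(clean_record(r) for r in batch)
    return cleaned

print(clean_batch([{"email": "  Alice@Example.COM "}]))
```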

Big Data

Big Data, sometimes rendered as megadata or massive data, refers to a very large volume of data requiring parallel processing across several machines. Because of its volume, velocity, variety and diversity of sources, megadata poses a major challenge across the entire value chain, from capture to visualization, storage, retrieval, sharing and analysis. Companies that succeed in making this megadata intelligible develop in-depth knowledge of their market and refine their segmentation, targeting, advertising efforts and offerings.

Contact Database

The contact database, sometimes referred to as the relational database, is a collection of contact details relating to a company's current or historical customers and prospects. At the heart of direct marketing, sales prospecting and Customer Success Management efforts, the contact database is an important asset that needs to be protected and enhanced. Because they are generally handled by several agents and/or computer software programs, contact databases may contain duplicates, inaccuracies, incomplete fields, format problems and so on. Obsolescence, estimated by HubSpot at 22.5% per year, is also a major problem for relational databases. Here again, the implementation of a back-end solution will guarantee the reliability of the database, so that sales and marketing processes can be launched with complete peace of mind.

Customer Centric

Customer-centricity is a holistic paradigm that places the customer at the center of the company's activity, delivering a positive, personalized and satisfying experience with the aim of building loyalty and/or turning the customer into a brand ambassador. According to a Bain & Company study, a 5% improvement in the loyalty rate can translate into an increase in profits of between 25% and 95%. The notion of customer-centricity seems to have replaced the expression "customer orientation", very common in the 2000s and 2010s. To activate the customer-centricity lever, the company needs to work on the "customer knowledge" building block, based on reliable data and advanced analytics.

Data Cleaning

Data Cleaning is the process of correcting or deleting incorrect, corrupted, badly formatted, duplicated or incomplete data in a database. In some cases, data cleaning may also involve deleting correct but unnecessary data to reduce "noise" in the decision-making phase, or simply to reduce file size. Although there are "best practices" in this field, data cleaning is generally a tailor-made process, built around the characteristics of the data set to be processed (collection method, usual errors, possibility of using a dictionary, etc.). Data cleaning (removing erroneous or incomplete data) should not be confused with data transformation (mapping a raw format to a storage or analysis format).
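
As a minimal sketch of the kinds of corrections involved, here is an illustrative pandas pass over a toy table; the column names and rules are assumptions for the example, not a universal recipe:

```python
import pandas as pd

# Illustrative data-cleaning pass on a toy contact table.
df = pd.DataFrame({
    "email": ["a@x.com", "a@x.com", None, " B@Y.COM "],
    "age": [31, 31, 27, 45],
})

df["email"] = df["email"].str.strip().str.lower()  # fix badly formatted values
df = df.dropna(subset=["email"])                   # drop incomplete rows
df = df.drop_duplicates()                          # remove duplicated records
print(df)
```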

Data Enrichment

Data Enrichment is the process of enhancing a database from other sources, usually by supplementing entries with new information. For example, an entry containing a prospect's surname, first name and telephone number can be supplemented with their e-mail address, via another internal database or a third-party source such as LinkedIn, subject to compliance with current regulations. Data Enrichment generally concerns demographic data (age, position held, socio-professional category, etc.) and geographic data.
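
In its simplest form, enrichment is a join between the working database and a second source. A sketch with illustrative tables and keys, assuming a pandas workflow:

```python
import pandas as pd

# Enrichment sketch: prospects are completed with e-mail addresses drawn
# from a second internal source; tables and join keys are illustrative.
prospects = pd.DataFrame({
    "last_name": ["Durand", "Martin"],
    "phone": ["+33611111111", "+33622222222"],
})
crm_extract = pd.DataFrame({
    "last_name": ["Durand"],
    "email": ["j.durand@example.com"],
})

enriched = prospects.merge(crm_extract, on="last_name", how="left")
print(enriched)  # Durand gains an email; Martin's stays missing (NaN)
```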

Data Governance

Data Governance encompasses the general framework and procedures put in place by the company to ensure compliance with legal obligations and internal rules concerning the collection, storage, sharing, analysis and use of business data, with the aim of maximizing its business value.

Data Management

Data Management refers to all the procedures, techniques, practices and tools used to collect, validate, store, protect and process a company's data, in order to streamline decision-making and comply with current regulations. Data management requires mastery of programming languages and technologies (SQL, Python, R, Hadoop, XML, Perl...), analysis and Business Intelligence tools, Cloud platforms and, in some cases, Machine Learning techniques.

Data Monitoring

Data Monitoring is a proactive, ongoing process of examining and monitoring a company's data assets to ensure data quality and reliability for its intended use. Data Monitoring is based on a repository detailing the quality criteria expected in terms of completeness, uniformity, accuracy and uniqueness.
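
A hedged sketch of what such checks can look like in practice; the criteria and thresholds below are illustrative assumptions, not a standard repository:

```python
import pandas as pd

# Data-monitoring sketch: computed scores are compared to a small quality
# repository of expected thresholds (illustrative values).
df = pd.DataFrame({"email": ["a@x.com", None, "a@x.com"]})

scores = {
    "completeness": df["email"].notna().mean(),         # share of filled fields
    "uniqueness": 1 - df["email"].duplicated().mean(),  # share of distinct rows
}
THRESHOLDS = {"completeness": 0.95, "uniqueness": 0.99}

for criterion, score in scores.items():
    status = "OK" if score >= THRESHOLDS[criterion] else "ALERT"
    print(f"{criterion}: {score:.0%} [{status}]")
```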

Data Profiling

Data Profiling is a discipline akin to data analysis, in which the integrity of data is assessed through a complete breakdown of its statistical characteristics, such as number of errors, percentage of duplicates, minimum and maximum values, etc. Data Profiling is generally used in the context of data migration, integration and cleansing. It also seeks to better understand the structure, content and interrelationships between data, as well as the different uses that can be made of a company's data capital.
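
A minimal profiling sketch over a toy table, computing a few of the per-column statistics mentioned above; the columns and metrics are illustrative:

```python
import pandas as pd

# Data-profiling sketch: per-column statistics on an illustrative table.
df = pd.DataFrame({
    "amount": [10.0, 250.0, None, 10.0],
    "country": ["FR", "FR", "DE", None],
})

profile = pd.DataFrame({
    "missing": df.isna().sum(),                                  # empty fields
    "duplicates": [int(df[c].duplicated().sum()) for c in df.columns],
    "min": df.min(),
    "max": df.max(),
})
print(profile)
```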

Data Quality

Data Quality is an indicator that measures the state of a data set against objective criteria such as accuracy, consistency, reliability, uniqueness and completeness. Following digital transformation, 65% of companies are expected to complete their transition from an intuition-based model to a fully Data-Driven process by 2026 (Gartner). They will therefore need to deploy Data Operationalization Frameworks to ensure data quality and streamline decision-making, or risk suffering "a competitiveness gap of at least two years", according to Gartner.

Data deduplication

Data deduplication, sometimes abbreviated to Dedupe, is a computer technique that consists of factoring out identical data sequences to save space and prevent redundancy in subsequent actions (double sending of an email or letter to the same recipient, statistical errors, etc.). When carried out manually, data deduplication is a repetitive, time-consuming and error-prone task. The best deduplication tools support the deduplication of massive data sets according to customizable criteria.
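
A minimal sketch of the principle, where records are keyed on normalized fields so that near-identical entries (differing only in case or spacing) collapse into one; the key fields are an illustrative choice:

```python
# Deduplication sketch: records sharing the same normalized key are merged.
def dedupe(records: list[dict]) -> list[dict]:
    seen = set()
    unique = []
    for record in records:
        key = (record["email"].strip().lower(),
               record["last_name"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

contacts = [
    {"email": "a@x.com", "last_name": "Durand"},
    {"email": " A@X.COM ", "last_name": "durand"},  # duplicate once normalized
]
print(dedupe(contacts))  # -> a single record
```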

Disposable email

As the name suggests, disposable email refers to e-mail addresses generated on the fly for temporary or even one-off use. In the context of an Inbound Marketing strategy, it is estimated that one out of every two email addresses provided in exchange for a service (white paper, webinar registration, etc.) is false or disposable. Because of their ephemeral nature, disposable emails pollute databases and impact the reach and ROI of your email campaigns. Worse still: in extreme cases, a mailing list riddled with disposable emails could lead your ISP to blacklist your domain name.

Email domain name verification software

This solution ensures the validity of e-mail addresses by detecting incorrect, non-existent or disposable domain names, with the aim of improving the deliverability of e-mail campaigns and/or ensuring smooth communication with leads and customers.
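
One common building block is a DNS lookup: a domain that publishes no MX record is very unlikely to receive mail. A sketch using the open-source dnspython package, where the disposable-domain list is a tiny sample for illustration, not an exhaustive blocklist:

```python
import dns.resolver  # third-party package: dnspython

# Domain-level verification sketch; the disposable list is illustrative.
DISPOSABLE_DOMAINS = {"mailinator.com", "yopmail.com"}

def check_domain(email: str) -> str:
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE_DOMAINS:
        return "disposable"
    try:
        dns.resolver.resolve(domain, "MX")  # look up mail-exchanger records
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return "no mail server"
    return "domain accepts mail"

print(check_domain("user@mailinator.com"))  # -> "disposable"
```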

Maturity Model

The Data Maturity Model is a reference framework that enables companies to assess the degree of maturity of their data management strategy. As part of a data governance audit, the maturity model can be used to visualize and/or rate the company's data management processes. By way of example, the Data maturity model can be used to visualize elements such as the existence or otherwise of a centralized repository, the relevance of governance rules, collaboration between different entities, the company's ability to generate collective intelligence around its Data capital, and so on.

Data merge/purge

The merge/purge function is a common feature of Data Management solutions, which involves merging records from different sources and eliminating any redundancies that may result from this operation. The merge/purge function can be used to create new records by combining the information contained in each "original" record. For example, two records may share the same name and address, but the first may contain the contact's telephone number and the second the e-mail address. The merge/purge function will generate a single record that integrates all this data.
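
A minimal sketch of the merge step for the example above, keeping every non-empty field; the field names and the match rule are illustrative:

```python
# Merge/purge sketch: two matching records are combined by filling the
# gaps of the first record with the fields of the second.
def merge_records(a: dict, b: dict) -> dict:
    merged = dict(a)
    for field, value in b.items():
        if not merged.get(field):  # fill empty fields from the second record
            merged[field] = value
    return merged

rec1 = {"name": "J. Durand", "address": "12 Main Street",
        "phone": "+33611111111", "email": ""}
rec2 = {"name": "J. Durand", "address": "12 Main Street",
        "phone": "", "email": "j.durand@example.com"}
print(merge_records(rec1, rec2))  # one record with both phone and email
```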

Data obsolescence

Data obsolescence refers to the phenomenon whereby data loses its reliability and therefore its use value over time, depending on how recently it has been collected or updated. In a sales and marketing context, the speed of data obsolescence has greatly accelerated in recent years for demographic and cultural reasons, particularly with the increased turnover affecting a wide range of positions and business sectors. HubSpot estimates that relational databases "degrade" by 22.5% every year.

Phone Verification

Telephone verification, or Phone Verification, is a technique that consists of subjecting telephone numbers to a series of algorithms to verify their existence, validate or standardize their format in accordance with local and/or international rules, and generate additional information before entering them in a database. Telephone verification services make databases more reliable at source, thereby reducing unreachability rates, enriching prospecting arguments and optimizing the time of sales and customer service agents, who no longer have to perform ad hoc Data Cleaning.
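
As a sketch of the validate-and-standardize step, the open-source phonenumbers library (a Python port of Google's libphonenumber) can parse a raw entry, check its plausibility and emit the international E.164 format; the default region here is an assumption for the example:

```python
import phonenumbers  # third-party package: phonenumbers

# Phone-verification sketch: parse, validate, then standardize to E.164.
def verify_phone(raw: str, region: str = "FR") -> str | None:
    try:
        number = phonenumbers.parse(raw, region)
    except phonenumbers.NumberParseException:
        return None
    if not phonenumbers.is_valid_number(number):
        return None
    return phonenumbers.format_number(number, phonenumbers.PhoneNumberFormat.E164)

print(verify_phone("06 12 34 56 78"))  # -> "+33612345678"
print(verify_phone("123"))             # -> None
```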

Data segmentation

Data segmentation is the process of dividing a data set into groups, or segments, according to pre-selected parameters, to make better use of it in marketing, sales, HR, etc. Firmographic data, for example, can be segmented according to sales, headcount, business sector, location, etc. Data segmentation enables you to personalize your marketing message at scale, gain a deeper understanding of the market, and facilitate data analysis to identify new opportunities.
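
A small firmographic example with pandas, grouping companies into revenue bands; the band edges and labels are illustrative choices:

```python
import pandas as pd

# Segmentation sketch: companies grouped into illustrative revenue bands.
companies = pd.DataFrame({
    "name": ["Acme", "Globex", "Initech"],
    "revenue_m_eur": [2, 48, 510],
})

companies["segment"] = pd.cut(
    companies["revenue_m_eur"],
    bins=[0, 10, 100, float("inf")],
    labels=["SMB", "Mid-market", "Enterprise"],
)
print(companies.groupby("segment", observed=True)["name"].apply(list))
```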

Web-based e-mail address validation service

This is a tool used to rapidly validate (on-the-fly) e-mail addresses at the point of entry, before they are entered into a database, thanks to a series of algorithms that assess their compliance with addressing standards and personalized criteria.
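
A minimal sketch of the on-entry syntax check such a service performs first; the pattern below is a pragmatic approximation, not the full RFC 5322 grammar, and real services layer domain and mailbox checks on top of it:

```python
import re

# On-entry validation sketch: a simplified syntax check run before a
# record reaches the database.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

def validate_on_entry(address: str) -> bool:
    return bool(EMAIL_RE.match(address.strip()))

print(validate_on_entry("user@example.com"))  # True
print(validate_on_entry("user@invalid"))      # False
```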

Single Customer View

The Single Customer View is a consolidated, coherent and holistic representation of all the data held by a company on each of its customers. A true performance catalyst, the Single Customer View cannot be generated without a relevant Data Quality Management policy and perfect alignment between the company's various departments. According to an Experian study, 68% of companies want to set up a single customer view to better understand target expectations and inform decision-making.