Garbage In, Garbage Out (GIGO)

In information technology, "Garbage In, Garbage Out" (GIGO) is an expression conveying that the quality and relevance of an analytical program's results depend essentially on the quality of its input data, even when the processing logic itself is flawless. More broadly, the expression applies to any decision-making system that relies on upstream data.

A brief history of "Garbage In, Garbage Out"...

The expression "Garbage In, Garbage Out" (GIGO) first appeared in a 1957 Times newspaper article on the military applications of mathematics in the United States during the Second World War. "Computers can't think for themselves: badly programmed inputs lead to faulty outputs," explained William Mellin, a specialist in military innovation.

This idea echoes the testimony of Charles Babbage (19th century), who designed the very first programmable calculating device: "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." In other words: ask a stupid question, get a stupid answer!

In 1969, a 30-minute British documentary looked at how the workers of a large industrial company perceived the computer equipment that had just been installed in their factory. Despite the "wow" factor, the interviewees were already aware that the machine can't work miracles if it isn't properly fed with data.

The GIGO concept, with its simple, playful approach, has migrated to other disciplines. The quality of a gourmet dish depends on the quality of the ingredients used, and the reliability of a clinical trial largely depends on the accuracy of the data collected by researchers... just like the decisions made in business!

GIGO: the importance of upstream data in a Data-Driven world

The issue of input data quality is gaining momentum as Data becomes increasingly democratized in the enterprise. According to Gartner, 65% of companies are expected to complete their transition from a model based on flair and intuition to a fully Data-Driven process by 2026.

And the stakes are high: organizations that fail to put an effective Data Quality framework in place by 2024 will be at least two years behind the competition.

In e-commerce, competitive intensity is such that the "customer knowledge" building block is becoming one of the only remaining competitive advantages. "Major brands like Procter & Gamble and Unilever are investing in sophisticated analytics to dominate e-commerce," explains Mike Black, CMO of Profitero.

With the exponential growth in the amount of data generated and the weight of Data in the decision-making process, the question of Data Quality becomes crucial.

Data Quality to prevent GIGO in the decision-making sphere

A data set can cause GIGO because of intrinsic errors (erroneous data), but that is not the only way: data that is accurate yet not applicable to the specific context can also lead to biased decisions.

Example: a company publishes software exclusively for .NET developers. An error occurs in the lead-generation process, and the company's CRM is populated with a database of Java developers. The CRM is then asked to identify the leads most likely to go on to purchase, based on various criteria: the lead's financial capacity, their decision-making power in the purchase, their need and its urgency (the "BANT" model: Budget, Authority, Need, Timing).

In this example, the CRM performs lead management on an off-target list. The output will be leads with strong financial capacity, decision-making authority and an urgent need... just not a need for this particular product. The reasoning is logical, but the premise is wrong. This is a typical example of qualitative GIGO.
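
To make this failure mode concrete, here is a minimal sketch (in Python) of a BANT-style scoring routine. It is purely illustrative: the field names, weights and thresholds are hypothetical and do not describe any real CRM. The point is that the scoring logic can be perfectly consistent and still rank off-target leads at the top, because nothing in it checks that the lead's need matches the product being sold.

```python
from dataclasses import dataclass

@dataclass
class Lead:
    name: str
    budget_keur: int         # Budget: estimated purchasing capacity (k€)
    is_decision_maker: bool  # Authority: can sign off on the purchase
    declared_need: str       # Need: technology the lead says they work with
    months_to_decision: int  # Timing: how soon they intend to buy

def bant_score(lead: Lead) -> int:
    """Toy BANT scoring: the logic is internally sound, but it never checks
    whether the lead's declared need matches the product being sold."""
    score = 0
    score += 40 if lead.budget_keur >= 50 else 10
    score += 30 if lead.is_decision_maker else 5
    score += 20 if lead.months_to_decision <= 3 else 5
    # declared_need is ignored -> off-target leads can score just as high.
    return score

# The CRM was (wrongly) populated with Java developers for a .NET-only product.
leads = [
    Lead("Java shop CTO", budget_keur=120, is_decision_maker=True,
         declared_need="Java", months_to_decision=2),
    Lead(".NET team lead", budget_keur=30, is_decision_maker=False,
         declared_need=".NET", months_to_decision=6),
]

for lead in leads:
    print(lead.name, bant_score(lead))
# The Java CTO scores 90, the actual .NET prospect 20:
# flawless reasoning applied to off-target input data.
```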

In short, it's pointless to devote time and resources to streamlining the decision-making process if the input data doesn't undergo rigorous validation upstream. In an era of exploding data volumes, the company will have to make trade-offs to identify the critical data that cannot tolerate errors.

Drowned in the mass of data collected, an approximation in the number of visits to an e-commerce site is unlikely to lead to dramatic decisions. An error in the order-taking process (delivery address, telephone number, etc.), on the other hand, can cost the company a sale and a customer:

  • A 1% error on the 300,000 daily clicks of a website's visitors represents 3,000 clicks. That's statistically insignificant.
  • A 1% error on 1,000 orders represents 10 erroneous orders, leading to 10 delivery problems, an outright loss and unhappy customers (the short sketch below runs the numbers).
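
As a back-of-the-envelope check (using the article's own example volumes, not real figures), the same 1% error rate translates into very different business impact depending on which data it affects:

```python
# Same 1% error rate, very different consequences depending on the data it hits.
error_rate = 0.01
daily_clicks = 300_000   # analytics data: miscounted clicks are mostly noise
daily_orders = 1_000     # operational data: each error is a failed delivery

print(f"Miscounted clicks : {daily_clicks * error_rate:,.0f}")   # 3,000
print(f"Failed deliveries : {daily_orders * error_rate:,.0f}")   # 10
```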

Flawless data for rational decisions

Data Quality refers to the set of tools, processes and techniques that measure the accuracy and usefulness of a data set according to predefined rules:

  • Accuracy of the data collected
  • Uniqueness (no redundant or duplicate data)
  • Completeness: is all the data required to make a decision available?
  • Usefulness in relation to the expected decision
  • Reliability
  • Timeliness (or freshness).
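
As a rough illustration of how these dimensions can be turned into predefined rules (a minimal sketch, not a description of any particular Data Quality tool; the field names and the one-year freshness threshold are assumptions), a few of them can be checked with straightforward code:

```python
import re
from datetime import date, timedelta

def check_quality(records: list[dict]) -> dict:
    """Toy data-quality report covering a few of the dimensions above:
    accuracy (valid email format), uniqueness, completeness and freshness.
    Field names ('email', 'phone', 'updated_on') are illustrative."""
    required = {"email", "phone", "updated_on"}
    email_re = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    seen = set()
    report = {"invalid_email": 0, "duplicates": 0, "incomplete": 0, "stale": 0}
    for rec in records:
        if not required.issubset(rec):
            report["incomplete"] += 1          # completeness
            continue
        if not email_re.match(rec["email"]):
            report["invalid_email"] += 1       # accuracy
        if rec["email"] in seen:
            report["duplicates"] += 1          # uniqueness
        seen.add(rec["email"])
        if rec["updated_on"] < date.today() - timedelta(days=365):
            report["stale"] += 1               # timeliness / freshness
    return report

print(check_quality([
    {"email": "a@example.com", "phone": "+33 1 23 45 67 89",
     "updated_on": date(2020, 1, 1)},          # stale
    {"email": "not-an-email", "phone": "123", "updated_on": date.today()},
    {"email": "a@example.com"},                # incomplete record
]))
```

In practice each dimension would be backed by far richer rules (reference data, deduplication keys, business-specific validity checks), but the principle is the same: the rules are defined before the data is used, not after a bad decision has exposed the problem.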
 
Sandrine Le Cam

To find out more...

To support companies in their Data-Driven transformation, Data Enso has developed simple, 100% GDPR-compliant (RGPD) solutions.

Objectives:

  • Clean and correct existing data with batch solutions
  • Ensure the reliability and veracity of the data collected (input assistance, automatic real-time correction, verification of emails and telephone numbers)
  • Enhance databases
  • Optimize collection systems.
 

Discover our solutions and turn your Data capital into a real performance driver