November 9, 2021
1 min read

Where do biases in ML come from? #3 📏 Measurement

Machine Learning systems are particularly sensitive to measurement bias. Calibrate your AI / ML models to avoid that risk.

Jean-Marie John-Mathews, Ph.D.

In this post, we focus on one of the most important biases: measurement 📏

Data is the result of measurements made either by a human or by a machine, and noise is inherent in every measurement. Usually, measurement noise can be removed by aggregating over many measurement points.

Unfortunately, this technique rarely works in real ML projects, because the noise is not random with respect to the event we want to predict. Put differently, measurement bias appears when the measurement noise is correlated with the target variable.
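
To make the distinction concrete, here is a minimal sketch on hypothetical synthetic data (not from the post): purely random noise averages out and leaves the measured relationship intact, while noise correlated with the target systematically distorts it.

```python
# Minimal sketch (assumed synthetic data): random measurement noise averages
# out, but noise correlated with the target does not.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(size=n)        # target variable
x_true = 2.0 * y              # ideal, noise-free measurement

# Random noise: the average error shrinks as we aggregate more points.
x_random = x_true + rng.normal(scale=1.0, size=n)
print("mean error, random noise:", (x_random - x_true).mean())   # ~0

# Correlated noise: the error depends on y, so aggregation cannot remove it
# and the measured relationship between x and y is systematically shifted.
x_biased = x_true + 0.5 * y + rng.normal(scale=1.0, size=n)
slope_random = np.polyfit(y, x_random, 1)[0]   # ~2.0, as expected
slope_biased = np.polyfit(y, x_biased, 1)[0]   # ~2.5, distorted by the bias
print("slopes:", slope_random, slope_biased)
```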

Here are some examples:

❌ In image recognition, the training data may be collected by a different type of camera than the one used for production.

❌ In #NLP, data labelling may be influenced by the annotators’ regional context, producing inconsistent annotations and thus measurement bias.

Fortunately, physics, and especially metrology, gives us a method to detect measurement bias: calibration, the act of comparing measurement values with standards of known accuracy.
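
As a simple illustration of that idea, here is a minimal sketch (hypothetical sensor readings) that estimates a measurement device's systematic error against reference standards and applies the learned correction:

```python
# Minimal sketch (hypothetical values): estimate a sensor's systematic error
# by comparing its readings with reference standards of known accuracy,
# then apply the learned correction to new measurements.
import numpy as np

reference = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # known standards
measured  = np.array([10.8, 21.1, 31.4, 41.2, 51.6])   # raw sensor readings

# Fit measured ≈ gain * reference + offset, then invert it to correct readings.
gain, offset = np.polyfit(reference, measured, 1)

def calibrate(reading):
    return (reading - offset) / gain

print(calibrate(measured))   # close to the reference values
```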

There are several ways to apply calibration in Machine Learning:

✅ Always compare the outputs of different data collection processes. To do that, use monitoring tools to detect shifts in data distributions (see the sketch after this list).

✅ Provide best practices and clear guidelines for your data collection process.
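
A minimal sketch of such a monitoring check, on hypothetical feature arrays: compare the same feature as produced by two collection processes (e.g. training-time vs. production cameras) with a two-sample Kolmogorov-Smirnov test and flag a significant shift.

```python
# Minimal sketch (hypothetical data): detect a distribution shift between two
# data collection processes for the same feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
feature_train = rng.normal(loc=0.0, scale=1.0, size=5_000)   # process A
feature_prod  = rng.normal(loc=0.3, scale=1.0, size=5_000)   # process B, shifted

stat, p_value = ks_2samp(feature_train, feature_prod)
if p_value < 0.01:
    print(f"Distribution shift detected (KS statistic={stat:.3f}) - "
          "review the collection process before trusting the model.")
```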

At Giskard, we help AI professionals detect measurement biases by enriching the modeling process with new reference points.

