Mean Absolute Error Explained
The Mean Absolute Error (MAE) is a common metric for assessing the accuracy of regression models. It measures the average magnitude of the errors in a set of predictions, ignoring their direction. To compute the MAE, take the absolute difference between each predicted value and the corresponding true value, then average those differences over all observations. The formula for computing the MAE is:
MAE = (1/n) Σ(i=1 to n) |y_i – ŷ_i|
In the above formula:
- n signifies the total number of observations in the dataset.
- y_i represents the actual (observed) value for the i-th observation.
- ŷ_i is the value predicted by the model.
The MAE is a linear score: each error contributes to the mean in proportion to its absolute size, so an error of 4 counts exactly twice as much as an error of 2. The MAE quantifies the magnitude of the errors but not their direction, that is, whether the model is overpredicting or underpredicting.
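The formula above can be sketched in plain Python. This is only an illustration: the helper name mean_absolute_error_manual and the two-point dataset are made up for the example. It also shows why direction is lost: an overprediction and an underprediction of the same size cancel in the signed mean but not in the MAE.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all n observations (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(abs_errors) / len(abs_errors)


# Made-up data: one overprediction (+2) and one underprediction (-2)
y_true = [10.0, 20.0]
y_pred = [12.0, 18.0]

# The signed errors cancel out, the absolute errors do not
signed_mean = sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)
mae = mean_absolute_error_manual(y_true, y_pred)

print("Mean signed error:", signed_mean)  # 0.0
print("MAE:", mae)                        # 2.0
```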
Significance of MAE
The MAE is useful for evaluating regression models for the following reasons:
- Resistance to Outliers. Because errors are not squared, the MAE is less sensitive to outliers and extreme values than metrics such as the Mean Squared Error (MSE). A large error still raises the MAE, but only in proportion to its size.
- Linear Score. Each error is weighted in proportion to its magnitude, with no extra penalty for large errors, which makes scores easy to compare across models evaluated on the same data.
- Unambiguous Interpretation. The MAE is simply the average size of the error, which makes it easy to explain even to non-technical stakeholders.
- Compatible Units. The MAE is expressed in the same units as the response variable, so the size of the prediction error can be read directly on the scale of the data.
- Diversity of Application. The MAE is used in a variety of fields, including finance, engineering, and meteorology, and is a standard metric in some of them.
- Error Size Indication. The MAE summarizes how large a model's errors typically are, which helps with model selection and improvement.
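To make the comparison with the MSE concrete, here is a minimal sketch in plain Python. The mae and mse helpers and the numbers are invented for illustration. A single prediction that is off by 41 units is added to a set of predictions that are otherwise off by 1 unit each.

```python
def mae(y_true, y_pred):
    # Mean of absolute errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    # Mean of squared errors
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data for illustration
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred_clean = [11.0, 19.0, 31.0, 39.0, 51.0]    # every error has size 1
y_pred_outlier = [11.0, 19.0, 31.0, 39.0, 91.0]  # last error has size 41

print("MAE clean:  ", mae(y_true, y_pred_clean))    # 1.0
print("MAE outlier:", mae(y_true, y_pred_outlier))  # 9.0
print("MSE clean:  ", mse(y_true, y_pred_clean))    # 1.0
print("MSE outlier:", mse(y_true, y_pred_outlier))  # 337.0
```

In this toy case the outlier raises the MAE from 1 to 9, while the MSE jumps from 1 to 337. The MAE is still affected by the extreme value, just far less than a squared-error metric.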
Computing MAE Using Python
To compute the MAE in Python, you can use the mean_absolute_error() function from the sklearn.metrics module. The following sample code illustrates this:
from sklearn.metrics import mean_absolute_error
import numpy as np
# Define some sample data
y_actual = np.array([1, 2, 3, 4, 5])
y_estimated = np.array([1.5, 2.5, 2.8, 4.2, 4.9])
# Compute the MAE
mae = mean_absolute_error(y_actual, y_estimated)
print("Mean Absolute Error:", mae)
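In this example, y_actual holds the actual values and y_estimated holds the predicted values. The scikit-learn and NumPy packages must be installed in your Python environment to run this code. The mean_absolute_error() function always takes the true values as its first argument and the predicted values as its second. It can also handle multi-output problems, in which both arguments are two-dimensional arrays of shape (n_samples, n_outputs); its multioutput parameter controls whether the per-output errors are averaged into a single score (the default) or returned separately.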
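The multi-output case mentioned above can be sketched as follows. The two-target dataset is made up for illustration, and the example assumes a scikit-learn version in which mean_absolute_error() accepts the multioutput keyword with the values "raw_values" and "uniform_average".

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up data with two target columns: shape (n_samples, n_outputs) = (3, 2)
y_actual = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])
y_estimated = np.array([[1.5, 12.0],
                        [2.5, 18.0],
                        [2.5, 33.0]])

# One MAE per target column
per_output = mean_absolute_error(y_actual, y_estimated, multioutput="raw_values")

# Default behavior ("uniform_average"): unweighted mean of the per-column MAEs
overall = mean_absolute_error(y_actual, y_estimated)

print("MAE per output:", per_output)
print("Overall MAE:", overall)
```

Here the first column has absolute errors of 0.5, 0.5, and 0.5, and the second has 2, 2, and 3, so the per-output values should be about 0.5 and 2.33, and the overall score is their average.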