Counterfactual Fairness
noun · candidate · updated May 13, 2026
No definition recorded.
Framework senses
- §1
- A fairness metric that checks whether a classifier produces the same result for one individual as it does for another individual who is identical to the first except in one or more sensitive attributes. Evaluating a classifier for counterfactual fairness is one method for surfacing potential sources of bias in a model.
- §1
- Given a predictive problem with fairness considerations, where $A$, $X$ and $Y$ represent the protected attributes, remaining attributes, and output of interest respectively, let us assume that we are given a causal model $(U, V, F)$, where $V = A \cup X$. We postulate the following criterion for predictors of $Y$. Definition 5 (Counterfactual fairness). Predictor $\hat{Y}$ is counterfactually fair if under any context $X = x$ and $A = a$,
  $$P(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a) \qquad (1)$$
  for all $y$ and for any value $a'$ attainable by $A$.
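The attribute-swap check described in the first sense can be sketched as a simple "flip test". This is a minimal illustration, not a standard library API: the function `flip_consistency`, the classifiers `income_only` and `group_threshold`, and the applicant records are all hypothetical. Each record is copied with only the sensitive attribute changed, and the function reports the fraction of records whose prediction is unchanged. Note that this holds every other feature fixed; it does not let the change in the sensitive attribute propagate to causally downstream features, which is what the causal-model sense requires.

```python
from typing import Any, Callable, Dict, Hashable, Iterable

Record = Dict[str, Any]


def flip_consistency(
    classify: Callable[[Record], Hashable],
    records: Iterable[Record],
    sensitive_attr: str,
    values: Iterable[Hashable],
) -> float:
    """Fraction of records whose prediction is unchanged when only
    `sensitive_attr` is replaced by each other value in `values`."""
    alternatives = list(values)
    rows = list(records)
    if not rows:
        raise ValueError("records must be non-empty")
    consistent = 0
    for row in rows:
        original = classify(row)
        # Twins copy every field and change only the sensitive attribute.
        twins = (
            {**row, sensitive_attr: v}
            for v in alternatives
            if v != row[sensitive_attr]
        )
        if all(classify(twin) == original for twin in twins):
            consistent += 1
    return consistent / len(rows)


# Hypothetical loan-approval classifiers and applicants, for illustration only.
def income_only(r: Record) -> bool:
    return r["income"] >= 50


def group_threshold(r: Record) -> bool:
    # Uses a different income threshold per group, so swapping the group can flip the outcome.
    return r["income"] >= (40 if r["group"] == "a" else 60)


applicants = [
    {"income": 55, "group": "a"},
    {"income": 45, "group": "b"},
    {"income": 70, "group": "b"},
]

print(flip_consistency(income_only, applicants, "group", ["a", "b"]))      # 1.0
print(flip_consistency(group_threshold, applicants, "group", ["a", "b"]))  # 0.333...
```

A score below 1.0 flags individuals whose outcome depends on the sensitive attribute alone, which is the kind of bias source this sense is meant to surface.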
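Criterion (1) can be checked with the standard three counterfactual steps: abduction (infer $U$ from the evidence $X = x$, $A = a$), action (set $A \leftarrow a'$), and prediction (recompute $X$ and evaluate $\hat{Y}$). The sketch below assumes a hypothetical toy causal model with a single structural equation $X = \alpha A + U$ and no other noise; the names `LinearSCM`, `counterfactual_prediction`, `is_counterfactually_fair`, `uses_x` and `uses_u` are illustrative, not from any library. Because $U$ is recovered exactly from $(a, x)$, both sides of (1) are point masses and the check reduces to comparing two scores; in a general model one would instead sample $U$ from $P(U \mid X = x, A = a)$ and compare the resulting distributions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# A predictor maps (protected attribute a, remaining attribute x) to a score.
Predictor = Callable[[float, float], float]


@dataclass(frozen=True)
class LinearSCM:
    """Hypothetical toy causal model: latent U, protected A, and the single
    structural equation X = alpha * A + U (no other noise)."""

    alpha: float

    def abduct(self, a: float, x: float) -> float:
        # Abduction: infer U from the evidence A = a, X = x.
        # The equation is invertible, so U is determined exactly.
        return x - self.alpha * a

    def generate_x(self, a: float, u: float) -> float:
        # Structural equation for X given A and U.
        return self.alpha * a + u


def counterfactual_prediction(
    scm: LinearSCM, predictor: Predictor, a: float, x: float, a_prime: float
) -> float:
    """Score of Y-hat_{A <- a'}(U) for the individual observed as (a, x)."""
    u = scm.abduct(a, x)               # abduction
    x_cf = scm.generate_x(a_prime, u)  # action A <- a', then recompute X
    return predictor(a_prime, x_cf)    # prediction in the counterfactual world


def is_counterfactually_fair(
    scm: LinearSCM,
    predictor: Predictor,
    a: float,
    x: float,
    attainable: Iterable[float],
    tol: float = 1e-9,
) -> bool:
    """Check criterion (1) at context (a, x) for every attainable a'.
    With U pinned down exactly, both sides are point masses, so the check
    compares the factual score with each counterfactual score."""
    factual = predictor(a, x)
    return all(
        abs(counterfactual_prediction(scm, predictor, a, x, a_prime) - factual) <= tol
        for a_prime in attainable
    )


scm = LinearSCM(alpha=2.0)


def uses_x(a: float, x: float) -> float:
    # Reads X directly, so it inherits X's dependence on A.
    return 0.5 * x


def uses_u(a: float, x: float) -> float:
    # Depends only on the abducted background variable U.
    return 0.5 * scm.abduct(a, x)


print(is_counterfactually_fair(scm, uses_x, a=1.0, x=3.0, attainable=[0.0, 1.0]))  # False
print(is_counterfactually_fair(scm, uses_u, a=1.0, x=3.0, attainable=[0.0, 1.0]))  # True
```

The contrast illustrates why the definition is stated over a causal model: `uses_x` never reads $A$ explicitly, yet it fails because $X$ is a descendant of $A$, whereas a predictor built only from $U$ is unaffected by the intervention $A \leftarrow a'$.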