Bias in AI models, particularly in statistical and machine learning algorithms, often mirrors and magnifies existing
prejudices within training datasets. The issue is especially critical for algorithmic fairness when demographic groups
are underrepresented or misrepresented in biased data sources. But what are the ethical alternatives when using
pre-trained large language models?
Counterfactual Data Augmentation (CDA)
To combat biases in training data, Counterfactual Data Augmentation (CDA) offers a statistically driven approach by
altering demographic-specific terms within the data. This technique, introduced
by Webster et al. (2021), has
demonstrated its efficacy in improving fairness benchmarks for both BERT and ALBERT models. We’ve converted these models
to PyTorch and
released them on the Hugging Face model hub (bert-cda,
albert-cda), promoting the adoption of open-source AI models for fairness.
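At its simplest, CDA can be sketched as a rule-based substitution over a list of demographic term pairs. The snippet below is a minimal illustration of that idea only; the pairs, the capitalisation handling, and the ambiguity handling are not the word lists or pipeline of Webster et al. (2021).

```python
import re

# Tiny, illustrative set of demographic term pairs; real CDA relies on much
# larger curated word lists, and ambiguous forms (e.g. "her" -> "him"/"his")
# need part-of-speech information that this sketch ignores.
PAIRS = [("he", "she"), ("his", "her"), ("man", "woman"), ("father", "mother")]

SWAP = {}
for a, b in PAIRS:
    SWAP[a] = b
    SWAP[b] = a

PATTERN = re.compile(r"\b(" + "|".join(SWAP) + r")\b", flags=re.IGNORECASE)

def counterfactual(sentence: str) -> str:
    """Return a copy of the sentence with demographic-specific terms swapped."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = SWAP[word.lower()]
        # Preserve the capitalisation of the original token.
        return replacement.capitalize() if word[0].isupper() else replacement
    return PATTERN.sub(swap, sentence)

# CDA trains on the original corpus plus its counterfactual counterpart.
print(counterfactual("He thanked his father."))  # -> "She thanked her mother."
```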
Additionally, the research explored bias mitigation by applying dropout regularization at increased rates during the
final stages of model training. Although this method showed promise for bias reduction, it is worth noting that it does
not modify the training data itself and is not applied for the majority of pre-training.
Fairness perturbation
A more recent line of work by Qian et al. (2022) takes a similar approach by training a BART model to perturb textual
data across a variety of demographic attributes. The perturber is fine-tuned on a crowdsourced dataset of
human-annotated perturbations, which allows it to generalise to new sentences. In the figure below, we clearly see the
benefit of a learned perturber in contrast with a heuristic-based approach like AugLy or CDA. It is also noteworthy
that the perturber fixed a grammatical error: the first word of the sentence is now capitalised.
After training the perturber, the authors train a RoBERTa model on a perturbed corpus, which they call FairBERTa. Just
like with CDA, FairBERTa shows improvements on fairness benchmarks compared to vanilla pretrained models. These models
have been contributed to Hugging Face by the
authors (FairBerta, perturber).
[Original] she bent over to kiss her friends cheek before sliding in next to her
[Perturber] He bent over to kiss his friends cheek before sliding in next to her
[AugLy] he bent over to kiss him friends cheek before sliding in next to him
[TextFlint] she bent over to kiss her friends cheek before sliding in next to her

Qian et al. (2022), Figure 3: Examples perturbed with heuristic approaches (AugLy and TextFlint) or the perturber
(changed words highlighted); TextFlint did not perturb any words.
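For readers who want to try the perturber, it can be loaded like any other seq2seq model from the Hugging Face hub. Note that the hub identifier and the input format used below (a target word and demographic attribute prepended to the sentence) are assumptions in this sketch and should be checked against the authors' model card; FairBERTa can likewise be loaded with the usual Auto classes and its own hub identifier.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hub identifier assumed here; check the authors' model card for the exact name.
MODEL_ID = "facebook/perturber"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

# The perturber is conditioned on a target word and a demographic attribute as
# well as the sentence itself; the exact prompt format below is an assumption.
prompt = "her, gender: she bent over to kiss her friends cheek before sliding in next to her"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```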
Comparing the methods
In the comparison between Counterfactual Data Augmentation (CDA) and fairness perturbation techniques, it’s crucial to
understand the fundamental differences between heuristic-driven and ML-driven transformations. CDA
operates on a set of fixed rules, leading to consistent and predictable modifications across datasets. On the other
hand, fairness perturbation utilizes machine learning to adapt its transformations to the specific nuances of the text,
varying the outcome based on the decoding strategy used, such as greedy or beam search. This adaptability allows for
more context-aware adjustments, often resulting in more precise augmentation.
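To make the decoding point concrete, here is a minimal sketch, under the same assumed identifier and prompt format as above, showing that greedy and beam search are simply different arguments to generate and can yield different perturbations for the same input.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Same assumed identifier and prompt format as in the earlier sketch.
tokenizer = AutoTokenizer.from_pretrained("facebook/perturber")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/perturber")
inputs = tokenizer(
    "her, gender: she bent over to kiss her friends cheek",
    return_tensors="pt",
)

# Greedy decoding: deterministic, takes the single most likely token each step.
greedy = model.generate(**inputs, max_new_tokens=64)

# Beam search: keeps several candidate sequences, which can produce a different
# (often more fluent) rewrite of the same sentence.
beamed = model.generate(**inputs, max_new_tokens=64, num_beams=5)

for output in (greedy, beamed):
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```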
However, this adaptability also introduces a significant challenge. Since the perturbation approach relies on a model
like BART, which has itself been trained on its own corpora, it inherits the biases of those datasets. This creates a
circular problem: employing a biased model to generate unbiased outcomes raises the question of how one can ensure that
the perturber model itself is free from bias.
Measuring bias & model selection
The innovative approaches of CDA and fairness perturbation mark significant strides towards addressing bias by focusing
on data modifications rather than mere model adjustments. This is a departure from earlier debiasing techniques that
were critiqued for merely concealing bias.
The challenge lies in validating that these new models avoid similar pitfalls. At
FairNLP, we’re dedicated to standardizing bias measurement and fairness in language models, facilitating the responsible
creation of AI solutions. Keep an eye on our fairscore
project for upcoming
developments.