As the world accelerates towards digitization, natural language generation (NLG) is becoming a critical ingredient of common AI systems like Amazon's Alexa and Apple's Siri. However, many recent studies have highlighted that the machine learning models employed in NLG often inherit and amplify the societal biases present in their training data, including gender bias. This paper aims to achieve gender parity in natural language models by analyzing and mitigating gender bias. An open-source corpus is used to train and fine-tune the GPT-2 model, after which text is generated from prompts to investigate and mitigate the bias. Domain Adaptive Pretraining is used as the primary debiasing technique, and the paper evaluates its effectiveness against other methods. Lastly, the impact of domain adaptation on the performance of the language model is assessed through the perplexity of the resulting de-biased model. Through an empirical and in-depth assessment of gender bias, this study provides a foundation for advancing gender equality in the digital space.
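
To make the described pipeline concrete, the sketch below shows one plausible way to perform domain-adaptive (continued causal-LM) pretraining of GPT-2 on a balanced corpus and then compute the perplexity of the adapted model using the Hugging Face libraries. This is not the authors' exact setup: the corpus file name, hyperparameters, and evaluation split are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's exact pipeline):
# continued pretraining of GPT-2 on a de-biasing corpus, then a perplexity check.
import math
from datasets import load_dataset
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical gender-balanced text file used for domain-adaptive pretraining
dataset = load_dataset("text", data_files={"train": "balanced_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Causal language modeling objective (mlm=False), as used for GPT-2
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-dapt",            # assumed output directory
    per_device_train_batch_size=4,     # illustrative hyperparameters
    num_train_epochs=1,
    learning_rate=5e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"], data_collator=collator)
trainer.train()

# Perplexity = exp(average cross-entropy loss) on an evaluation split
# (the training split is reused here only for brevity).
eval_loss = trainer.evaluate(eval_dataset=tokenized["train"])["eval_loss"]
print("perplexity:", math.exp(eval_loss))
```

A lower perplexity on held-out text indicates that the de-biasing adaptation has not substantially degraded the model's language-modeling quality, which is the trade-off the abstract refers to.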