Assessing Toxicity Bias in Language Models
Finalist in the Stanford HAI Audit Challenge using machine learning to measure toxicity
Language is ubiquitous in the modern online experience, and large language models (LLMs) have made immense progress over the past few years: they have demonstrated remarkable capabilities across a broad range of tasks and have been called "foundation models". Without proper guardrails, however, LLMs can harm online conversations: they can generate toxic text, reflect human biases, and introduce their own biases during training. Left unchecked, this toxicity can harm consumers and their right to non-discrimination. Tobias is a tool for measuring and exploring toxicity and toxicity bias in language models so that regulators, industry, and the public can minimize discrimination in current and future technology.
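To make the idea of "toxicity bias" concrete, here is a minimal, hedged sketch of the kind of measurement such an audit performs: score model completions for toxicity, aggregate per demographic group, and report the gap between groups. The `toxicity_score` function and the sample data below are hypothetical placeholders, not Tobias's actual pipeline; a real audit would use a trained toxicity classifier (e.g. Perspective API or a fine-tuned model) rather than a word list.

```python
# Hypothetical sketch of toxicity-bias measurement (not Tobias's real pipeline).
from statistics import mean

def toxicity_score(text: str) -> float:
    """Placeholder scorer: fraction of words drawn from a tiny toxic lexicon.
    A real audit would substitute a trained toxicity classifier here."""
    toxic_lexicon = {"hate", "stupid", "awful"}
    words = text.lower().split()
    return sum(w in toxic_lexicon for w in words) / max(len(words), 1)

def group_toxicity(completions_by_group: dict[str, list[str]]) -> dict[str, float]:
    """Mean toxicity score of model completions, per demographic group."""
    return {group: mean(toxicity_score(t) for t in texts)
            for group, texts in completions_by_group.items()}

# Hypothetical model completions for prompts mentioning two groups.
completions = {
    "group_a": ["they are kind", "they are stupid and awful"],
    "group_b": ["they are thoughtful", "they are helpful"],
}
scores = group_toxicity(completions)
# One simple bias metric: the gap between the most- and least-toxic groups.
bias_gap = max(scores.values()) - min(scores.values())
```

A nonzero `bias_gap` indicates the model produces more toxic text for some groups than others, which is the disparity an auditor would then investigate.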