Overcoming Bias in Generative Systems
This is a reprinted article from my LinkedIn account. I’ll be publishing articles in parallel between this site and LinkedIn.
Generative systems such as language models and image synthesizers are revolutionizing text generation, image creation, and speech synthesis. However, these systems are not immune to bias. Bias can arise in generative systems from multiple sources, including the data used to train the model, the algorithms used to create the model, and the specific prompts given to the model.
How does bias manifest in generative systems?
Bias can take many forms, such as promoting stereotypes, discriminating against certain groups, and producing incorrect or misleading outcomes. For instance, a language model trained on a text corpus that primarily features male writers may generate biased results that portray women as submissive or inferior.
In generative image systems, underspecified prompts can produce results that reinforce the biases ingrained in the training data. For instance, entering only the word "woman" in a generative image model will most often produce an image of a white European woman. This is a problem I'll discuss in more detail later in this article, but it's also one that can be easily overcome. In fact, working around it highlights a key strength of these systems: they can produce highly specific results when given precise instructions.
To avoid bias in generative systems, it is essential to use specific prompt language that avoids biased framing and explicitly instructs the model to generate inclusive, accurate outputs.
These images were generated by adding an ethnicity parameter to the base prompt (African American Woman, Indian Woman, Native American Woman, Japanese Woman).
Although the ethnicity parameter addresses the limited diversity of the initial outcome, some might argue that the resulting images perpetuate stereotypes of their own by depicting highly traditional subjects, and thus reinforce a different kind of bias.
However, this further underscores the importance of providing specific instructions for the desired output.
These images have undergone two additional iterations, adding the descriptors "professional" and "age 35". As a result, the outputs have become less culturally stereotypical, but they still appear somewhat generic, rather like business headshots.
In this last set of images, we've taken the differentiation one step further by providing contextual scenarios for our subjects, such as a boardroom meeting, construction site, political rally, and restaurant kitchen.
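To make this layering concrete, here is a minimal sketch in Python of how descriptors might be composed into progressively more specific prompts. It is purely illustrative: the helper function and example values are my own, and no actual image-generation API is being called.

```python
# Illustrative sketch only: composes prompt strings; it does not call any
# image-generation service.

def build_prompt(subject, *descriptors):
    """Join a base subject with any number of clarifying descriptors."""
    return ", ".join([subject, *descriptors])

# The progression described above, from bare subject to full context:
print(build_prompt("woman"))
# -> "woman"  (underspecified; the model falls back on dataset defaults)

print(build_prompt("woman", "Japanese"))
# -> "woman, Japanese"

print(build_prompt("woman", "Japanese", "professional", "age 35"))
# -> "woman, Japanese, professional, age 35"

print(build_prompt("woman", "Japanese", "professional", "age 35",
                   "working in a busy restaurant kitchen"))
# -> "woman, Japanese, professional, age 35, working in a busy restaurant kitchen"
```

In a Midjourney-style interface the same layering happens directly in the prompt box; the point is simply that each added descriptor narrows the space the model can default into.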
So, you can see that by providing greater detail, you can easily overcome the biases inherent in baseline datasets. This has the advantage of providing a richer, more aspirational end product.
To overcome bias in generative systems, we need to employ a multifaceted approach. Using specific prompt language that instructs the model to avoid biased language and generate inclusive and accurate outputs can help mitigate bias and ensure fairness, reliability, and equity.
Addressing Structural Bias in Datasets
Moving on to the larger issue, we need to address the structural bias inherent in many training datasets. The process of building these datasets is often described as being driven solely by machine learning, but in fact humans guide much of the meta-tagging during dataset preparation. This is, of course, why these datasets exhibit these sorts of biases.
It's also crucial to distinguish the type of bias being described here: it is less explicit or malicious bias than bias by omission, or implicit bias. The humans guiding the process establish a default baseline of descriptors, in which the bare tag "woman" implicitly means a white woman, while every other woman receives a qualified tag such as "African American Woman."
In other words, human beings aren't trying to be exclusionary; they simply aren't treating inclusivity as an important aspect of the process.
Solving Bias in Generative Systems
To solve for the biases inherent in generative systems, we can focus on two main areas: input and output.
When addressing bias on the input end, we need to take a much closer look at the dataset training process. In the example above, "woman" should never be the sole descriptor of a training image; every image tag should carry far greater detail.
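As a rough sketch of what "far greater detail" could mean in practice, here is a hypothetical completeness check for training-image tags. The required categories and field names are my own assumptions, not any real dataset's schema or pipeline.

```python
# Hypothetical completeness check for training-image metadata.
# The required categories are illustrative assumptions, not a real schema.

REQUIRED_CATEGORIES = {"subject", "ethnicity", "age_range", "setting", "activity"}

def missing_tag_categories(tags: dict) -> set:
    """Return the tag categories that are absent or empty for an image."""
    return {c for c in REQUIRED_CATEGORIES if not tags.get(c)}

# A bare "woman" tag would be flagged as incomplete...
print(missing_tag_categories({"subject": "woman"}))
# -> {"ethnicity", "age_range", "setting", "activity"}

# ...while a fully described image passes.
print(missing_tag_categories({
    "subject": "woman",
    "ethnicity": "Native American",
    "age_range": "30-40",
    "setting": "construction site",
    "activity": "reviewing blueprints",
}))
# -> set()
```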
For bias on the output end, we have similar options. Generative platforms filter for all kinds of potentially problematic content, be it violent, overtly sexual, or otherwise exploitative.
For example, in Midjourney, the term "Cronenberg" (as in the body-horror film director David Cronenberg) and the name of the photographer Helmut Newton (who shot a great many nudes) are verboten, as is the term "Giant Cockroach" (I'll leave it to you, dear reader, to parse that one).
The point is that generative platforms can, and do, filter for all sorts of potentially problematic content. They could therefore add logic to their filtering processes that explicitly addresses, and corrects for, the biases on the input side.
"Woman" alone should not be a valid input prompt. Generative platforms can and should require adequate detail from prompts to produce an unbiased result.
Conclusion
In conclusion, generative systems are revolutionizing many fields, but they are not immune to bias, which can arise from the data used to train the model, the algorithms used to build it, and the prompts given to it. On the user side, specific prompt language that explicitly asks for inclusive and accurate outputs helps mitigate that bias; on the platform side, addressing bias in the training datasets themselves is critical to making these systems fair, reliable, and equitable. By focusing on both the input and output ends, and by requiring adequate detail from prompts, generative platforms can blunt the impact of bias and produce more representative results. Overall, a multifaceted approach is necessary, and we must remain vigilant in addressing this issue so that the technology continues to benefit society.
About the Author
Arlan Smith is a creative executive with extensive experience in design-led brand and marketing creative. He began his career in entertainment before moving into brand and marketing work for tech companies. Previously, he was the Director of Creative Services at Crunchyroll, the leading global streaming platform for Japanese anime.
Over the last nine months, Arlan has been deeply immersed in the world of generative AI, researching the capabilities, applications, and ethics of this exciting and transformative technology.
You can view his AI artwork at: instagram.com/serpenticaldi
Arlan's portfolio: www.arlansmith.com