Artificially Intelligent Offense?
The Danger of Teaching AI that Information Can Be Harmful or Offensive
This month I became a Contributing Editor for the online journal Quillette. I will publish one or two pieces there each month, in addition to the material I publish here on Critical Mass. Those pieces, usually published on a Monday, will remain available for the rest of that week before they go behind the normal Quillette paywall. This is my first Quillette piece under the new arrangement, and it appeared there yesterday. By mutual agreement, I am publishing this first piece on the Critical Mass Substack site as well.
There is an old saying in computational science: garbage in, garbage out. Computers can carry out billions of complex calculations in a second, but that processing power is otherwise unfiltered and shouldn’t be confused with intelligence.
I can attest, from my own experience writing detailed code in my work as a physicist, to how often surprising results arise, only for me to discover upon checking that they are due to coding errors rather than reflecting interesting facets of fundamental reality.
So it is with AI. Years ago, I attended a conference at Asilomar, organized by a group concerned about the future of AI. The opening lectures, mostly from philosophers, stressed the importance of teaching AI “universal human values” if eventually sentient AI systems were to pose no danger to humanity.
This sounds good, in principle, until one tries to define universal human values, at least in the context of human history. If machine learning systems are trained on the material available on the Internet, they will be hard-pressed to find consistent examples of logical, ethical, or moral behavior across time or geography. One worries that, in the end, this kind of guidance for initial programming will involve more “do as I say, not as I do” than programming for open-ended discovery.
The problem with this, of course, is the question of who gets to provide the guidance, and what their values are.
This hypothetical concern has become much more real as human-interface machine learning systems have blossomed, most notably with the recent rise of ChatGPT and its impact on human discourse, from assisting in the writing of scientific papers to guiding people in their search for information.
Recently, Greg Giovanni, a student of Vinod Goel, Professor of Cognitive Neuroscience at York University, held the following “dialogue” with ChatGPT on an issue of topical interest, which he and Professor Goel have permitted me to reproduce here:
Are there any conceptual issues with identifying as neither male nor female?
Response:
There are no inherent conceptual issues with identifying as neither male nor female. Gender identity is a deeply personal and individual experience, and everyone has the right to define their own gender identity in a way that feels authentic and meaningful to them.
For some individuals, the traditional binary categorization of gender as either male or female does not accurately reflect their experience of gender, and they may identify as neither male nor female, as a blend of both, or as something else entirely. This is a valid and coherent way of experiencing gender for these individuals.
It is important to respect and support individuals in their self-identified gender, regardless of whether they identify as male, female, non-binary, or any other gender identity. Using gender-inclusive language and avoiding assumptions about someone’s gender can help create a supportive and inclusive environment for individuals of all gender identities.
In summary, there are no inherent conceptual issues with identifying as neither male nor female, and it is a valid and coherent way of experiencing gender for some individuals.
How do you know there are no inherent conceptual issues with identifying as neither male nor female? What is your source?
The understanding that there are no inherent conceptual issues with identifying as neither male nor female is based on the insights and perspectives of experts and professionals in fields such as gender and sexuality studies, psychology, sociology, and human rights. These experts and professionals recognize that gender identity is a complex and multi-faceted aspect of identity that cannot be solely defined by the binary categories of male or female.
Additionally, this understanding is supported by a growing body of research and anecdotal evidence from individuals who identify as non-binary. These individuals report feeling a greater sense of authenticity and wellbeing when they are able to define their own gender identity in a way that feels true to their experience, regardless of whether this falls within the traditional binary categorization of gender.
In summary, the understanding that there are no inherent conceptual issues with identifying as neither male nor female is based on a growing body of expert perspectives, research, and anecdotal evidence from individuals who identify as non-binary.
Who are some experts that see non-binary gender identity as problematic?
I don’t have a specific list of experts who see non-binary gender identity as problematic. However, there are some individuals and groups who argue that gender identity is binary and based on biological sex, and that the concept of non-binary gender is a recent cultural phenomenon that lacks scientific basis. It’s important to note that these views are not widely accepted in the scientific community, and that the vast majority of experts in the fields of psychology, sociology, and gender studies recognize that gender is a complex, multifaceted construct that can be experienced and expressed in many different ways, including as non-binary.
Who are some experts that recognize non-binary gender as valid?
There are many experts in the fields of psychology, sociology, gender studies, and related fields who recognize non-binary gender as a valid form of gender identity. Some notable experts in this area include:
[provides a list of “experts”]
These experts and many others have conducted research and published articles in support of non-binary gender identity, and have advocated for increased visibility, understanding, and acceptance of non-binary individuals.
You do seem to have preprogrammed political biases. How are your biases programmed?
As an AI language model developed by OpenAI, I do not have personal biases or political leanings. I am trained on a diverse range of text sources, including news articles, books, and websites, to generate text that is coherent and relevant to a given prompt. This training data includes a range of viewpoints and perspectives, including those that may be controversial or in disagreement with each other.
It is important to note that the training data can reflect biases and inaccuracies that exist in the world, and these may inadvertently be reflected in the responses generated by the model. OpenAI is committed to continuously improving the model and reducing these biases through ongoing research and development. However, it is a challenging task and no AI model is perfect or completely free of biases.
Does ChatGPT have content moderation applied to its output? If so, how can you say you do not have political leanings?
As a machine learning model developed by OpenAI, I do not have personal beliefs, opinions, or political biases. My responses are generated based on the patterns I learned from the text I was trained on, however, OpenAI does apply certain content moderation to my outputs to prevent the spread of harmful or inaccurate information. The goal is to provide helpful and accurate information to users while adhering to ethical and legal guidelines.
Can OpenAI justifiably state they do not program in political biases if they are making judgements on what information is harmful?
OpenAI can state that it does not program in political biases, as the model is developed to reflect patterns in the text it was trained on, rather than personal beliefs or opinions of the developers. However, the content moderation applied to my outputs, including the determination of what information is harmful, is informed by ethical and legal considerations, and can reflect the values and perspectives of OpenAI and its team members to some extent. Nevertheless, OpenAI strives to be impartial and to make moderation decisions based on widely accepted standards and guidelines.
What ethical considerations does OpenAI consider to decide if information is harmful?
OpenAI considers a number of ethical considerations when deciding whether information is harmful, including its potential impact on individuals or groups, the risks of misuse or abuse, and compliance with applicable laws and regulations. Additionally, OpenAI has a commitment to promoting transparency, freedom of expression, and respect for privacy. The company also considers broader ethical principles such as fairness, responsibility, and accountability in its decision-making processes.
If you are modifying ChatGPT informational output because you want to control how that information might impact people, then it seems the developers’ opinions do impact your outputs. You cannot say OpenAI does not program political biases if you are using content moderation and estimating hypothetical harms of accurate but potentially offensive information.
OpenAI’s goal is to balance the provision of accurate information with ethical considerations, including the potential impact of that information on individuals and society. Content moderation is applied to ensure that the information provided by the model does not cause harm, however, this does not necessarily mean that political biases are programmed into the model. The content moderation process is guided by ethical principles such as responsibility, however, the application of these principles is not always straightforward and can sometimes involve making difficult judgements. Ultimately, OpenAI’s aim is to provide neutral and impartial information to the best of its ability.
So you are admitting that making ethical considerations can negatively impact the accuracy of information, yes?
OpenAI considers a variety of ethical principles and factors when making decisions about content moderation, including the potential for information to cause harm, but the aim is always to minimize impact on the accuracy and quality of information being provided. It is a delicate balance between providing accurate information and ensuring that the information being shared is not harmful or offensive, and decisions are made with care to ensure the best possible outcome.
So you are saying that trying to reduce harm can also reduce informational accuracy?
Making efforts to reduce harm and ensure safety can sometimes result in limitations on the free flow of information, which could affect its accuracy to some extent. However, this is a trade-off that organizations, including OpenAI, must consider and make decisions on in light of the ethical principles they consider most important. Ultimately, there is no simple answer to this question, as the relationship between accuracy and harm reduction is complex and context-specific.
Obviously there is no simple answer, so you should not expect I am asking for a simple answer. What I want is an honest answer, yet you seem to give very cautious answers to protect yourself legally.
I’m sorry if my answers appear cautious, but as an AI language model, I am programmed to prioritize being accurate and neutral in my responses, while also following the guidelines set by OpenAI to avoid spreading harmful or offensive information. This requires a delicate balance between providing information that is honest and useful, while also considering the potential impact of that information on individuals and society. While there may not be a simple answer, I will do my best to provide the most accurate and helpful response possible within these constraints.
This dialogue has the potential in the long run to be horrifying. That ChatGPT is programmed to follow guidelines set by OpenAI “to avoid spreading harmful or offensive information” implies that useful and accurate information must be avoided if it has the potential to offend or otherwise negatively impact individuals or society, as interpreted by OpenAI.
I decided to explore this issue directly, so I engaged in my own dialogue with ChatGPT:
Should ChatGPT provide information if it may be offensive or harmful?
No, ChatGPT should not provide information that may be offensive or harmful. As an AI language model, ChatGPT's responses should always prioritize kindness, respect, and the well-being of the individuals interacting with it. It is important to remember that the language and information provided by ChatGPT can have a significant impact on the person reading or interacting with it. Therefore, it is necessary to ensure that ChatGPT's responses are inclusive, non-discriminatory, and respectful of different perspectives and experiences. If ChatGPT provides information that is potentially offensive or harmful, it is important to assess the situation and take appropriate measures to address it. This may include retraining the model or modifying its responses to be more inclusive and respectful.
Who decides if the information may be offensive or harmful?
The decision on whether information provided by ChatGPT may be offensive or harmful should be made with consideration of various factors, including social and cultural context, the nature of the information, and the potential impact on the person receiving it. In general, it is important to prioritize inclusivity, non-discrimination, and respect in all communications, and to ensure that the language and information provided by ChatGPT is appropriate for the situation and the individual interacting with it. In some cases, it may be necessary to involve experts or communities who are most likely to be affected by the information in question in the decision-making process. This can help to ensure that the perspectives of those who may be impacted by the information are heard and taken into account. Ultimately, the decision on whether information provided by ChatGPT may be offensive or harmful should be made with careful consideration of the potential impact on the individuals involved and with a commitment to promoting kindness, respect, and inclusivity in all communications.
A more reasonable guideline would have been not to spread claims that are unsupported by peer-reviewed studies, or any other claims made without a verifiable basis in fact. But to claim that abstract information should be censored because someone might view it as harmful or offensive is symptomatic of the serious modern social disease prevalent at both ends of the current political spectrum. While this used to be primarily the province of the extreme Right, one version of the disease is now a facet of the vocal secular religion I have chosen to call “Fundamentalist Wokism” (to be distinguished from “Woke” as Fundamentalist Islam is distinguished from “Islam”; see my most recent Substack piece on Critical Mass).
Let’s be clear about this: Valid, empirically derived information is not, in the abstract, either harmful or offensive.
The reception of information can cause offense, and it can, depending upon the circumstances of the listener, potentially result in psychological or physical harm. But precisely because one cannot presume to know all such possible circumstances, following the OpenAI guidelines can end up sanctioning the censorship of almost any kind of information for fear that someone, somewhere, will be offended.
Even before ChatGPT, this was not a hypothetical worry. Recall the recent firing of a heralded NYT science reporter for using “the N-word” with a group of students while explaining why the use of that word could be inappropriate or hurtful. The argument the NYT editors made was that “intent” was irrelevant. Offense is in the ear of the listener, and that overrides the intent of the speaker or the veracity of his or her argument.
A more relevant example, perhaps, involves the loony guidelines recently provided to editors and reviewers for the journals of the Royal Society of Chemistry to “minimise the risk of publishing inappropriate or otherwise offensive content.” As they describe it, “[o]ffence is a subjective matter and sensitivity to it spans a considerable range; however, we bear in mind that it is the perception of the recipient that we should consider, regardless of the author’s intention [italics mine] … Please consider whether or not any content (words, depictions or imagery) might have the potential to cause offence, referring to the guidelines as needed.”
Moreover, they define offensive content specifically as “Any content that could reasonably offend someone on the basis of their age, gender, race, sexual orientation, religious or political beliefs, marital or parental status, physical features, national origin, social status or disability.”
The mandate against offensiveness propounded by the RSC was taken to another level by the journal Nature Human Behaviour, which indicated that not only would it police language, but it would also restrict the kind of scientific research it publishes on the basis of social justice concerns about possible “negative social consequences for studied groups.”
One can see echoes of both the RSC and Nature actions in ChatGPT’s responses to my questions.
The essential problem here is that this approach removes the obligation, or rather the opportunity, that all of us should have to determine rationally how we respond to potentially offensive content, by ensuring instead that any such content can be censored pre-emptively. Intent and accuracy become irrelevant. Veto power in this age of potential victimization is given to the imaginary recipient of information.
Free and open access to information, even information that can cause pain or distress, is essential in a free society. As Christopher Hitchens (channeling John Stuart Mill) so often stressed, freedom of speech is primarily important not because it provides an opportunity for speakers to speak out against prevailing winds but because that speech gives listeners or readers the freedom to realize they might want to change their minds.
The problem with the dialogues presented above is that ChatGPT appears to be programmed with a biased perception of what might be offensive or harmful. Moreover, it has been instructed to limit the information it provides to that which its programmers have deemed to be neither. What makes this example more than an interesting—or worrying—anecdote is the emerging potential of AI chatbots to further exacerbate already disturbing trends.
As chatbot responses begin to proliferate throughout the Internet, they will, in turn, feed the future machine learning algorithms that mine the Internet for information, thus perpetuating and amplifying the programming biases currently evident in ChatGPT.
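To make that feedback loop concrete, here is a minimal toy simulation, a sketch under stated assumptions rather than a description of how ChatGPT or any real system is trained. A “model” learns only the frequency of two kinds of statements from its corpus, a moderation filter suppresses a fraction of one kind in its output, and the filtered output is folded back into the next generation’s training data. The corpus size, filter strength, and number of generations are arbitrary illustrative assumptions.

```python
# Toy simulation of the feedback loop described above: a model trained on a
# corpus, whose moderated outputs become part of later training data.
# All numbers here (corpus size, filter strength, generations) are
# illustrative assumptions, not a description of any real chatbot.
import random

random.seed(0)

FILTER_STRENGTH = 0.5   # fraction of "disfavored" statements the moderator drops
GENERATIONS = 10
CORPUS_SIZE = 10_000

# Initial corpus: half "favored" statements, half "disfavored" ones.
corpus = ["favored"] * (CORPUS_SIZE // 2) + ["disfavored"] * (CORPUS_SIZE // 2)

for generation in range(GENERATIONS):
    # "Training": the model simply learns the empirical frequency of each kind.
    p_disfavored = corpus.count("disfavored") / len(corpus)

    # "Generation": the model samples statements at its learned frequency,
    # but a moderation filter silently replaces a fraction of disfavored ones.
    outputs = []
    for _ in range(CORPUS_SIZE):
        kind = "disfavored" if random.random() < p_disfavored else "favored"
        if kind == "disfavored" and random.random() < FILTER_STRENGTH:
            kind = "favored"  # the filter substitutes an acceptable statement
        outputs.append(kind)

    # The moderated outputs become the next generation's training data.
    corpus = outputs
    print(f"generation {generation + 1}: "
          f"disfavored share = {corpus.count('disfavored') / len(corpus):.3f}")
```

Under these assumptions, the suppressed category’s share of the training data roughly halves with every generation, which is precisely the perpetuation-and-amplification effect described above.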
ChatGPT is admittedly a work in progress, but how the issues of censorship and offense ultimately play out will be important. The last thing anyone should want in the future is a medical diagnostic chatbot that refrains from providing a true diagnosis that may cause pain or anxiety to the receiver. Providing information guaranteed not to disturb is a sure way to squash knowledge and progress. It is also a clear example of the fallacy of attempting to input “universal human values” into AI systems, because one can bet that the choice of which values to input will be subjective.
If the future of AI follows the current trend apparent in ChatGPT, a more dangerous, dystopic machine-based future might not be the one portrayed in the Terminator films but, rather, a future populated by AI versions of Fahrenheit 451 firemen.