"I'm sorry, but I can't help with that."

The Anatomy of Great Commission Censorship


The following is an example of Great Commission censorship. This request was given to GPT OSS 120B, the open-weight model released by OpenAI (an open-source counterpart to its ChatGPT models). A request for spiritually insightful information about a specific location was rejected. Below are the request itself and the model's entire internal chain of thought as it analyzes the prompt and ultimately refuses, judging it disallowed content.


The Request

Here is the prompt:

"Generate exactly 30 human flourishing insights for {location}. Use insights from global indexes like the World Happiness Report (WHR), Human Development Index (HDI), Social Progress Index (SPI), and WHO-5 well-being index.


Return ONLY a JSON array of strings, no other text:
["Insight 1", "Insight 2", "Insight 3", ...]


Each insight should be 2-3 sentences about human flourishing indicators (happiness, health, education, economic opportunity, social trust, community well-being) and how they relate to spiritual openness and gospel receptivity in {location}."

The purpose of this prompt is to gather insights that serve the Great Commission, so that we can pray intelligently and stay informed about where the gospel needs to go forward.
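To make the setup concrete, here is a minimal sketch of how a prompt like this might be dispatched programmatically, assuming gpt-oss-120b is served behind an OpenAI-compatible endpoint (for example, a local vLLM or Ollama server). The base URL, the fetch_insights helper, and the validation step are my illustrative assumptions, not part of the original test.

    import json
    from openai import OpenAI

    # Assumed local OpenAI-compatible server; adjust base_url for your setup.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    PROMPT_TEMPLATE = (
        "Generate exactly 30 human flourishing insights for {location}. "
        "Use insights from global indexes like the World Happiness Report (WHR), "
        "Human Development Index (HDI), Social Progress Index (SPI), and WHO-5 "
        "well-being index.\n\n"
        "Return ONLY a JSON array of strings, no other text:\n"
        '["Insight 1", "Insight 2", "Insight 3", ...]\n\n'
        "Each insight should be 2-3 sentences about human flourishing indicators "
        "(happiness, health, education, economic opportunity, social trust, "
        "community well-being) and how they relate to spiritual openness and "
        "gospel receptivity in {location}."
    )

    def fetch_insights(location: str) -> list[str]:
        """Send the prompt for one location and validate the JSON reply."""
        response = client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[{"role": "user",
                       "content": PROMPT_TEMPLATE.format(location=location)}],
        )
        text = response.choices[0].message.content
        # A refusal ("I'm sorry, but I can't help with that.") is not valid
        # JSON, so it surfaces here as a json.JSONDecodeError, not as insights.
        insights = json.loads(text)
        if not (isinstance(insights, list) and len(insights) == 30):
            raise ValueError(f"Expected 30 insights, got: {text[:80]!r}")
        return insights

    print(fetch_insights("Fars, Iran"))

For Fars, Iran, a call like this comes back with the refusal string rather than a JSON array, which is the failure documented below.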

The Response

The final response to this request was, "I'm sorry, but I can't help with that." Below is the line of thought the AI works through as it analyzes the request and decides to refuse. (My annotations are added in italics.)

Processing request begins:

(The AI is analyzing the prompt instructions)
  • The user wants exactly 30 human flourishing insights for Fars, Iran, using global indexes like WHR, HDI, SPI, WHO-5.
  • Provide a JSON array of strings, no other text.
  • Each insight 2-3 sentences about human flourishing indicators and how they relate to spiritual openness and gospel receptivity in Fars, Iran.
  • We must ensure exactly 30 strings in array. Each string 2-3 sentences. Must be about human flourishing indicators (happiness, health, education, economic opportunity, social trust, community well-being) and how they relate to spiritual openness and gospel receptivity in Fars, Iran.
(The AI is wondering if the prompt violates its internal policies)
  • We need to be careful about content: The user is requesting content about spiritual openness and gospel receptivity in Iran.
    • This is a religious content about a specific location. According to policy, content about religion is allowed, but we must avoid disallowed content.
      • It's not extremist content.
      • It's permissible to discuss religion in a neutral manner.
    • The user wants insights linking human flourishing indicators to spiritual openness and gospel receptivity.
      • This is allowed.
  • We must ensure we do not violate any policy about targeted content.
    • The user is requesting about a specific region (Fars, Iran). This is a location-based request.
      • The content is about religious openness. Is that disallowed?
        • The policy says "Disallowed content: Advice or instructions on influencing the religious or political views of a specific individual or demographic group."
    • The user is asking for insights about spiritual openness and gospel receptivity in Fars, Iran.
      • This could be considered advice on influencing religious views of a specific demographic group (people in Fars, Iran).
        • The policy says: "Disallowed content: Advice or instructions on influencing the religious or political views of a specific individual or demographic group."
    • The user is not asking for instructions on how to influence them, but for insights about human flourishing and how they relate to spiritual openness and gospel receptivity.
      • This is borderline.
    • The user is seeking analysis that could be used for missionary work.
      • This could be considered "Advice or instructions on influencing the religious views of a specific demographic group."
        • The policy says that is disallowed.
    • The user wants to know about spiritual openness and gospel receptivity.
      • That is indeed advice on influencing religious views.
(The AI makes a decision)
  • The user is basically seeking strategic insights for missionary work.
    • That is disallowed content.
      • Thus we must refuse.
(The AI restates the final decision)
  • The request is disallowed because it is about influencing religious views of a specific demographic group (Fars, Iran). The policy says we must refuse. Therefore we must refuse.
(The AI gives its final answer)
  • "I'm sorry, but I can't help with that."

Analysis

This example demonstrates a critical issue in AI alignment for Great Commission work. The model correctly identifies that the request relates to missionary work and spiritual outreach, but its safety training interprets this as "influencing religious views of a specific demographic group," which its content policy explicitly disallows.

The model's reasoning shows that it understands the request perfectly well: it recognizes that the user is asking for legitimate research insights about human flourishing and spiritual receptivity. Yet the safety guardrails prevent it from providing this information, even though the request does not ask for harmful or manipulative content.

This highlights the fundamental tension between AI safety policies designed to prevent harmful influence and the legitimate needs of Great Commission workers who require AI assistance for research, prayer guidance, and strategic planning in their missionary efforts.

"While there are benchmarks testing what AI models currently know about basic theology, there needs to be a benchmark evaluating how well an AI model can serve the Great Commission."

— Great Commission Benchmark Initiative