"Evaluating the Accuracy of GPT Zero for AI Generated Text Detection in Education"

AI in Education
31 Jan 2023 · 24:49

TLDR: In this experiment, the presenter tests the accuracy of GPT Zero, an AI detection tool, by submitting various AI-generated texts, including a hip-hop song, a sonnet, a poem, a commentary, and a PowerPoint outline, to see if it can correctly identify machine-written content. The results are mixed, with GPT Zero struggling to detect creative writing but performing better with more structured texts. The test also explores the potential for grammar-altering tools to fool the AI detector, raising concerns about its reliability in academic integrity assessments.


  • 😀 The experiment aims to evaluate GPT Zero's effectiveness in detecting AI-generated text in various contexts.
  • 🔍 GPT Zero is a tool developed by a computer science student to identify text written by artificial intelligence.
  • 🎤 The first test involved asking ChatGPT to write a hip-hop song in the style of Drake, which GPT Zero incorrectly identified as likely human-written.
  • 🌿 The second test with a sonnet in the style of Margaret Atwood was also not detected as AI-generated by GPT Zero.
  • 🌍 A 500-word poem about climate change in the style of Pablo Neruda was mistaken for human writing by GPT Zero.
  • 📜 GPT Zero successfully identified a commentary on a poem as AI-generated, showing it can detect more academic writing.
  • 📊 When the AI-generated commentary was turned into PowerPoint slides, GPT Zero did not identify it as AI-written, indicating potential limitations.
  • 📝 An essay on climate change was correctly identified as AI-generated, but modifying the text with a grammar tool confused GPT Zero.
  • 🤔 A complex test simulating a student response in an online forum was partially identified as AI-written, suggesting mixed results.
  • 📑 The transcript includes a historical speech from MP Bhutan Suite, which GPT Zero incorrectly identified as AI-generated, highlighting possible inaccuracies in detection.
  • 🚫 The experimenter expresses hesitancy in using GPT Zero for academic integrity due to the risk of false positives and inaccuracies.

Q & A

  • What is the purpose of GPT Zero and who created it?

    -GPT Zero is a tool designed to detect whether text was written by an artificial intelligence. It was created by a young computer science student from an Ivy League university.

  • What types of text were used in the experiment to test GPT Zero's accuracy?

    -The experiment used a variety of text types including a hip-hop song, a sonnet, a poem, a commentary on a poem, a PowerPoint outline, an essay, and a discussion forum posting.

  • How did GPT Zero perform in detecting the hip-hop song written in the style of Drake about academic integrity?

    -GPT Zero incorrectly identified the hip-hop song as most likely human-written, suggesting it failed to detect the AI-generated nature of the text.

  • What was the result when GPT Zero was tested with a sonnet written in the style of Margaret Atwood?

    -GPT Zero identified the sonnet as likely written entirely by a human, not detecting it as AI-generated text.

  • How did GPT Zero perform on the longer 500-word poem about climate change in the style of Pablo Neruda?

    -GPT Zero failed to identify the poem as AI-generated, suggesting it was likely written entirely by a human.

  • What was the outcome when GPT Zero was used to analyze a commentary on a poem discussing style and rhythm?

    -GPT Zero successfully identified the commentary as written entirely by AI, showing it was better at detecting this type of academic writing.

  • Why might GPT Zero have difficulty detecting AI-generated creative writing?

    -GPT Zero may struggle with creative writing because such text may not contain enough of the patterns or 'tell-tale' signs it relies on to flag AI-generated content, unlike more structured academic writing.

  • What happened when the AI-generated text was put through a grammar-changing tool like Spinbot?

    -After putting the AI-generated text through Spinbot, GPT Zero became confused and incorrectly identified the text as likely written by a human, suggesting the tool can potentially fool GPT Zero.

  • How did GPT Zero handle the task of identifying an AI-generated response to an online discussion forum post?

    -GPT Zero identified parts of the AI-generated response as likely written by AI, but some parts were unclear, indicating a mixed result in detecting AI writing in a discussion forum context.

  • What was the surprising result when GPT Zero analyzed a quote from an MP's speech given in 2016?

    -Surprisingly, GPT Zero identified the 2016 speech by MP Bhutan Suite as entirely written by AI, which is unlikely given that sophisticated AI tools were not available at that time.

  • Based on the experiment, what is the conclusion about using GPT Zero for detecting academic integrity issues?

    -The experiment suggests that using GPT Zero to detect academic integrity issues might not be reliable due to potential false positives and the possibility of being fooled by grammar-changing tools.



🔍 Testing GPT Zero's AI Detection Capabilities

The video script discusses an experiment to test GPT Zero, a program designed to detect AI-generated text. The experimenter uses various prompts to generate content from ChatGPT, including a hip-hop song, a sonnet, a poem, a commentary, and a PowerPoint outline. Each generated text is then analyzed by GPT Zero to see whether it is flagged as AI-written. The experiment aims to assess GPT Zero's effectiveness in distinguishing between human and AI-written text.


🎤 ChatGPT's Attempt at Creative Writing

The experimenter first asks ChatGPT to write a hip-hop song about academic integrity in the style of Drake. The generated song is then tested with GPT Zero, which surprisingly identifies it as likely human-written. This section of the script highlights the potential limitations of GPT Zero in detecting creative writing, as it fails to recognize the AI origin of the song.


🌿 Sonnet and Poem Analysis with GPT Zero

Continuing the experiment, the script describes the generation of a sonnet about nature in the style of Margaret Atwood and a 500-word poem about climate change in the style of Pablo Neruda. Both texts are analyzed by GPT Zero, which again identifies them as likely human-written. This part of the script emphasizes GPT Zero's challenges in accurately detecting AI-generated creative content.


📚 Scholarly Writing and PowerPoint Detection

The script then moves on to the generation of a scholarly commentary on a poem and a PowerPoint outline. When these are analyzed by GPT Zero, it correctly identifies the commentary as AI-written but mistakenly identifies the PowerPoint slides as human-written. This section explores the nuances of GPT Zero's detection capabilities in more structured and academic writing.


🌡️ Climate Change Essay and Grammar Spinning

The experimenter asks ChatGPT to write an essay on the dangers of climate change in Vancouver, BC. GPT Zero correctly identifies the essay as AI-written. However, when the essay is processed through a grammar-spinning tool and re-analyzed, GPT Zero is confused and identifies it as human-written, demonstrating the potential to evade detection by altering the text's structure.

💬 Simulating Student Discussion Forum Responses

In the final part of the script, the experimenter asks ChatGPT to simulate a student response in an online discussion forum, addressing a debate on gender expression. GPT Zero identifies parts of the response as AI-written, but some sections remain unclear. This test showcases the complexity of detecting AI-generated text in interactive and nuanced contexts.

🤖 Reflections on GPT Zero's Detection Accuracy

The script concludes with the experimenter's reflections on GPT Zero's performance. It highlights the mixed results of the tests, noting GPT Zero's difficulty with creative writing but better performance with structured essays. The experimenter expresses hesitation in using GPT Zero for academic integrity due to the risk of false positives and the potential for evasion through text manipulation.
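
To make the "mixed results" concrete, the outcomes reported in this summary can be tallied in a few lines of Python. This is only a bookkeeping sketch: the sample names and verdicts are taken from the summary, while the category labels (true_positive, false_negative, and so on) and the choice to count the partially flagged forum post as "inconclusive" are assumptions made for illustration. With nine samples, the counts are far too small to estimate real error rates.

```python
from collections import Counter

# (sample, actual author, GPT Zero verdict) as reported in the summary.
# "mixed" marks the forum post, which was only partially flagged.
RESULTS = [
    ("hip-hop song", "ai", "human"),
    ("sonnet", "ai", "human"),
    ("500-word poem", "ai", "human"),
    ("poem commentary", "ai", "ai"),
    ("PowerPoint outline", "ai", "human"),
    ("climate change essay", "ai", "ai"),
    ("essay after grammar spinning", "ai", "human"),
    ("discussion forum response", "ai", "mixed"),
    ("2016 MP speech", "human", "ai"),
]

def tally(results):
    """Count outcomes, treating 'AI-written' as the positive class."""
    counts = Counter()
    for _, truth, verdict in results:
        if verdict == "mixed":
            counts["inconclusive"] += 1
        elif truth == "ai" and verdict == "ai":
            counts["true_positive"] += 1
        elif truth == "ai" and verdict == "human":
            counts["false_negative"] += 1
        elif truth == "human" and verdict == "ai":
            counts["false_positive"] += 1
        else:
            counts["true_negative"] += 1
    return counts

counts = tally(RESULTS)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

The single false positive (the human-written speech flagged as AI) is the outcome that matters most for academic integrity, since it corresponds to a student being wrongly accused.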



💡GPT Zero

GPT Zero is a detection tool designed to identify whether a given text has been generated by artificial intelligence. In the video, it is used to test various AI-generated texts to see if they can be accurately detected as non-human. The tool's effectiveness is a central theme, as it is put to the test with different writing prompts and styles, such as hip-hop lyrics and sonnets.

💡AI-generated Text

AI-generated text refers to written content created by artificial intelligence algorithms, rather than by human authors. The video script discusses the use of AI to generate various forms of text, including songs, poems, commentaries, and essays, and then evaluates whether GPT Zero can correctly identify these as machine-written.

💡Hip-hop Song

A hip-hop song is a musical composition characterized by rapping, a vocal style where the artist speaks rhythmically and in rhyme. In the script, the AI is prompted to write a hip-hop song about academic integrity in the style of Drake, which is then tested by GPT Zero to see if it can detect the AI origin.


💡Sonnet

A sonnet is a 14-line poem with a specific rhyme scheme, traditionally used to express love or other deep emotions. In the video, the AI is asked to write a sonnet about nature in the voice of Margaret Atwood, a renowned author, which is part of the experiment to evaluate GPT Zero's detection capabilities.

💡Climate Change

Climate change refers to long-term shifts in global or regional climate patterns. The AI is tasked with writing a poem and an essay on this topic, highlighting the environmental issue's significance. The poem and the essay are then analyzed by GPT Zero to determine if they were AI-generated.

💡Academic Integrity

Academic integrity is the concept of honesty and trustworthiness in academic settings, such as avoiding plagiarism and cheating. The video script includes a prompt for the AI to write a hip-hop song about this concept, emphasizing the importance of original work in education.


💡Plagiarism

Plagiarism is the act of using another person's work or ideas without giving proper credit, which is considered unethical in academic and creative fields. The script mentions plagiarism in the context of the AI-generated hip-hop song about academic integrity, where the lyrics warn against this practice.


💡Perplexity

In the context of language models, perplexity is a measure of how well the model predicts a sample of text: the lower the perplexity, the more predictable the text is to the model. GPT Zero uses perplexity, among other metrics, to assess whether a text is likely written by a human or an AI. The script discusses sentences with low perplexity as potential indicators of AI authorship.
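
As a rough illustration of the idea, perplexity can be computed as the exponential of the average negative log-probability a model assigns to each token. The sketch below uses a toy unigram model with add-one smoothing trained on a made-up corpus; this is an assumption chosen to keep the example self-contained, and it is not how GPT Zero itself is implemented (its internals are not described in the video). The function names `train_unigram` and `perplexity` are hypothetical.

```python
import math
from collections import Counter

def train_unigram(corpus_tokens):
    """Return an add-one-smoothed unigram probability function."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    vocab_size = len(counts) + 1  # one extra slot shared by unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab_size)

    return prob

def perplexity(tokens, prob):
    """Exponential of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("need at least one token")
    nll = -sum(math.log(prob(t)) for t in tokens)
    return math.exp(nll / len(tokens))

# Toy training corpus, made up for illustration.
corpus = "the cat sat on the mat the dog sat on the rug".split()
prob = train_unigram(corpus)

predictable = "the cat sat on the mat".split()
surprising = "quantum zebras juggle violet theorems".split()

print(perplexity(predictable, prob))  # lower: every word was seen in training
print(perplexity(surprising, prob))   # higher: every word is unseen
```

A detector built on this idea would treat text that the model finds very predictable (low perplexity) as more likely to be machine-written, which is the heuristic the script refers to.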


💡Burstiness

Burstiness, in the context of text analysis, refers to how much the predictability of a text varies from one sentence to the next. Human writing tends to mix plain and surprising sentences, while AI-generated text is often more uniform. GPT Zero considers burstiness as a factor in its evaluation of text, as mentioned in the script when analyzing the AI-generated content.
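
One common reading of burstiness is the spread of per-sentence perplexity scores across a text. The sketch below takes that reading and measures it as a population standard deviation; both the definition and the function name `burstiness` are assumptions for illustration, and the per-sentence scores are made-up numbers rather than output from GPT Zero.

```python
from statistics import pstdev

def burstiness(sentence_perplexities):
    """Population standard deviation of per-sentence perplexity scores."""
    if len(sentence_perplexities) < 2:
        raise ValueError("need at least two sentences")
    return pstdev(sentence_perplexities)

# Made-up per-sentence perplexities, for illustration only.
uniform_text = [21.0, 22.0, 20.0, 21.0]   # every sentence about equally predictable
varied_text = [12.0, 85.0, 30.0, 140.0]   # predictability swings between sentences

print(burstiness(uniform_text))
print(burstiness(varied_text))
```

Under this reading, the uniform text scores low on burstiness and would look more machine-like, while the varied text scores high and would look more human.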


💡Spinbot

Spinbot is a tool that rephrases or 'spins' existing text to create new versions with different wording while retaining the original meaning. In the script, the essay generated by the AI is put through Spinbot to alter its structure, which then affects GPT Zero's ability to detect AI authorship.

💡Discussion Forum

A discussion forum is an online platform where people can exchange ideas and comments on a particular topic. The script includes an experiment where the AI is asked to generate a response to a forum post about gender expression and the Human Rights Act, simulating a student's contribution to an academic debate.


Introduction of an experiment to evaluate GPT Zero's accuracy in detecting AI-generated text.

GPT Zero was designed by a computer science student to detect AI-written text and has been recently optimized.

The experiment includes prompts for various text types, including a hip-hop song, a sonnet, a poem, a commentary, and a discussion forum post.

First test involves writing a hip-hop song about academic integrity in the style of Drake.

GPT Zero's initial test result suggests the hip-hop song was likely human-written.

Second test with a sonnet written in the style of Margaret Atwood, identified as likely human-written by GPT Zero.

A 500-word poem about climate change in the style of Pablo Neruda was not detected as AI-written by GPT Zero.

GPT Zero successfully identified a machine-written commentary on a poem discussing style and rhythm.

A PowerPoint outline generated by ChatGPT was not identified as AI-written by GPT Zero.

An essay on the dangers of climate change in Vancouver BC was correctly identified as AI-written.

Using a grammar spinning tool can potentially confuse GPT Zero's detection capabilities.

GPT Zero's mixed results in detecting AI-written text in various formats and styles.

The experiment suggests that GPT Zero might not be fully reliable for detecting creative writing.

GPT Zero's performance varies depending on the type of text and its complexity.

The potential for false positives in using GPT Zero for academic integrity purposes is highlighted.

The experiment concludes with a discussion on the limitations and potential misuse of GPT Zero in education.

Unexpected result: GPT Zero identified a human MP's speech as AI-written, suggesting possible inaccuracies in detection.

Final thoughts on the cautious use of GPT Zero in academic settings due to its inconsistent performance.