Cleaning AI Generated Text

A tool to detect and clean text that might have been generated by AI

AI Is Infecting Text

AI-generated text often contains subtle indicators of its origin. One of the most commonly cited signs is the em dash, but other characters that are hard to type on a standard keyboard can slip through as well. This script cleans such text by replacing these characters with more standard alternatives.

The script searches for non-ASCII characters (characters that may not be on your keyboard) while still allowing most emoji ranges through.
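
As a rough illustration of the detection step, the Python sketch below flags every character outside the ASCII range unless it falls inside an allowed emoji range. The function names `is_emoji` and `find_suspicious_chars` and the exact `EMOJI_RANGES` values are assumptions for illustration, not the original script's code.

```python
import unicodedata

# Assumed emoji code point ranges (inclusive). An illustrative approximation,
# not the exact ranges the original tool allows.
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # Misc Symbols and Pictographs .. Symbols and Pictographs Ext-A
    (0x1F1E6, 0x1F1FF),  # Regional indicator symbols (flag emoji)
    (0x2600, 0x27BF),    # Misc Symbols and Dingbats
    (0xFE0F, 0xFE0F),    # Variation Selector-16 (emoji presentation)
]


def is_emoji(ch: str) -> bool:
    """Return True if ch falls in one of the assumed emoji ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def find_suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """List (index, character, Unicode name) for each non-ASCII, non-emoji character."""
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) < 128 or is_emoji(ch):
            continue  # plain ASCII and allowed emoji pass through
        hits.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits
```

For example, `find_suspicious_chars("Hello\u2014world \U0001F600")` returns a single entry at index 5 named `EM DASH`; the trailing emoji is not flagged.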

What Does This Cleaner Do?

  • Converts smart quotes (‘ ’ and “ ”) to straight quotes (' and ")
  • Removes unusual Unicode characters
  • Converts em dashes and en dashes to regular hyphens (-)
  • Removes non-breaking spaces
  • Preserves standard ASCII characters, emojis, and basic formatting
  • Removes extra spaces while keeping line formatting
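
The steps above can be sketched as a single Python function. This is a minimal sketch under assumptions: the `REPLACEMENTS` table, the emoji ranges, and the name `clean_text` are hypothetical, and non-breaking spaces are swapped for ordinary spaces rather than deleted so that adjacent words do not run together.

```python
import re

# Hypothetical replacement table; the original tool's exact mapping is not published.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # left/right single smart quotes
    "\u201C": '"', "\u201D": '"',  # left/right double smart quotes
    "\u2014": "-", "\u2013": "-",  # em dash, en dash
    "\u00A0": " ",                 # non-breaking space -> ordinary space
}

# Assumed emoji ranges (inclusive) that survive cleaning.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x1F1E6, 0x1F1FF), (0x2600, 0x27BF), (0xFE0F, 0xFE0F)]


def is_emoji(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def clean_text(text: str) -> str:
    # 1. Swap known typographic characters for ASCII equivalents.
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # 2. Drop any remaining non-ASCII character that is not an allowed emoji
    #    (this also removes invisible characters such as zero-width spaces).
    text = "".join(ch for ch in text if ord(ch) < 128 or is_emoji(ch))
    # 3. Per line: collapse runs of 2+ spaces after a non-space character and
    #    trim trailing spaces/tabs, keeping line breaks and leading indentation.
    lines = [re.sub(r"(?<=\S) {2,}", " ", line).rstrip(" \t") for line in text.split("\n")]
    return "\n".join(lines)
```

Because step 2 keeps only ASCII and the assumed emoji ranges, accented letters such as é are removed too, which may be too aggressive for non-English text; adding them to the allow-list is a straightforward extension.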

Why This Matters for AI Detection

When students or writers submit work, unusual Unicode characters (such as curly quotes or invisible characters) may indicate that the text was copied from an AI tool like ChatGPT. This tool makes such characters visible and removes them.
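
One way to make hidden characters visible, sketched in Python below, is to replace each suspicious character with a readable `[U+XXXX NAME]` marker. Rather than listing code point ranges, this version selects non-ASCII characters by Unicode general category: format (`Cf`, e.g. zero-width space), space separator (`Zs`, e.g. non-breaking space), dash punctuation (`Pd`), and opening/closing quotes (`Pi`, `Pf`). The category choice, the marker format, and the name `reveal_hidden_chars` are hypothetical.

```python
import unicodedata

# Unicode general categories treated as suspicious (an assumption for this sketch):
# Cf = format (zero-width chars), Zs = space separators (e.g. no-break space),
# Pd = dashes, Pi/Pf = opening/closing quotation marks.
SUSPICIOUS_CATEGORIES = {"Cf", "Zs", "Pd", "Pi", "Pf"}


def reveal_hidden_chars(text: str) -> str:
    """Replace suspicious non-ASCII characters with visible [U+XXXX NAME] markers."""
    out = []
    for ch in text:
        if ord(ch) >= 128 and unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
            name = unicodedata.name(ch, "UNKNOWN")
            out.append(f"[U+{ord(ch):04X} {name}]")
        else:
            out.append(ch)  # ASCII, emoji (category So), and other characters are left as-is
    return "".join(out)


print(reveal_hidden_chars("zero\u200bwidth"))  # prints zero[U+200B ZERO WIDTH SPACE]width
```

Showing the marker before deleting the character lets a reader see exactly what was hidden in the text.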

This post is licensed under CC BY 4.0 by the author.