Can Google Detect AI-Generated Copy? My Tests Say No.

Liz Achanta
Apr 16, 2024
7 min read

Updated: Apr 17, 2024

I was browsing LinkedIn the other day and came across a post about Google and AI. One of my former colleagues had commented that Google was getting smarter, and wasn't indexing content written by AI. The scientist in me wanted data to back this up, so naturally my next step was to search the question, "can Google tell if content is written by ai?"

The results weren't surprising, especially since the first result is an AI-generated message from Google saying "Yes, Google can detect AI-generated content and use algorithms to filter it out." I was a little skeptical of this answer; it was like asking a child if they ate a cookie and hearing "no" (with the pieces still in their mouth). So I dug a little deeper. What did I find? The results are varied, and quite frankly, so long as you're not generating spam content, Google doesn't seem to care.

In an era where artificial intelligence (AI) is becoming increasingly sophisticated, it's natural to wonder whether tech giants like Google have the ability to detect AI-generated content, especially when it comes to blogs and other written material. The intersection of AI and content creation has raised intriguing questions about authenticity and transparency. Since I do a lot of blog and content writing for my day job, this topic hit close to home - especially since I do rely on AI to help me with refining my copy, creating ideas, or even translating data-rich text into something palatable for the common Liz.

In this article, I'm going to dive into AI-generated content and SEO, and give you my two cents on whether the aforementioned statement about Google being able to recognize AI stands.

Understanding AI-Generated Content

AI has made remarkable strides in natural language processing (NLP) and text generation. Models like OpenAI's ChatGPT (GPT stands for Generative Pre-trained Transformer) or Anthropic's Claude are capable of producing remarkably human-like text based on the input and training data they've been exposed to. These models can generate coherent paragraphs, mimic various writing styles, and even summarize complex information (example: I had a biopsy result returned to me, so I copied and pasted it into ChatGPT and asked it to explain it to me - and it did!). This capability has paved the way for the use of AI in content creation, including blog writing.

When an AI writes a blog post, the process typically involves feeding prompts or topics into the model, which then generates the text based on patterns and structures learned during training. The output can be highly convincing and often indistinguishable from human-written content, especially in shorter pieces or those with specific prompts (example: write me a long-form blog post about whether Google can detect whether this blog post was written by AI, and then copy and paste this blog post underneath that prompt).

Challenges in Detecting AI-Generated Content

Google and other search engines rely on various algorithms to index and rank content on the web. However, detecting AI-generated content presents unique challenges:

Naturalness of Text: Advanced AI models produce text that closely resembles human writing. They can emulate different tones (Moira Rose, anyone?), vocabularies, and styles, making it difficult to spot the difference based solely on linguistic cues.
Diverse Training Data: AI models are trained on vast datasets sourced from the internet, encompassing diverse writing styles and genres. This diversity contributes to the AI's versatility in generating content that mirrors human expression.
Evolving AI Technology: AI algorithms continue to evolve rapidly, with newer models achieving higher levels of sophistication and realism. This constant evolution complicates the task of developing static detection methods.
Lack of Clear Patterns: Unlike plagiarized content, AI-generated text isn't copied from existing sources, making it challenging to identify through conventional plagiarism checks.

Google's Approach to Content Evaluation

Google employs sophisticated algorithms to evaluate content quality and relevance. The specifics of their methods remain proprietary (so others can't cheat and write code to trick the bots), which makes it hard to say for certain what they use, but Google likely employs a blend of techniques to assess content authenticity:

User Engagement Metrics: Google analyzes user behavior signals (like click-through rates, time spent on page, bounce rates) to gauge content quality. However, AI-generated content can still perform well if it aligns with user intent.
Semantic Analysis: Google's algorithms can understand the semantic meaning of content, evaluating its relevance to search queries. AI-generated text can be optimized for semantic relevance, further complicating detection.
Metadata and Authorship Signals: Google considers metadata (like authorship information, publication history) as signals of content authenticity. However, AI-generated content can be published under human-like pseudonyms.
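
To make the engagement signals above a little more concrete, here is a minimal Python sketch of how click-through rate and bounce rate are commonly calculated. The function names and sample numbers are made up for illustration, and nothing here reflects how Google actually measures or weights these signals (that part is proprietary).

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Share of search impressions that led to a click."""
    if impressions == 0:
        return 0.0  # avoid dividing by zero when a page was never shown
    return clicks / impressions


def bounce_rate(single_page_sessions: int, total_sessions: int) -> float:
    """Share of sessions where the visitor left after viewing one page."""
    if total_sessions == 0:
        return 0.0
    return single_page_sessions / total_sessions


if __name__ == "__main__":
    # Hypothetical numbers for a single blog post, not real analytics data.
    print(f"CTR: {click_through_rate(50, 1000):.1%}")
    print(f"Bounce rate: {bounce_rate(30, 120):.1%}")
```

Notice that neither number says anything about who (or what) wrote the page, which is why a well-targeted AI-written post can score just as well as a human-written one.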

The main purpose behind Google checking for AI content is to ensure that there is no spam being published online. Google emphasizes that using AI to generate content for the purpose of manipulating search rankings violates their spam policies (example: clickbait or keyword stuffing). They stress the importance of prioritizing user-centric content creation; in practice, that is what keeps AI-assisted content from being treated as spam.

While Google doesn't universally deem all AI-generated content as spam, they prioritize authenticity, relevance, and quality. If AI is employed to produce high-caliber, pertinent, and original content, it may not be flagged as spam. However, mass-producing subpar, irrelevant, plagiarized, or manipulative content to deceive users or boost rankings contravenes Google's guidelines. Their Webmaster Quality Guidelines (since rebranded as Google Search Essentials) explicitly prohibit auto-generated content or practices leading to a poor user experience, which may encompass certain types of AI-generated content. The key takeaway is the necessity of focusing on crafting meaningful and valuable content for users, regardless of whether AI is part of the creation process.

The Ethics of AI-Generated Content

The rise of AI-generated content raises ethical concerns about transparency and disclosure. While AI can aid content creation by automating repetitive tasks, its use should be accompanied by clear disclosure to readers. Transparency ensures that consumers can make informed decisions about the content they consume.

Similarly, when ChatGPT first made its appearance, there were clear ethical implications regarding cheating. For example, this 2023 ABA Journal article reports ChatGPT scoring in the 90th percentile on the bar exam. And, as a new parent, I worry about the future of AI impacting cognitive learning abilities (my mother's voice in my head saying "in my day, we used encyclopedias!" rings loud and clear).

So, Can Google Actually Tell if Content Is AI-Generated?

The purpose of this article was for me to find evidence that Google is indexing original content over AI. However, since I can't determine what was written by AI and what wasn't, I had to find an alternative.

In math class, if 7+X=19, we have to solve backwards to find X. Therefore, the most logical solution to me was to see if AI could determine something was AI. I went to my favorite generator, ChatGPT, and this is what I wrote:

ME: If I gave you an excerpt from a blog post, would you be able to tell me if AI wrote it?

ChatGPT: I can certainly try to help with that! If you provide me with the excerpt from the blog post, I can analyze the text to give you an indication of whether it appears to have been generated by an AI language model like me. Keep in mind that while I can offer insights based on patterns and characteristics typical of AI-generated text, my analysis is not foolproof and is based on probabilities and observations. Please go ahead and share the excerpt you have in mind.

I proceeded to share an excerpt of a blog post, with 2 paragraphs written by me, and 1 paragraph written by AI (I had this generated previously, then deleted the prompt). Here's ChatGPT's response:

ChatGPT: Based on the provided excerpt, it's difficult to definitively determine whether it was written by AI or a human. The text discusses a popular podcast episode and transitions into a broader discussion about personal branding, social media, and digital connectivity. The writing style appears coherent and reflective, addressing a contemporary topic with a logical progression of ideas.

To assess whether AI was involved in generating this text, consider the following observations: Content Structure, Language and Tone, and Depth of Analysis (ChatGPT actually listed these with details, but for the sake of brevity, I've removed that).

While AI can generate text that mimics human writing styles, the presence of specific podcast references and the natural progression of ideas could indicate human authorship. However, without more context or additional analysis, it's challenging to definitively determine the origin of the text. Ultimately, human judgment and context are crucial in discerning whether AI was involved in the creation of this content.

Since ChatGPT couldn't give me a solid answer, or even indicate that part of the post was AI, I decided to give it another test, and copied and pasted the specific AI-generated paragraph. I was shocked by the answer:

[Screenshot: ChatGPT's response to whether a paragraph that it (ChatGPT) wrote was written by AI.]

ChatGPT couldn't even recognize something it wrote as AI! I had to do another test; I copied and pasted the same prompt I gave ChatGPT and gave it to Claude. Here's its response:

[Screenshot: Claude 3's response to the same prompt given to ChatGPT.]

Due diligence would tell me that I needed to test this theory a third time, but I didn't really want to hunt down a third AI bot and make another account. I got the answer I was looking for: AI can't even tell if something is AI-written.
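
For anyone who does want to do that due diligence, one way to make a test like this more systematic is to label a batch of paragraphs ahead of time and score each chatbot's verdicts against those labels. The Python sketch below is only an illustration: the score_detector function and the sample labels and verdicts are hypothetical placeholders, not results from the tests described above.

```python
from collections import Counter


def score_detector(true_labels, verdicts):
    """Compare a detector's verdicts against known labels.

    true_labels: list of "human" or "ai" (who really wrote each paragraph)
    verdicts:    list of "human", "ai", or "unsure" (what the detector said)
    """
    if len(true_labels) != len(verdicts):
        raise ValueError("Need exactly one verdict per paragraph")

    counts = Counter()
    for truth, verdict in zip(true_labels, verdicts):
        if verdict == "unsure":
            counts["unsure"] += 1  # hedged answers are tracked separately
        elif verdict == truth:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1

    total = len(true_labels)
    accuracy = counts["correct"] / total if total else 0.0
    return {
        "correct": counts["correct"],
        "wrong": counts["wrong"],
        "unsure": counts["unsure"],
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Placeholder data for illustration only.
    labels = ["human", "human", "ai", "ai"]
    answers = ["human", "unsure", "human", "unsure"]
    print(score_detector(labels, answers))
```

Counting "unsure" separately matters here, because a chatbot that hedges on every paragraph is never technically wrong, but it isn't detecting anything either.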

I'd like to point out that other individuals - probably smarter than I am - say that Google can detect AI. Contentbot.ai says Google can detect AI-written content by using tools like CAPTCHA to weed out bot-written content. However, they turn around and say that if you're using AI to maintain a people-first approach to content generation, Google won't penalize you (based on my research above, I think this means that Google can't tell if you're having AI write your blog posts for you).

BoostBlog concurs with ContentBot, and says, "... yes, Google can and will detect AI content if it violates their spam guidelines. However, the critical factor here is whether or not the content violates those guidelines. Google has said they will “reward high-quality content, however, it is produced.” So, if a creator uses an AI content generator and the content remains high quality, it won’t be an issue."

Conclusion

Based on this exploration of AI-generated content and Google's ability to detect it, the evidence points to a nuanced landscape in which definitive identification remains out of reach. While the initial answer from Google implies it can discern AI-authored text, hands-on experiments like the ones I ran with ChatGPT and Claude show how hard that is in practice. The interchangeability of AI and human-generated content poses intriguing questions regarding content authenticity and the detection methodologies employed by tech giants like Google. Despite claims from platforms like ContentBot and BoostBlog asserting Google's detection abilities, the emphasis on content quality and user-centric creation underscores the potential leniency towards AI-assisted content that aligns with these principles.

As we navigate the evolving intersection of AI and content creation, ethical considerations surrounding transparency and consumer awareness are paramount. The dexterity of AI models in mimicking human expression necessitates vigilance in distinguishing genuine contributions from algorithmically generated content. This discourse emphasizes the need for continued dialogue and empirical studies to elucidate the capabilities and limitations of AI detection technologies within the broader context of digital content integrity. Ultimately, the quest for high-quality, user-focused content transcends the medium of creation, embodying principles that resonate with both human and AI-driven narratives in the digital age.

The TL;DR? Go ahead and use ChatGPT to write your blog posts and social media copy - Google isn't going to tell your boss!