Cloud Wars
AI and Copilots

Incidents Highlight Intellectual Property Risks Amid Lack of Universal AI Governance

By Kieron Allen | July 12, 2024 | 3 Mins Read

Have you heard of the Robots Exclusion Protocol, commonly referred to as robots.txt? Unless you’re involved with the publishing industry or a seasoned web developer, it’s likely this somewhat nondescript standard is new to you.

Introduced during the internet boom of the mid-1990s to help stem the onslaught of web crawlers, the robots.txt protocol became a critical tool for publishers, enabling them to specify, using “allow” and “disallow” directives, which URLs on their site a crawler or spider may access.
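As a rough illustration of how those rules are evaluated, the sketch below feeds a made-up robots.txt file to Python's standard-library `urllib.robotparser`. The crawler names (`ExampleAIBot`, `OtherBot`) and the `example.com` URLs are hypothetical and not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked from the whole site,
# every other crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/story"),
        ("OtherBot", "https://example.com/articles/story"),
        ("OtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Note that a check like this only has an effect if the crawler chooses to run it; the protocol itself does not technically prevent a request from being made.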

Now, with the onset of GenAI, companies have begun using web content for model training and summarization functions. However, recent investigations have found that a number of these companies are failing to follow the robots.txt protocol, bypassing it to scrape data without the consent of content creators.

Major Offender

A recent investigation by Wired focused on the practices of Perplexity, an AI search startup with notable investors including Jeff Bezos and NVIDIA. The investigation found that Perplexity was likely ignoring the robots.txt protocol and scraping sections of websites that had been designated with disallow status.

Wired found that Perplexity was carrying out this practice on Wired’s own website, as well as on others owned by its parent company, Condé Nast. Beyond this, the investigation found that some of the output from Perplexity was skewed. As well as failing to attribute ownership of the content, the chatbot was found, at times, to summarize stories inaccurately.

“In theory, Perplexity’s chatbot shouldn’t be able to summarize WIRED articles, because our engineers have blocked its crawler via our robots.txt file since earlier this year,” says Wired’s Dhruv Mehrotra in his report.

However, Wired isn’t the only publication taking aim at Perplexity. Forbes has also publicly called out the company, asserting that it lifted a Forbes article and posted it as its own, with no credit to the Forbes journalists who wrote the original content. That said, Perplexity isn’t the only offender.

Growing Concern

Reuters recently reported on the findings of the content licensing platform TollBit. Although it did not name specific companies, TollBit reported that its analysis found numerous AI agents had been bypassing the robots.txt protocol.

“What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites,” TollBit told Reuters. “The more publisher logs we ingest, the more this pattern emerges.”
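One way a publisher might look for this pattern in its own logs is sketched below. This is not TollBit's method, which the source does not describe; the log records, crawler names, and rules are all invented for illustration, and real access logs would need parsing first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the publisher's site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""

# Hypothetical, simplified log records: (user agent, requested path).
ACCESS_LOG = [
    ("ExampleAIBot", "/news/story-1"),
    ("OtherBot", "/news/story-1"),
    ("OtherBot", "/drafts/unpublished"),
]


def find_violations(robots_txt, log):
    """Return the log records whose request was disallowed by robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in log
            if not parser.can_fetch(agent, path)]


violations = find_violations(ROBOTS_TXT, ACCESS_LOG)

if __name__ == "__main__":
    for agent, path in violations:
        print(f"{agent} requested disallowed path {path}")
```

A check like this only catches crawlers that identify themselves honestly, since the user-agent string in a request can be set to anything.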

Governance Is Critical

The quest for ethical, trusted AI has become a cornerstone for companies using AI technologies to develop new products and services. We regularly report on the efforts of companies to establish AI governance protocols, such as Salesforce’s joint effort with NIST and Google Cloud’s partnership with Kyndryl, which is driven by responsible GenAI deployment.

While universal standards for AI governance are not yet in place, work is underway. The US has limited federal legislation but, as yet, no comprehensive framework. The EU has put in place the first major law governing AI use. And of course, within the business community, there are numerous internal compliance frameworks and AI governance groups, such as the AI Alliance, launched by IBM and Meta back in 2023.

However, there’s no unified legislative approach that mandates responsible AI governance or specifies how offenders who don’t comply are punished. As well as proprietary LLMs, Perplexity uses other off-the-shelf models from companies including OpenAI and Anthropic.

To protect the intellectual property rights of publishers, we should see laws that, in the absence of ethical behavior, make it illegal for organizations to utilize LLMs for nefarious practices.



Kieron Allen

Cloud, AI, Innovation
Cloud Wars analyst

Areas of Expertise
  • Business Apps
  • Cloud
  • Cybersecurity
  • Data

Kieron Allen is a Cloud Wars Analyst examining innovations in, and the future impact of, the latest AI, cloud, cybersecurity, and data technology developments. In his ongoing analyses and video reports, Allen focuses on the platforms, applications, people, and ideas that will mold our digital future. After serving as the Online Editor for BBC Sky at Night Magazine and as the Editorial Assistant for BBC Focus Magazine, Kieron became a freelance journalist in 2015, when his focus on the business technology market became a key passion. Kieron partners with technology start-ups and organizations that share his interests in science, social affairs, non-profit work, fashion, and the arts.
