Cloud Wars
AI and Copilots

Incidents Highlight Intellectual Property Risks Amid Lack of Universal AI Governance

By Kieron Allen | July 12, 2024 | 3 Mins Read

Have you heard of the Robots Exclusion Protocol, commonly referred to as robots.txt? Unless you're a seasoned web developer or work in the publishing industry, this somewhat nondescript standard is likely new to you.

Introduced during the internet boom of the mid-1990s to help stem the onslaught of web crawlers, the robots.txt protocol became a critical tool for publishers. Using "allow" and "disallow" directives, they can specify which URLs on their site a crawler or spider may access.
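As a rough illustration of how a well-behaved crawler might consult these rules before fetching a page, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `ExampleBot` and `OtherBot` user-agent names, and the example.com URLs are hypothetical and are not drawn from any publisher or crawler mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked site-wide,
# while every other crawler may read /public/ but not /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /public/
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before each request and skips disallowed URLs.
print(parser.can_fetch("ExampleBot", "https://example.com/public/page.html"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/public/page.html"))    # True
print(parser.can_fetch("OtherBot", "https://example.com/private/page.html"))   # False
```

Note that the check runs on the crawler's side. Nothing in the file itself prevents a crawler from skipping the check and requesting a disallowed URL anyway, so the protocol works only when crawlers choose to honor it.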

Now, with the onset of GenAI, companies have begun using web content for model training and summarization functions. However, recent investigations have found that a number of these companies are not following the robots.txt protocol and are instead bypassing it to scrape data without the consent of content creators.

Major Offender

A recent investigation by Wired focused on the practices of Perplexity, an AI search startup with notable investors including Jeff Bezos and NVIDIA. The investigation found that Perplexity was likely ignoring the robots.txt protocol and scraping sections of websites that had been designated with disallow status.

Wired found that Perplexity was carrying out this practice on Wired's own website, as well as on others owned by its parent publisher Condé Nast. Beyond this, the investigation found that some of the output from Perplexity was skewed. As well as failing to attribute ownership of the content, the chatbot was found, at times, to summarize stories inaccurately.

“In theory, Perplexity’s chatbot shouldn’t be able to summarize WIRED articles, because our engineers have blocked its crawler via our robots.txt file since earlier this year,” says Wired’s Dhruv Mehrotra in his report.

However, Wired isn't the only publication taking aim at Perplexity. Forbes has also publicly called out the company, asserting that it lifted an article belonging to Forbes and posted it as its own, with no credit to the Forbes journalists who wrote the original content. That said, Perplexity isn't the only offender.

Growing Concern

Reuters recently reported on the findings of the content licensing platform TollBit. Despite not naming the companies specifically, TollBit reported that its analysis found numerous AI agents had been bypassing the robots.txt protocol.

“What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites,” TollBit told Reuters. “The more publisher logs we ingest, the more this pattern emerges.”

Governance Is Critical

The quest for ethical, trusted AI has become a cornerstone for companies using AI technologies to develop new products and services. We regularly report on the efforts of companies to establish AI governance protocols, such as Salesforce's joint effort with NIST and Google Cloud's Kyndryl partnership driven by responsible GenAI deployment.

While there are yet to be universal standards for AI governance in place, work is underway. There’s limited federal legislation in the US but as yet no comprehensive framework. The EU has put in place the first major law governing AI use. And of course, within the business community, there are numerous internal compliance frameworks and AI governance groups, such as the AI Alliance, launched by IBM and Meta back in 2023.

However, there's no unified legislative approach that mandates responsible AI governance or sets out how offenders that don't comply are punished. As well as proprietary LLMs, Perplexity uses other off-the-shelf models from companies including OpenAI and Anthropic.

To protect the intellectual property rights of publishers, we should see laws, in the absence of ethical behavior, that make it illegal for organizations to utilize LLMs for nefarious practices.



Kieron Allen

Cloud, AI, Innovation
Cloud Wars analyst

Areas of Expertise
  • Business Apps
  • Cloud
  • Cybersecurity
  • Data

Kieron Allen is a Cloud Wars Analyst examining innovations in, and the future impact of, the latest AI, cloud, cybersecurity, and data technology developments. In his ongoing analyses and video reports, Allen focuses on the platforms, applications, people, and ideas that will mold our digital future. After serving as the Online Editor for BBC Sky at Night Magazine and as the Editorial Assistant for BBC Focus Magazine, Kieron became a freelance journalist in 2015 where his focus on the business technology market became a key passion. Kieron partners with technology start-ups and organizations that share his interests in science, social affairs, non-profit work, fashion and the arts.

