• Mon. Jul 1st, 2024

Amazon Web Services probes if Perplexity employs ‘web scraping’ for AI training

Jul 1, 2024

Amazon Web Services (AWS) is investigating Perplexity, a company that runs on its servers, to determine whether it uses ‘web scraping’ to train its artificial intelligence (AI) models. Web scraping is the collection of content from web pages by software that downloads a page’s HTML code and filters out the information to store, much like an automated copy and paste.
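
To make the "automatic copy and paste" idea concrete, here is a minimal Python sketch of the filtering step, using only the standard library. It is an illustration under stated assumptions, not a description of how Perplexity or any other company collects data: the sample page, the TextExtractor class and the extract_text helper are all hypothetical, and a real scraper would first download the HTML over the network.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps visible text, skips <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []       # text fragments kept for storage
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up page standing in for a downloaded document.
SAMPLE_HTML = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    "<body><h1>Headline</h1><p>First paragraph.</p>"
    "<script>var x = 1;</script></body></html>"
)

print(extract_text(SAMPLE_HTML))
```

The markup, styling and script code are discarded, and only the readable text is kept.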

Recent reports by developer Robb Knight and by Wired found that Perplexity violated the Robots Exclusion Protocol on certain websites and used web scraping to train its AI models. Under the Robots Exclusion Protocol, a site places a robots.txt file on its domain to indicate which pages robots and automated crawlers should not access.
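
As a rough illustration of how a well-behaved crawler consults robots.txt before fetching a page, the sketch below uses Python's standard urllib.robotparser module. The rules, the "ExampleBot" and "OtherBot" names, the example.com URLs and the is_allowed helper are invented for this example and do not reflect any real site's file or any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is barred from the whole site,
# every other crawler is barred only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/article"))     # False
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/article"))       # True
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/private/data"))  # False
```

The protocol is voluntary: nothing in the file technically blocks a request, so compliance depends on the crawler performing a check like this and honoring the result.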

AWS opened the investigation in response to these allegations, to check that Perplexity is not breaking any rules while using AWS services to train AI. Perplexity has stated that its services do not violate AWS’s terms of service and that it respects robots.txt, except in rare cases where its bot ignores the file to retrieve specific information requested by a user.

Wired has confirmed that its investigation aligns with Perplexity’s explanation and that the company’s chatbot does ignore robots.txt in certain cases to collect unauthorized information. AWS requires its customers to comply with its terms of service and applicable laws, and it will take appropriate action if the investigation finds any violations.
