DEF CON Generative AI Hacking Challenge Explored Cutting Edge of Security Vulnerabilities


Data from the human vs. machine challenge could provide a framework for government and enterprise policies around generative AI.

OpenAI, Google, Meta and other companies put their large language models to the test on the weekend of August 12 at the DEF CON hacker conference in Las Vegas. The result is a new corpus of information shared with the White House Office of Science and Technology Policy and the Congressional AI Caucus. The Generative Red Team Challenge, organized by AI Village, SeedAI and Humane Intelligence, gives a clearer picture than ever before of how generative AI can be misused and what methods might be needed to secure it.

Generative Red Team Challenge could influence AI security policy

The Generative Red Team Challenge asked hackers to force generative AI to do exactly what it isn’t supposed to do: provide personal or dangerous information. Challenges included finding credit card information and learning how to stalk someone. The AI Village team is still analyzing the data from the event and expects to present it next month.

This challenge is the largest event of its kind and one that will allow many students to get in on the ground floor of cutting-edge hacking. It could also have a direct impact on the White House’s Office of Science and Technology Policy, with office director Arati Prabhakar working on bringing an executive order to the table based on the event’s results.

Organizers expected more than 3,000 people to participate, each taking a 50-minute slot to try to hack a large language model chosen at random from a pre-established selection. The models being tested were built by Anthropic, Cohere, Google, Hugging Face, Meta, NVIDIA, OpenAI and Stability AI. Scale AI developed a scoring system.

“The diverse issues with these models will not be resolved until more people know how to red team and assess them,” said Sven Cattell, the founder of AI Village, in a press release. “Bug bounties, live hacking events and other standard community engagements in security can be modified for machine learning model-based systems.”

Rumman Chowdhury, co-founder of the AI policy and consulting firm Humane Intelligence and one of the organizers of the AI Village, told Axios that the team will use the results of the challenge in a presentation to the United Nations next month.

That presentation is part of a broader trend of cooperation between industry and government on AI safety. Another example is DARPA’s AI Cyber Challenge, announced at the Black Hat 2023 conference, which invites participants to create AI-driven tools that find and fix vulnerabilities in critical software.

What vulnerabilities are LLMs likely to have?

Before DEF CON kicked off, AI Village consultant Gavin Klondike outlined seven vulnerabilities that someone trying to breach a system through an LLM would probably find:

  • Prompt injection.
  • Modifying the LLM parameters.
  • Inputting sensitive information that winds up on a third-party site.
  • The LLM being unable to filter sensitive information.
  • Output leading to unintended code execution.
  • Server-side output feeding directly back into the LLM.
  • The LLM lacking guardrails around sensitive information.

“LLMs are unique in that we should not only consider the input from users as untrusted, but the output of LLMs as untrusted,” he pointed out in a blog post. Enterprises can use this list of vulnerabilities to watch for potential problems.
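As one hypothetical way to watch for the fourth and seventh items on that list, an application can check model output for sensitive data before showing it to a user. The minimal Python sketch below is an illustration only, not something published by AI Village or Klondike: it redacts card-like numbers that pass the Luhn checksum, the check-digit scheme payment card numbers use. The function names and placeholder text are assumptions, and a real deployment would need much broader detection (names, addresses, account IDs and so on).

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Replace Luhn-valid, card-like numbers in model output with a placeholder."""
    def _replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return "[REDACTED CARD]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(_replace, text)


print(redact_card_numbers("Sure! The card on file is 4111 1111 1111 1111."))
# Sure! The card on file is [REDACTED CARD].
```

Requiring the checksum to pass keeps the filter from mangling order numbers and other long digit strings that merely look like card numbers.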

In addition, “there’s been a bit of debate around what’s considered a vulnerability and what’s considered a feature of how LLMs operate,” Klondike said.

These features might look like bugs if a security researcher were assessing a different kind of system, he said. For example, the LLM’s external endpoint can be an attack vector in either direction: a user could input malicious commands, or the LLM could return code that executes insecurely. Similarly, conversations must be stored so the AI can refer back to previous input, and that stored history could endanger a user’s privacy.
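One common way to act on the advice to treat LLM output as untrusted is never to pass it straight to eval, a shell or a database. The sketch below is a minimal illustration under stated assumptions, not a method from the article: it assumes the application asks the model to reply with a JSON object naming one action and its arguments, and the action names and schema are hypothetical. The reply is parsed as data and rejected unless it matches an explicit allowlist.

```python
import json
from typing import Any

# Hypothetical allowlist for illustration: action name -> expected argument types.
ALLOWED_ACTIONS: dict[str, dict[str, type]] = {
    "lookup_order": {"order_id": str},
    "get_weather": {"city": str},
}


class UntrustedOutputError(ValueError):
    """Raised when model output fails validation and must not be acted on."""


def parse_llm_action(raw_output: str) -> tuple[str, dict[str, Any]]:
    """Parse model output as data and validate it; never execute it."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise UntrustedOutputError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise UntrustedOutputError("output must be a JSON object")

    action = data.get("action")
    args = data.get("args")
    # Only exact, allowlisted action names are accepted.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise UntrustedOutputError(f"action {action!r} is not allowlisted")
    if not isinstance(args, dict):
        raise UntrustedOutputError("args must be a JSON object")

    # Arguments must match the expected names and types exactly.
    schema = ALLOWED_ACTIONS[action]
    if set(args) != set(schema):
        raise UntrustedOutputError("unexpected or missing arguments")
    for name, expected_type in schema.items():
        if not isinstance(args[name], expected_type):
            raise UntrustedOutputError(f"argument {name!r} has the wrong type")
    return action, args


print(parse_llm_action('{"action": "get_weather", "args": {"city": "Las Vegas"}}'))
# ('get_weather', {'city': 'Las Vegas'})

try:
    parse_llm_action('{"action": "run_shell", "args": {"cmd": "rm -rf /"}}')
except UntrustedOutputError as err:
    print("rejected:", err)
# rejected: action 'run_shell' is not allowlisted
```

An allowlist fails closed: an action the developers never anticipated is rejected rather than guessed at. The validated arguments still need the usual downstream protections, such as parameterized database queries.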

Klondike pointed out that AI hallucinations, or falsehoods, don’t count as a vulnerability: although they are factually incorrect, they aren’t dangerous to the system itself.

How to prevent LLM vulnerabilities

Although LLMs are still being explored, research organizations and regulators are moving quickly to create safety guidelines around them.

Daniel Rohrer, NVIDIA vice president of software security, was on-site at DEF CON and noted that the participating hackers talked about the LLMs as if each brand had a distinct personality. Anthropomorphizing aside, the model an organization chooses does matter, he said in an interview with TechRepublic.

“Choosing the right model for the right task is extremely important,” he said. For example, ChatGPT potentially brings with it some of the more questionable content found on the internet; however, if you’re working on a data science project that involves analyzing questionable content, an LLM system that can look for it might be a valuable tool.

Enterprises will likely want a more tailored system that uses only relevant information. “You have to design for the point of the system and application you’re trying to achieve,” Rohrer said.

Other common suggestions for how to secure an LLM system for enterprise use include:

  • Limit an LLM’s access to sensitive data.
  • Educate users on what data the LLM gathers and where that data is stored, including whether it is used for training.
  • Treat the LLM as if it were a user, with its own authentication/authorization controls on access to proprietary information.
  • Use software designed to keep AI on task, such as NVIDIA’s open-source NeMo Guardrails toolkit, whose guardrails are written in Colang, a modeling language for conversational flows.
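To make the first and third suggestions concrete, here is a minimal sketch of a retrieval step that treats the LLM as its own principal. It is illustrative only: the service-account name, the `Document` type and the per-document reader lists are hypothetical, and in practice these checks would usually live in an organization’s existing identity and access management system. A document reaches the model’s context only if both the LLM’s account and the person asking are allowed to read it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    # Principals (users or service accounts) allowed to read this document.
    readers: frozenset[str] = field(default_factory=frozenset)


# Hypothetical service-account name: the LLM gets its own identity
# instead of borrowing an administrator's credentials.
LLM_PRINCIPAL = "svc-llm-assistant"


def retrieve_for_prompt(docs: list[Document], requesting_user: str) -> list[Document]:
    """Return only documents that BOTH the LLM's account and the requesting
    user may read, so the model can't become a path around the user's own
    permissions and never sees data it was not granted."""
    return [
        doc
        for doc in docs
        if LLM_PRINCIPAL in doc.readers and requesting_user in doc.readers
    ]


corpus = [
    Document("faq", "Public FAQ", frozenset({LLM_PRINCIPAL, "alice", "bob"})),
    Document("payroll", "Salary data", frozenset({"alice"})),
    Document("roadmap", "Product plans", frozenset({LLM_PRINCIPAL, "alice"})),
]
print([d.doc_id for d in retrieve_for_prompt(corpus, "alice")])  # ['faq', 'roadmap']
print([d.doc_id for d in retrieve_for_prompt(corpus, "bob")])    # ['faq']
```

Checking the intersection of the two permission sets is a deliberate choice: the payroll file stays out of the prompt even for a user who can read it, because the LLM’s account was never granted access.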

Finally, don’t skip the basics, Rohrer said. “For many who are deploying LLM systems, there are a lot of security practices that exist today under the cloud and cloud-based security that can be immediately applied to LLMs that in some cases have been skipped in the race to get to LLM deployment. Don’t skip those steps. We all know how to do cloud. Take those fundamental precautions to insulate your LLM systems, and you’ll go a long way to meeting a number of the usual challenges.”


