Name
Navigating AI Misuse Prevention Challenges
Description

This talk is part of a two-presentation session running from 2:50 PM - 3:40 PM. The session will feature two presentations back-to-back, with Q&A after each presentation.

The World Economic Forum's Global Risks Report 2024 identifies mis- and disinformation as a critical risk, fueled by projections that as much as 90% of online content could be AI-generated by 2026. AI breakthroughs enable the creation of harmful content at unprecedented scale, threatening trust and safety, yet AI can also help combat these threats. This talk explores this complex landscape, examining abusers' tactics and practical approaches to combating AI misuse.

We'll examine common abuser techniques and provide an overview of protections, including model safety tuning, content watermarking, safety classifiers, adversarial testing, and human review. Each method has its own trade-offs and complexities, and we'll share practical experiences and challenges in layering AI protections into a holistic defense-in-depth strategy.

This session for T&S professionals aims to raise awareness of AI-enabled threats and share insights into abusers' tactics. We'll also provide practical lessons learned from building a defense-in-depth strategy to keep AI products safe and mitigate the risk of harm.

Location Name
Seacliff CD
Date
Tuesday, July 23, 2024
Time
2:50 PM - 3:40 PM
Session Type
Presentation
Track
Engineering
Session Themes
Emerging Trends, Engineering, Investigations & Intelligence
Audience
No press
Will this session be recorded?
No