Similarity Detection at Scale for Content Enforcement

Facebook has nearly 3 billion monthly active users, and each month tens of millions of uploaded pieces of content violate Meta's Community Standards. This talk gives an overview of how Meta removes millions of these pictures, videos and texts at scale through similarity detection, drawing on a catalog that numbers in the trillions.

Content moderation is a problem for every service that hosts user-uploaded media. Whether users are creating virtual avatars or sharing a personal collection of pictures, the platform is responsible for removing violating content. The problem can be tackled in several ways: with classifiers, with human moderators, or by comparing media signatures. This presentation focuses on the last of these. From high-severity content such as CSAM and terrorism to low-severity (but frustrating for users) content such as spam, similarity detection finds violations by comparing signatures, or hashes, of new content against those of known violating content.

This approach raises a number of complexities, but at its core this talk aims to answer one question: how does a machine, rather than a human eye, measure the similarity of two pieces of content? We will look at a variety of scenarios, such as accounting for differences in length and size, as well as attempts by adversarial actors to get around these solutions. Finally, we will look at the open-source solutions Meta provides to help you get started with a similarity-based approach to enforcing community standards.
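The core comparison of a new upload's hash against known violating hashes can be sketched in a few lines. The Python below is a minimal, illustrative sketch, not Meta's implementation. It assumes 256-bit perceptual hashes (the size produced by PDQ, Meta's open-source image hash) compared by Hamming distance, with a hypothetical match threshold of 31 differing bits. To avoid comparing each query against every stored hash, the hypothetical `HashIndex` class uses multi-index hashing: it splits each hash into `threshold + 1` chunks, looks up exact chunk matches, and verifies only those candidates with the full distance.

```python
from collections import defaultdict


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two equal-length hashes differ."""
    return bin(a ^ b).count("1")


class HashIndex:
    """Hypothetical near-duplicate index over hashes of known violating content.

    Lookup relies on the pigeonhole principle: if two hashes differ in at
    most `threshold` bits and are split into `threshold + 1` disjoint
    chunks, at least one chunk must be identical in both. Exact chunk
    matches yield a small candidate set, which is then verified with the
    full Hamming distance.
    """

    def __init__(self, threshold: int, bits: int = 256):
        self.threshold = threshold
        num_chunks = threshold + 1
        # Chunk boundaries partition the bit positions [0, bits).
        self.bounds = [i * bits // num_chunks for i in range(num_chunks + 1)]
        # One lookup table per chunk: chunk value -> list of (hash, label).
        self.tables = [defaultdict(list) for _ in range(num_chunks)]

    def _chunks(self, h: int):
        """Yield the value of each chunk of `h`, lowest bits first."""
        for start, end in zip(self.bounds, self.bounds[1:]):
            yield (h >> start) & ((1 << (end - start)) - 1)

    def add(self, h: int, label: str) -> None:
        """Register a hash of known violating content under a label."""
        for table, key in zip(self.tables, self._chunks(h)):
            table[key].append((h, label))

    def query(self, h: int):
        """Return sorted (distance, label) pairs within the threshold."""
        candidates = set()
        for table, key in zip(self.tables, self._chunks(h)):
            candidates.update(table.get(key, ()))
        matches = []
        for known, label in candidates:
            d = hamming_distance(h, known)
            if d <= self.threshold:
                matches.append((d, label))
        return sorted(matches)


if __name__ == "__main__":
    index = HashIndex(threshold=31)
    known = int("a5" * 32, 16)  # illustrative 256-bit hash, not real data
    index.add(known, "known-violation-1")
    print(index.query(known ^ ((1 << 10) - 1)))  # [(10, 'known-violation-1')]
    print(index.query(known ^ ((1 << 40) - 1)))  # []
```

Raising the threshold catches more altered copies (re-encodes, resizes, small edits) at the cost of more false positives and larger candidate sets; a suitable value depends on the hash algorithm and on the severity of the content being matched.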

Location: Seacliff D
Tuesday, July 11, 2023, 11:10 AM - 12:00 PM
All TrustCon attendees