A 2022 Pew Research Center survey found that 95% of teenagers (aged 13-17) use YouTube and 67% use TikTok, with nearly one in three reporting near-constant use. Average daily screen time has also risen in recent years, hovering around five and a half hours.
A growing number of underage users and ever more opportunities to create and share content bring a greater risk of exposure to illegal and harmful content online. The EU’s landmark legislation, the Digital Services Act (DSA), responds to these child-protection challenges and sets out a number of obligations aimed at keeping children safe online.
The obligations addressing child protection in the DSA are spread throughout the text. At the most basic level, any service provider whose service is directed at or used by minors has to make its terms of service understandable to them. The most affected, however, are likely to be online platforms: social media, video-sharing services, and many online gaming platforms, for example, need to take measures to ensure a high level of privacy, safety, and security for minors when designing their platforms.
The broad wording of the new obligation is challenging, as it gives little detail on which measures achieve compliance and which fall short. Diving into the DSA, there are hints of what compliance could mean – for example, services should ensure that minors can easily access mechanisms referenced in the DSA, such as notice-and-action and complaint mechanisms. They should also take measures to protect minors from content that may impair their physical, mental, or moral development, and provide tools that enable conditional access to such information.
There is no obligation on the Commission to publish guidance on how platforms should safeguard their younger users before the overall compliance deadline in February 2024. However, we can expect some co-regulatory measures to be developed as part of the Better Internet for Kids+ strategy. In the meantime, companies must seek out and apply existing best practices and develop their own measures in order to comply.
Future best practices on keeping children safe online will likely also emerge from the risk assessment cycles of very large online platforms. Platforms with more than 45 million monthly active users in the EU will have to assess systemic risks related to minors, such as the risk of exposure to content that may harm their physical or mental health or promote addictive behavior.
If you are an online platform, you are likely already working hard to ensure children are protected on your platform. However, whether your existing measures are enough to comply with the new obligations in the DSA needs careful assessment and benchmarking against best practices.
The US “may be about to change the law on this massively complex question about human rights on the Internet through the backdoor”, tweeted Daphne Keller, Platform Regulation Director at the Stanford Cyber Policy Center, in a thread detailing the Gonzalez and Taamneh cases that will be heard by the Supreme Court this week. While those cases raise questions about platform liability for content left up on a platform, recently passed laws in Texas and Florida – which will also be tested in the Supreme Court – limit the content platforms can take down.
These four cases are at the heart of the catch-22 situation online platforms find themselves in: on one hand, there is pressure to remove content to protect user safety; on the other, to leave content up to protect freedom of speech. At the core of this debate is whether online platforms can be held liable for the speech they host, and its outcome has the potential to completely transform the future of the tech industry.
Section 230 of the Communications Decency Act (1996) – 26 words that set the stage for the internet as we know it today – shields online platforms from liability for content posted by their users. More than two decades after its enactment, it remains hotly debated, with some arguing it provides too much protection for online platforms, while others maintain that this section is crucial to preserving freedom and diversity on the internet. Despite many attempts, Congress has had limited success in introducing substantive changes to the law. The Supreme Court is therefore in particularly challenging territory – it has to rule on an issue that lawmakers have not been able to agree on for decades.
The Gonzalez v. Google LLC case involves a dispute between the family of a victim of the 2015 Paris terror attacks and Google, over YouTube’s recommendations of terrorist content. Similarly, Twitter Inc. v. Taamneh follows the 2017 terrorist attack on an Istanbul nightclub, where the relatives of a victim have accused Twitter, Facebook, and Google of aiding and abetting the attack by enabling the dissemination of terrorist content. As both cases consider whether a platform can be held responsible for content it hosts, they open Section 230 to potential modification.
Defending the current liability protection, Google has argued that Section 230 promotes free expression online and empowers websites to create their own moderation rules to make the internet a safer place. While this law has so far protected platforms when it comes to content their users post, the primary question in this case is whether Section 230 also protects the platforms’ recommendation algorithms – a feature that is crucial to many platforms’ architectures today, and for some, like TikTok, the recommendation is the service.
On the other hand, in the Taamneh hearing, the Court will set aside Section 230 to discuss whether a platform can be held liable for aiding and abetting terrorism if its service was not directly employed in the attack. In a previous hearing, the 9th Circuit ruled that it indeed can; however, as the court did not consider Section 230, the platforms remained protected by it. If the Supreme Court weakens the general liability protection in the Gonzalez case, it could create a significant problem for platforms, as they could then be held liable for aiding and abetting terrorism.
Both states have recently passed laws that make it illegal for online platforms to moderate content or restrict users in many cases. Petitions challenging both laws are pending before the Supreme Court, which has decided not to take them up this year. These laws add to the tensions around regulation of the online space and the potential rulings in the Gonzalez and Taamneh cases. While the latter two push platforms to do more to moderate certain content on their services – to the extent of holding them liable for promoting and/or hosting such content – the state laws argue that content should not be moderated, on free-speech grounds.
Notably, in the case of the Texas law, House Bill 20 forbids large social media platforms from moderating based on the “viewpoint of the speaker” – meaning ‘lawful but awful’ content would be required to stay up as long as it is not illegal. In a panel organised by the Stanford Cyber Policy Center on February 17th, speakers highlighted that this could pose specific risks to children. For example, content promoting eating disorders and self-harm would be required to stay up if content discouraging the same was also up, as both could be construed as speaker viewpoints.
These contradictory laws and decisions promise to transform content moderation on online platforms as it exists today. At its core, while the state laws mandate that platforms do not remove certain content and users, the Supreme Court cases could change Section 230 and make platforms liable for the content they recommend or fail to remove. This conflict could seemingly be resolved with the upcoming hearings, or alternatively, open up a Pandora’s box of tech regulation problems. Ultimately, the decisions in the upcoming days will impact not just the online ecosystem, but also the principles that govern it.
Whatever the outcome of the hearings may be, one thing is certain – it has the potential to affect all online platforms and their content moderation processes. Would you like to know more about how these rulings may impact your business? Reach out to our tech experts at info@tremau.com.
Content moderation has become increasingly important for online platforms to protect their users from potential abuses. The evolving regulatory landscape has also put growing responsibilities on the way user-generated content should be moderated online. Notably, the upcoming Digital Services Act (DSA), which affects almost every online service provider active in the EU, will bring unprecedented obligations to online services in a wide range of sectors, as well as considerable penalties for those who fail to meet the new requirements (up to 6% of annual global turnover).
Similar regulations are under development in multiple jurisdictions around the world (Australia, Canada, UK, and South Korea – to name a few). Thus, designing and implementing a strategy for content moderation is vital not only for contributing to online trust & safety and ensuring the retention and satisfaction of the platforms’ users, but also for a company’s ability to do business in the markets where regulations are being developed. A company’s success will largely be determined by the degree to which it has managed to ingrain the new content moderation requirements into its business model.
To understand the challenges in achieving efficient and effective content moderation, Tremau interviewed content moderators and managers working in Trust & Safety departments across more than 30 companies, ranging from mega platforms to early-stage start-ups. Notwithstanding the different types of content that moderators are exposed to, given the diversity of online platforms, we identified a set of common practices adopted by companies and clear areas for improvement. Three major areas emerged: detection of harmful or illegal content, moderation processes and controls, and crisis management.
A major challenge in content moderation is the tremendous volume of content produced in real time. In order to accurately identify the very small proportion of potentially problematic content, companies often use a mixture of reactive moderation (responding to user reports) and proactive moderation (automated detection tools). Based on pre-determined rules or machine learning models, AI-powered automated detection usually selects content that is potentially illegal, such as terrorist content or counterfeit products, or content that clearly violates a company’s terms of service. Many companies also employ automated tools as a preliminary filter, and depending on the confidence of the detection, a human moderator is brought into the process to verify the results.
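To make that confidence-threshold routing concrete, here is a minimal Python sketch; the classifier, labels, and threshold values are illustrative assumptions, not any platform’s real configuration.

```python
# Minimal sketch of confidence-threshold routing between automated action and
# human review; the classifier, labels, and thresholds are illustrative
# placeholders, not any platform's real configuration.
def route(item, classifier, auto_action_threshold=0.98, review_threshold=0.70):
    score = classifier(item)          # estimated probability of a policy violation
    if score >= auto_action_threshold:
        return "auto_remove"          # high-confidence violation: act automatically
    if score >= review_threshold:
        return "human_review"         # uncertain: queue for a human moderator
    return "keep"                     # low risk: no action

# Example with a stand-in "classifier" that returns a fixed score.
print(route({"text": "..."}, classifier=lambda item: 0.83))   # human_review
```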
Despite the improved efficiency brought by automated detection, the overwhelming majority of our interviewees pointed out that there is still significant room for improvement. One frequently mentioned drawback is the difficulty of treating nuanced cases, which makes the human moderator’s job indispensable. Moreover, no AI tool can be a perfect substitute for human intervention given continuously evolving and highly diverse cultural contexts and requirements. Thus, automated content moderation tools should be built on the principle of working with human moderators, not replacing them.
A common issue with content moderation systems is that companies typically have to continuously fill the gap between their existing workflows and the evolving regulatory obligations – often by frequently “patching” their moderation systems. Thus, a much-needed capability is to build content moderator-centric systems according to the company’s evolving regulatory obligations, allowing better coordination among different teams and a more effective and efficient moderation strategy.
Violations of content policies are often categorized into pre-defined groups such as violence, foul language, and extremism. However, moderators often find themselves reviewing much more nuanced, complex, or context-sensitive cases. A key practice adopted by various companies is to establish multi-level moderation teams and processes. In this structure, frontline moderators are responsible for making a “yes or no” decision on the most clear-cut cases and send more complicated cases to higher-level moderators who have more experience as well as access to more information. In rare, very difficult cases, senior Trust & Safety managers or other departments concerned discuss and make the final decision.
Another practice to support frontline moderators’ decision-making is to use a decision tree during the moderation process – a practice widely adopted by customer support departments and other call centers. By decomposing a complex moderation question into a series of smaller and easier choices, a decision tree allows moderators to judge cases in a more structured and standardized manner, which can boost the efficiency and quality of the overall process.
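As an illustration, such a decision tree can be represented directly in code; the following Python sketch uses hypothetical questions, labels, and actions rather than any real platform’s policy.

```python
# Minimal sketch of a moderation decision tree. The questions, labels, and
# actions below are illustrative placeholders, not any platform's real policy.
DECISION_TREE = {
    "question": "Does the content depict or incite violence?",
    "yes": {
        "question": "Is a real person credibly threatened?",
        "yes": {"action": "remove", "escalate_to": "tier-2"},
        "no": {"action": "restrict", "escalate_to": None},
    },
    "no": {
        "question": "Does the content violate another policy category?",
        "yes": {"action": "send_to_policy_specialist", "escalate_to": "tier-2"},
        "no": {"action": "keep", "escalate_to": None},
    },
}


def moderate(node, answers):
    """Walk the tree using the moderator's yes/no answers and return a decision."""
    while "action" not in node:
        answer = answers[node["question"]]          # "yes" or "no"
        node = node[answer]
    return node


# Example: a frontline moderator answers two structured questions.
decision = moderate(DECISION_TREE, {
    "Does the content depict or incite violence?": "yes",
    "Is a real person credibly threatened?": "no",
})
print(decision)   # {'action': 'restrict', 'escalate_to': None}
```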
Accuracy and consistency of content moderation are also key concerns. Companies develop both ex-ante and ex-post control measures to improve the quality of content moderation. Intensive training before starting as a moderator is common across companies, and regular training sessions also take place in many of them to keep moderators up to date with the latest regulatory or terms-of-service changes.
Considering the constantly evolving regulations, at both national and international levels, companies often draft extensive and detailed guidelines for moderators to refer to before reaching a decision. Reviewing the accuracy of past moderation decisions on a regular basis is also widely adopted. Often, a random sample of the cases treated by a moderator in a given period is pulled from stored data and sent for examination, or some cases are given to multiple moderators to check their consistency; the resulting accuracy rate is often a key component of the moderators’ KPIs.
Another key challenge during the moderation process is that content moderators’ tasks involve much more than simply judging whether a post should be removed. Crisis management is also part of the job when moderators encounter urgent cases, such as a livestream of self-harm or of a terrorist attack (for example, the livestreamed Buffalo shooting). Such cases demand immediate outreach to law enforcement or other appropriate local authorities and should be considered the digital “first aid” of our time.
Content moderators also need to provide some degree of customer support, as users may file complaints against certain moderation decisions – hence moderators must be able to easily retrieve all relevant information about past cases or users to better communicate with them.
Although content moderation is essential for almost every online platform that hosts regular interactions among users, most companies do not have enough resources to build – or, often more challenging, to maintain and keep up to date – efficient and effective internal moderation systems. On this note, Tremau’s conversations with content moderators enabled us to identify a number of recommendations for creating efficient and consistent content moderation processes.
For example, given the multi-faceted nature of content moderation, the most efficient way to enhance moderation processes is to integrate related functions and controls into a moderator-centric, centralized system. This spares moderators from constantly switching between tools, ensuring a smoother workflow, significant efficiency gains, and more accurate KPIs and quality control.
A centralized system also allows data to be reconciled in a unified platform, thereby giving moderators the complete context needed to make decisions and enabling automated transparency reporting. It also facilitates a risk-based approach via prioritization, which allows moderators to treat cases more effectively and enables the implementation of convenient contact channels with authorities and other stakeholders in case of emergencies. Such rapid reaction mechanisms are still not mature enough in many companies.
With access to more efficient processes as well as analytics, it then becomes possible to also better protect moderators’ wellness against traumatizing content.
To meet the challenges of protecting their users & complying with regulations that are continuously evolving, a number of online platforms will need to enhance their content moderation processes and controls. The measures discussed above streamline the moderation processes to be more efficient, and – with appropriate structuring of data – can automate transparency reporting, which is increasingly in demand across voluntary codes and regulations.
With regulations such as the Terrorist Content Online Regulation, which sets a 1-hour limit for online services to remove Terrorist and Violent Extremist Content (TVEC) from their platforms, further investment is also needed in reliable mechanisms to prioritize content in moderation queues. Thus, “Compliance by Design” will become a necessary focus for building effective and future-proof content moderation systems. Successfully building these capabilities will soon become a key differentiator, and even a critical factor for survival.
Tremau’s solution provides a single trust & safety content moderation platform that prioritizes compliance as a service and integrates workflow automation and other AI tools. The platform ensures that providers of online services can respect all DSA requirements while improving their key trust & safety performance metrics, protecting their brands, increasing handling capacity, as well as reducing their administrative and reporting burden.
We would like to thank all the content moderators & managers who took the time to talk to us and contributed to our findings.
Tremau Policy Research Team
The growing regulatory spotlight on content moderation, shorter deadlines for content removal, the growing volume of detected potentially illegal or harmful content to be reviewed, and the pressing need to protect both the safety and the freedom of expression of users have increased the urgency of enhancing existing online moderation practices. With these practices becoming widespread, it is important to ensure that the process is effective, efficient, of high quality, and that it keeps the best interests of all stakeholders at heart.
To achieve this, let us look at three key points in the process that can be optimized going forward:
Receiving continuous alerts from users can be overwhelming for human moderators, especially over extended periods of time. At this juncture, it is crucial to prioritize and manage alerts rather than follow, for example, a “first-in-first-out” or other sub-optimal approach. One solution is to label user reports according to the level of harm they could cause (following a risk-based approach) and based on statistical analysis of the available metadata. This is important for user safety – especially in cases of emergency – as it allows time-sensitive cases to be dealt with quickly. It can also benefit moderator safety, as moderators are warned whether they are about to view more or less harmful content. A lesser-considered point when discussing the management of user reports is the moderators’ experience of the process itself: an optimized moderator screen can save decision-making time and increase overall process efficiency by more than 20%.
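A minimal sketch of such risk-based prioritization, assuming each incoming report already carries an estimated harm score (for example, from a classifier over its metadata); the field names and scores below are illustrative.

```python
import heapq
import itertools
import time

# Sketch of a risk-prioritized report queue instead of first-in-first-out.
counter = itertools.count()          # tie-breaker so equal-risk reports stay FIFO
queue = []

def enqueue(report, harm_score):
    # heapq is a min-heap, so negate the score to pop the highest risk first.
    heapq.heappush(queue, (-harm_score, next(counter), time.time(), report))

def next_report():
    _, _, received_at, report = heapq.heappop(queue)
    return report, received_at

enqueue({"id": "r-101", "category": "spam"}, harm_score=0.12)
enqueue({"id": "r-102", "category": "credible threat"}, harm_score=0.97)
enqueue({"id": "r-103", "category": "harassment"}, harm_score=0.55)

report, _ = next_report()
print(report["id"])   # r-102 -- the time-sensitive, high-harm case comes first
```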
Another pain point in content moderation is managing the process across a variety of platforms, people, and teams. As regulations demand increasing responsiveness and complaint handling from online services, it is important to have the right mechanisms in place for end-to-end moderation and complaint handling that help build user trust and protect your brand. For instance, a moderation case cannot simply be closed once the initial notice has been handled. Under the Digital Services Act (DSA), a user can still contest the handling of the case – for at least six months – and even take the complaint to an out-of-court dispute settlement body. Content moderation teams will thus need to account for the possibility of a case continuing beyond its initial handling. This includes making sure that complaints are uniquely identifiable, to streamline the process, and that all relevant information is easily available, to ensure process quality.
The third point to consider is growing transparency reporting requirements. Over recent years, calls for transparency reports from online services have come from civil society and governments alike. This has led to a variety of different frameworks from private actors in the ecosystem and resulted in transparency reporting becoming a key part of digital legislation, as seen in the DSA. Transparency is critical to ensure the safe and fair moderation of online platforms. To produce comprehensive transparency reports, it is crucial to keep a clear and consistent account of all requests for removal or restriction of content. To do this, the tools used by the moderators need to be effective at managing large volumes of notices as well as streamlining storage and labelling of data.
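For illustration, once moderation decisions are logged in a structured way, the aggregate figures that typically feed a transparency report can be produced with a few lines of Python; the record fields below are hypothetical, not a prescribed schema.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical moderation log entry; transparency reports typically ask for
# aggregate figures such as notices received and actions taken, broken down by
# category -- the exact fields here are illustrative.
@dataclass
class ModerationAction:
    notice_source: str      # "user_notice", "trusted_flagger", "own_initiative"
    alleged_category: str   # e.g. "illegal_hate_speech", "counterfeit"
    action_taken: str       # "removal", "restriction", "no_action"

log = [
    ModerationAction("user_notice", "illegal_hate_speech", "removal"),
    ModerationAction("trusted_flagger", "counterfeit", "removal"),
    ModerationAction("user_notice", "harassment", "no_action"),
]

by_source = Counter(a.notice_source for a in log)
by_action = Counter((a.alleged_category, a.action_taken) for a in log)

print(dict(by_source))
print(dict(by_action))
```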
Optimizing your content moderation processes will allow you to be more efficient with your costs as well as more effective in protecting your users, moderators, and brand. To achieve this, it is important to introduce new processes, incorporate automation and intelligence to improve speed and quality, and build moderator-centric tools. More importantly, it is critical to prioritize quality assurance to ensure that the right balance between safety and freedom of expression online is met.
With regard to regulation, under the DSA a user notice can give the platform actual knowledge of the reported content, exposing it to liability if it then fails to act. Poor content moderation can create reputational, regulatory, and other business risks, which can also lead to loss of users and market share, as well as significant fines (up to 6% of global annual turnover). Thus, adopting a content moderation system that meets technical compliance requirements from the get-go, and that prioritizes human safety and quality, is crucial.
The Tremau tool is a single end-to-end content moderation platform designed to help you streamline your processes, automating them whenever possible, managing and prioritizing different reported content (whatever the source of detections), as well as continuously producing audit trails for transparency reporting – enabling you to cut costs and collaborate more effectively. The end-to-end process on a single platform allows all team members to see the progression of cases and ensure better communication, faster treatment, higher consistency and quality, and fewer bottlenecks in internal handling – while ensuring the privacy of its users.
The tool is also designed to ensure a smooth experience for moderators. This is done by limiting the number of clicks and screen changes and by including API connections to external stakeholders to ensure rapid contact. Finally, the tool collects and analyzes data throughout the end-to-end moderation process to ensure that nothing falls through the cracks and full transparency can be maintained. Such improvements enable platforms to react more quickly when removing or restricting content, ultimately protecting users and society. Moreover, the tool keeps the well-being and retention of moderators at its core by taking steps to limit their exposure to harmful content and to streamline their tasks.
To learn more about how Tremau can help you, contact us at info@tremau.com.
Tremau Policy Research Team
Online dating platforms have exploded in popularity over the past decade, with their combined global user base topping 323 million and the industry earning $5.61 billion in 2021. However, this rapid growth has brought several enduring problems in creating an accessible virtual dating space where everyone feels safe and included. With a projected 15% increase in the industry’s user base by 2027, investing in content moderation and in user security, trust, and well-being is becoming a critical business priority for these platforms.
Online harassment remains a persistent problem on social media platforms, and dating sites are no exception. Women in particular face frequent instances of virtual stalking, aggression, and threats of violence, as well as unsolicited explicit images – a phenomenon particularly prevalent on dating apps. Around 52% of women aged 18-35 reported having been sent unsolicited explicit images by new matches, and another 20% reported having been subjected to threats of physical violence.
Even more concerning is research published in 2019 that found that no free-to-use dating platform screens its users for prior sexual offences, allowing predators to use the platform anonymously. Due to a lack of effective moderation, people have to decide whether being subjected to harassment is a price worth paying to participate or remain on these platforms.
Racial prejudice also remains an issue for many individuals online, despite the rise of more inclusive and accessible dating sites. A 2018 study done by OkCupid found that Black women and Asian men were the least likely groups to receive messages or responses, while both white men and women tend to be reluctant to date other ethnicities. This problem is exacerbated within the gay community, where dating apps have identified pervasive issues with racial discrimination.
Another hurdle for online platforms is the question of privacy and personal data. To keep their services free, many websites and social media companies sell their users’ data to third parties for targeted advertising. The extent of this was not well understood until, in 2019, the Norwegian Consumer Council discovered that many popular dating apps collect and sell information such as the user’s exact location, sexual orientation, religious and political beliefs, and even drug use and medical conditions. This set off alarm bells for consumers and regulators alike, who began investigating ways to curtail what information companies could freely transmit to outsiders.
Companies have been working on how to solve these issues internally. Tinder, for example, rolled out new features in 2020 aimed at ensuring user safety when meeting matches for the first time, including a “Panic Button” that alerts emergency responders, in-app safety check-ins during a date, and real-time photo verification to prevent catfishing (impersonating someone else online). Bumble made headlines this year when it released Private Detector, an open-source AI tool that detects and automatically blurs explicit images sent within the app. Other apps opted to remove the ability for users to sort profiles by race; however, the efficacy of this action is still debated.
As consumers demand more accountability from companies to make online dating a more inclusive and secure space, national governments are taking note and passing legislation to rein in these actors.
The UK has published a draft Online Safety Bill which includes a wave of regulations for social media platforms, including making companies liable for responding to reports of abuse or harassment. The law will also make “cyberflashing” – sending unsolicited explicit images – a criminal offence. In fact, lobbying for cyberflashing laws by companies like Bumble has successfully pushed similar bills through in Texas, Virginia, and most recently California.
Similarly, in Europe, the Digital Services Act (DSA), which will be live from mid-November, aims to better protect users, establish clear frameworks of accountability for platforms, and foster competition. As long as a dating site has users in an EU Member State, it will face the bulk of the obligations the regulation mandates. See what exactly the DSA means for your business here.
Judging by the trend of recent regulations, it is certain that governments around the world will continue to focus on user-oriented regulations of online companies, so it is imperative that dating apps move quickly to keep up. Not complying with the DSA may result in fines of up to 6% of the platform’s global annual turnover, or even the termination of the platform’s services in the EU.
The EU alone represents a large portion of these platforms’ user base, meaning providers will need to ensure they make several immediate operational changes in order to meet new rules and avoid hefty penalties.
Firstly, dating platforms will need to declare a single point of contact in the EU that can be held legally accountable for infractions of the DSA. Dating service providers will then need to ensure they have implemented a well-designed, transparent, content moderation system that provides the tools for users and the platform alike to adequately respond to law enforcement, trusted flaggers, and out-of-court dispute requests.
Another major hurdle for companies will be a range of stipulations as to the design of the platform itself. Indeed, the new due diligence obligations for very large online platforms (VLOPs) will impact the way dating sites allow user interaction, share content, show advertisements, and more. The DSA also places a priority on protection of minors, emphasising preventative risk assessments that, in the case of dating sites, would include clearly laying out the company’s procedure to ensure age verification prevents minors from using the service.
In short, all online platforms and service providers will be required to adopt a robust streamlined approach to content moderation and user safety that is guaranteed through continuous compliance and transparency reporting.
Time is short for companies to get their houses in order in the face of the recently adopted DSA. To help, Tremau offers a comprehensive, single trust & safety content moderation platform that prioritises compliance as a service by integrating workflow automation and other AI tools. Tremau’s platform ensures that e-dating providers and other VLOPs are up to standard with the DSA requirements while also improving their key trust & safety performance metrics. This way, brands have the peace of mind of protecting their users and of being protected themselves, and can also increase their handling capacity while reducing the growing administrative and reporting burden of content moderation.
For further information on these regulations and how they can affect your business, please contact info@tremau.com.
Tremau Policy Research Team
Online platforms largely rely on content moderation to remove illegal or harmful content, such as terrorist or child sexual abuse images and videos. Often, illegal content that has been detected and removed re-appears, possibly multiple times, as manipulated copies – for example, images that have been watermarked, cropped, rotated, or otherwise edited. For many platforms, reappearing illegal content can account for a significant share of the illegal content they see. Hence, automatically and accurately re-detecting manipulated copies of previously removed content can have a significant impact on the quality of a company’s content moderation and on online Trust and Safety. In this blog we discuss the problem of detecting manipulated copies of images, outlining a generic approach and reviewing some methods used in computer vision.
Image copy detection involves determining whether a target image is a copy or an altered version of another image in a dataset (the reference images – possibly a large collection, for example images that have previously been detected and removed from a platform). The goal is to identify whether two images originate from the same source. Note that this is different from other related computer vision problems, such as instance-level recognition (images of the exact same object taken under different conditions, e.g. from different angles), category-level recognition (images containing different versions of the same concept, e.g. a red car and a blue car), or image similarity detection, where the problem is to decide whether two images are different photos of the same instance, for example from different angles or under different conditions (Figure 1).
A general pipeline for developing image copy detection methods is shown in Figure 2. The methods we discuss here differ only in the model deployed to extract features. Below, we explain each of the four steps and review some relevant techniques.
Online platforms, such as social media, which may receive millions of images each day, have an incentive to automatically detect harmful content that is reposted on their platform. For example, the Facebook AI team hosted a competition to encourage other research teams to tackle this problem, and developed a dataset [3] of transformed images (Figure 3) that are representative of the augmentations that are common on the internet. We use this data below.
Digital images are made up of pixels. If all the pixel values of a picture, such as the cat in Figure 4, are “flattened out” into a single sequence, we can represent the image as a high-dimensional vector whose elements are the pixel values (as indicated in the image above). We can do the same with an entire dataset of reference images as well as with any query image. However, this pixel representation may not facilitate the comparison of similar (e.g., copied and manipulated) images. For example, if one rotates an image (even a little), the pixel vector of the rotated image will be very different from that of the original. It is therefore practically useful to have a vector (or matrix) representation of images that captures some visual relationship between them spatially, so that “similar” images are closer together, where “similar” in our case means “produced from the same original image/source”. Such an arrangement, called an embedding, allows us to translate a problem that humans solve visually into a mathematical problem that can be solved, for example, by computing distances between vectors (the representations of the images).
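A small NumPy sketch makes the point: flattening an image gives a high-dimensional pixel vector, and even a simple rotation of the same image yields a vector that is far away in raw pixel space (the toy image here is random).

```python
import numpy as np

# A toy 64x64 RGB "image" as a pixel array (values 0-255).
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Flatten all pixel values into a single high-dimensional vector:
# 64 * 64 * 3 = 12,288 dimensions.
pixel_vector = image.reshape(-1)
print(pixel_vector.shape)        # (12288,)

# Rotating the image by 90 degrees gives a vector that is far away in raw
# pixel space, even though both images clearly come from the same source --
# which is why we need embeddings designed for copy detection instead.
rotated_vector = np.rot90(image).reshape(-1)
distance = np.linalg.norm(pixel_vector.astype(float) - rotated_vector.astype(float))
print(distance)
```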
We therefore need a method of producing useful embeddings, that is, a way (i.e., a function) of mapping raw-pixel vectors to vectors so that images that are similar – according to the copy-detection definition of similarity in our case – end up close together. In this project, we explore two well-known approaches to do this (see Figure 5): one in which images are described in terms of a set of small, characteristic local visual features, and another that uses convolutional neural networks (CNNs) to learn an embedding:
These models consist of two main steps: Local descriptor extraction and encoding.
Local features, or local descriptors, refer to patterns or distinct structures found in an image, such as corners, edges, or small image patches. They are usually associated with an image patch that differs from its immediate surroundings by texture, color, or intensity. What a feature represents does not matter, just that it is distinct from its surroundings.
Scale Invariant Feature Transform (SIFT) is an algorithm that works by detecting the most significant regions (i.e. key points) in an image and describing them using vectors called keypoint descriptors. The descriptors are invariant to a variety of visual transformations (e.g. translation, scaling, and rotation), so they can be used to identify images that are different perspectives of the same object or certain manipulations (e.g. rotations or translations) of the same image. SIFT describes each keypoint using a descriptor (a vector) that captures the location, scale, and orientation of the keypoint, leading to a representation of each image keypoint using 128 numbers (so each keypoint is a 128-dimensional vector). To achieve this, SIFT is mainly composed of the following stages [7]: scale-invariant keypoint identification, keypoint localization and filtering, orientation assignment, and keypoint descriptor generation.
Each of these stages consists of a number of steps. Some useful tutorials about how these stages work can be found in these videos. For example, how does one even identify “interesting key points” in an image? This is what the scale-invariant keypoint identification stage does (details about how key points can be identified can be found here). Briefly, the method can be summarized as follows: a useful image keypoint [13] is a region that is visually distinct from its surroundings but relatively consistent within itself. Such a region – a “blob” – is distinguishable because it is typically bounded by edges, i.e. sharp changes in brightness or color (see Figure 7).
A method to find these key points is to apply the second derivative of the Gaussian function, the Laplacian of Gaussian (LoG) – it looks like an upside-down Mexican hat – as a filter to the image. The output of filtering an image with a LoG has local extrema wherever the image has a blob, and more generally wherever there is some change (e.g., in color or texture). Additionally, we apply the LoG filter at different scales, because regions that do not look like blobs at one scale can be correctly identified at another (hence the scale-invariance characteristic of SIFT features). We can therefore find all the blobs in an image by determining the local extrema of the image after applying the LoG filter at multiple scales/image resolutions. Achieving scale invariance is an important capability of SIFT-type methods. After a collection of potential key points has been identified, a few checks are applied to eliminate unnecessary outliers. Orientation-related information about each keypoint is then found during the orientation assignment stage. This information is eventually used to create, for each keypoint, a descriptor (vector) of size 128 that is scale-, translation-, and rotation-invariant (see how this is done in this video).
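As a quick illustration of the output of these stages, here is a sketch of SIFT keypoint and descriptor extraction using OpenCV’s implementation (cv2.SIFT_create in recent opencv-python releases); the image path is a placeholder.

```python
import cv2

# Sketch of SIFT keypoint and descriptor extraction with OpenCV;
# "query.jpg" is a placeholder path for an image on disk.
image = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()

keypoints, descriptors = sift.detectAndCompute(image, None)

# Each keypoint is described by a 128-dimensional vector; a single image can
# easily yield thousands of them.
print(len(keypoints))        # number of detected keypoints
print(descriptors.shape)     # (number_of_keypoints, 128)
```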
One drawback of using key points is that a single image is represented by a large number of them (typically thousands). If we want to perform image matching using SIFT-generated features, all of these key points must be compared individually between a query image and all images in a dataset. Instead, the search can be optimized by defining only a few, say k, “prototypical” key points (we call this set a “codebook” and the prototypical key points “visual words”) identified across all images in our database, and then describing each database image, as well as each query image, using only these few visual words [10][12]. Think of the codebook as a common library of visual words that are prominent across all images in the database, such as patterns like edges and corners. For example, say we start from a total of 9,000 key points across all images in our database (each being a vector of size 128 if we use SIFT), 800 of them representing horizontal edges, 600 representing vertical edges, and so on. While building the codebook, the clustering algorithm can put all the horizontal edges into one cluster, and the most typical keypoint vector of this cluster – its center (e.g., the most typical horizontal edge) – will be used in the codebook.
A codebook can be created using, for example, unsupervised learning techniques such as k-means clustering, where we cluster all key points found across all images in the database. The size k (number of clusters) of the codebook C = {c1, … , ck} can be determined empirically (for example in our case we considered values ranging from k=16 to k=256). The k elements {c1, … , ck} of the codebook are the centers of the clusters found. More information can be found in this video.
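A minimal sketch of building such a codebook with scikit-learn’s k-means; random vectors stand in for the pooled SIFT descriptors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Build a k-word codebook by clustering all local descriptors pooled across the
# reference images; random vectors stand in for real SIFT descriptors here.
all_descriptors = np.random.rand(10000, 128).astype(np.float32)

k = 64                                       # codebook size, chosen empirically
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

codebook = kmeans.cluster_centers_           # shape (k, 128): the visual words
print(codebook.shape)
```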
Once a codebook C is created, one can represent each image using it. One method to do so, which we use here, is VLAD (vector of locally aggregated descriptors) [10][12]. With this method, we represent each image using a vector U = [u_1ᵀ, u_2ᵀ, …, u_kᵀ], where u_i = Σ_{j : NN(x_j) = c_i} (x_j − c_i). Here x_j is the j-th keypoint of the image (say, found using SIFT), and NN(x_j) = c_i means that c_i (i = 1, …, k) is the codebook element closest to x_j (NN stands for nearest neighbor). Essentially, given the key points of an image, we first assign each keypoint to the visual word i of the codebook that is closest to it; we then take the residual of the keypoint from that visual word (x_j − c_i) and sum these residuals over all key points assigned to visual word i. This sum is u_i (for visual word i). Note that all u_i (i = 1, …, k) have the same size, equal to the size of the initial local features/key points used (e.g., 128-dimensional if we use SIFT). Therefore, the dimensionality of the whole embedding U describing any image is fixed – in the case of SIFT it is 128 × k.
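The VLAD aggregation itself is only a few lines of NumPy; the sketch below assumes the descriptors and codebook come from the previous steps (random stand-ins are used here).

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Aggregate an image's local descriptors into a single VLAD vector.

    descriptors: (n, d) array of the image's keypoint descriptors (d=128 for SIFT)
    codebook:    (k, d) array of visual words (e.g. k-means centers)
    returns:     (k * d,) VLAD embedding
    """
    k, d = codebook.shape
    # Nearest visual word for each descriptor.
    distances = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = distances.argmin(axis=1)

    vlad = np.zeros((k, d), dtype=np.float32)
    for i in range(k):
        assigned = descriptors[assignments == i]
        if len(assigned):
            # Sum of residuals (x_j - c_i) over descriptors assigned to word i.
            vlad[i] = (assigned - codebook[i]).sum(axis=0)

    vlad = vlad.reshape(-1)
    # L2-normalization is commonly applied before comparing VLAD vectors.
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Example with random stand-ins for SIFT descriptors and a k=64 codebook.
embedding = vlad_encode(np.random.rand(500, 128), np.random.rand(64, 128))
print(embedding.shape)    # (8192,) = 64 * 128
```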
In recent years, SIFT-based models have often been overtaken in popularity by CNN models, which have been shown to outperform local feature detectors in many vision tasks. While there are many CNN models, we used embeddings from two main groups: one based on supervised learning (e.g., image classification) and one based on self-supervised learning:
One of the state-of-the-art self-supervised contrastive learning models for image copy detection – the Self-Supervised Descriptor for Image Copy Detection (SSCD) – was developed by the Facebook AI team to solve the problem specified in their 2021 Image Similarity Competition (Figure 8). While the method has multiple components (we do not use them all in the experiments below), we focus on two key ones: contrastive learning using a method called SimCLR, and entropy regularization. We describe these briefly next.
In Contrastive Learning we train a neural network with pairs of data (e.g., images) that may or may not be instances of the same object (e.g., source image). Two data points that are instances of the same object are called a positive pair, else they are a negative pair. In the embedding space produced by the network, we would like the vectors of positive pairs to be close together while vectors of negative pairs remain far apart (as in Figure 9).
The loss function of a neural network partly determines how the network is trained, as well as the embeddings one can generate from it. Therefore, selecting a loss function that achieves the goal above is key. A contrastive loss function that achieves the separation we are looking for is the standard SimCLR (NT-Xent) loss, which for a positive pair (i, j) takes the form

ℓ(i, j) = −log [ exp(s_{i,j} / τ) / Σ_{k ≠ i} exp(s_{i,k} / τ) ],   with τ a temperature parameter,
averaged across all positive pairs (i, j), where s_{i,j} and s_{i,k} are the cosine similarities of the positive and negative pairs that include image i, respectively. This loss is minimized when the similarity of a positive pair is high compared to the sum of all negative-pair similarities.
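To make the behavior of this loss concrete, here is a small NumPy sketch for a single anchor image; the similarity values are made up, and the temperature value is an arbitrary choice.

```python
import numpy as np

def pair_loss(sim_positive, sim_negatives, temperature=0.5):
    """Contrastive loss for one anchor image i.

    sim_positive:  cosine similarity s_{i,j} with its positive pair j
    sim_negatives: cosine similarities s_{i,k} with the negative examples
    """
    logits = np.concatenate(([sim_positive], sim_negatives)) / temperature
    # -log( exp(s_ij/t) / sum_k exp(s_ik/t) ): small when the positive pair is
    # much more similar than the negative pairs.
    return -logits[0] + np.log(np.exp(logits).sum())

# Positive pair very similar, negatives dissimilar -> low loss.
print(pair_loss(0.9, np.array([0.1, -0.2, 0.05])))
# Positive pair no more similar than the negatives -> much higher loss.
print(pair_loss(0.1, np.array([0.3, 0.4, 0.2])))
```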
SimCLR (Figure 10) is a method for computing contrastive-learning representations in a self-supervised manner, that is, without any pre-labeled data. As it is self-supervised, we first generate pairs of data to use with the contrastive loss. Specifically, for each image available during training, two copies are made and augmented to produce two views of the same image (see Figure 11 for an example). We then generate pairs of these augmented views across all images. If a pair consists of two views of the same original image, it is labeled as a positive pair; otherwise it is a negative pair. The pairs of views can then be used to train a CNN using the contrastive loss above.
However, the developers of SimCLR noted that reducing the dimensionality of the representations during training with a standard neural network (a multilayer perceptron, MLP) on top of the CNN improved the final performance. SimCLR therefore trains a combination of a CNN and an MLP using the generated pairs and the contrastive loss above, as indicated in the figure below.
For SSCD, an additional “regularization term” is added to the loss function above (when training the CNN and the MLP) to impose an extra constraint on the layout of the embedding space. In essence, for each image the term also maximizes the distance between that image’s representation z_i and the representations z_j of all images with which it forms a negative pair – that is, all pairs outside the set P_i of positive pairs that include image i. The impact of, and trade-offs involved in, entropy regularization are described in detail in this paper.
One of the challenges of the image copy detection problem is that we want to retrieve similar images at scale, potentially among millions of images. Accelerating the search involves indexing, which is often done by pre-processing the database (e.g., splitting it into clusters or organizing it in a tree structure) for efficient search. For all our tests, we use a default method from the FAISS library that does not involve any pre-processing of the reference feature set – storage in the database is simply sequential, and so is the search.
For all methods, we perform a k-nearest-neighbor search using the FAISS library’s faiss.knn [9], with the similarity between two images evaluated as the Euclidean distance between the feature representations of the database images and that of the query image. Other distance metrics can also be used.
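A minimal sketch of this retrieval step with FAISS, using a flat (exhaustive) L2 index that matches the “no pre-processing, sequential storage” setup described above; the embeddings are random stand-ins.

```python
import numpy as np
import faiss

d = 512                                                  # embedding dimensionality
reference = np.random.rand(1500, d).astype("float32")    # reference embeddings
queries = np.random.rand(1000, d).astype("float32")      # query embeddings

# Flat index: vectors are stored sequentially and search is an exhaustive scan,
# matching the "no pre-processing" default described above.
index = faiss.IndexFlatL2(d)
index.add(reference)

k = 10
distances, indices = index.search(queries, k)            # Euclidean (L2) distances
print(indices.shape)                                     # (1000, 10)
```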
Other operations could be applied within the pipeline after embeddings are created, for reasons of efficiency and consistency. For example:
PCA whitening is a post-processing step for embeddings that makes them less redundant: embedding dimensions can be highly correlated, and whitening via PCA reduces this correlation.
Similarity normalization is used when computing similarity scores during retrieval. Mathematically, it subtracts from the similarity score between a query image q and a reference image r the average similarity between the query and multiple images in the dataset. This operation may improve performance but adds operational complexity (see the sketch below).
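For completeness, a rough sketch of what these two optional post-processing operations can look like with scikit-learn and NumPy; the embeddings are random stand-ins and the dimensionality choices are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

reference = np.random.rand(1500, 512).astype(np.float32)   # stand-in embeddings
queries = np.random.rand(1000, 512).astype(np.float32)

# PCA whitening: decorrelate (and here also reduce) the embedding dimensions.
pca = PCA(n_components=256, whiten=True).fit(reference)
reference_w = pca.transform(reference)
queries_w = pca.transform(queries)

# Similarity normalization: subtract from each query-reference similarity the
# query's average similarity to the reference set.
similarities = queries_w @ reference_w.T                    # raw similarity scores
normalized = similarities - similarities.mean(axis=1, keepdims=True)
print(normalized.shape)                                     # (1000, 1500)
```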
We do not explore these operations further, as we found they had little impact in our experiments.
In the experiment below, we use a subset of the DISC 2021 dataset [6] from Facebook’s Image Similarity Competition. For simplicity, we took 1,500 original images as the reference dataset and 1,000 transformed images as the query dataset. Among the query images, 500 are in-domain (a corresponding original exists in the reference dataset) and 500 are out-of-domain (no original image exists in the reference dataset). We worked with pre-trained models to obtain the embeddings.
Experiments for each method can be divided into 2 stages.
We define:
TP (True Positive): an in-domain query image is used and the top 10 candidate images returned contain the correct original image.
FN (False Negative): an in-domain query image is used and the method returns nothing, or returns top 10 candidates that contain only irrelevant images.
TN (True Negative): an out-of-domain query image is used and the method returns nothing.
FP (False Positive): an out-of-domain query image is used and the method returns one or more candidates (up to 10).
We calculate the following criteria:
Recall@10 = TP/(TP + FN): this measures the proportion of actual positives that were found. It indicates how well the method finds previously seen images.
Precision@10 = TP/(TP + FP): this measures how reliable the predictions are. Put differently, a fraction (1 − Precision@10) of the time the method suggests “copies” for an image it has never seen before (see the sketch below).
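Putting the two definitions together, the metrics reduce to simple ratios over the outcome counts; the numbers used below are purely illustrative, not results from our experiments.

```python
# Recall@10 and Precision@10 from the outcome counts defined above;
# the example counts are purely illustrative.
def recall_precision_at_10(tp, fn, fp):
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# e.g. 350 of 500 in-domain queries return the correct original in the top 10,
# while 20 out-of-domain queries still return candidates.
recall, precision = recall_precision_at_10(tp=350, fn=150, fp=20)
print(f"Recall@10 = {recall:.1%}, Precision@10 = {precision:.1%}")
```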
This is a local-descriptor-based model: we first extract SIFT descriptors for all images and then use VLAD to create the final (128 × k)-dimensional embedding for each image.
We directly deploy the BiT-M R50x1 model to extract features. This pre-trained model is based on ResNet-50 and is trained for multi-label classification on ImageNet-21k, a dataset of 14 million images labeled with 21,843 classes. The embeddings are 2048-dimensional.
We resize all images to square with an edge size of 480 pixels, and then directly deploy the sscd_disc_mixup model. The embeddings are 512-dimensional.
We resize all images to square with an edge size of 512 pixels, and then directly deploy the DeNA model (details can be found in [5]). The embeddings are 512-dimensional.
The performance of the different methods tested is shown in the table below. Overall, the two deep-learning-based representations (SSCD and DENA), which are more specialized for the copy detection problem, perform best – and similarly to each other. In contrast, the system based on a general visual representation performs very poorly in this case, while the SIFT-based approach falls in between.
| Method | Threshold | Recall | Precision |
|---|---|---|---|
| General visual representation | 9500 | 7.40% | 7.35% |
| | 9500 | 11.20% | 10.81% |
| SIFT-VLAD | 0.3 | 35.00% | 36.02% |
| | 0.1 | 48.00% | 32.43% |
| SSCD | 0.25 | 67.40% | 99.70% |
| | 0.20 | 73.60% | 92.23% |
| DENA | 0.40 | 73.00% | 98.38% |
| | 0.22 | 71.00% | 97.26% |
In this blog, we described a general pipeline for developing a system to detect similar images in a database. We have identified key components and discussed commonly used options for them. We tested variations – performances are shown in the table above.
A key message is that, perhaps not surprisingly, the choice of features to represent images is the most significant one. Deep learning-based representations are state-of-the-art, but even then one needs representations that match the specific task at hand. For example, a general visual representation does not necessarily work for copy detection – the main reason probably being that the network is trained on image classification tasks, so the output embeddings mostly capture category information without capturing image manipulations. By contrast, the main idea of self-supervised contrastive learning is to project images generated (via manipulations) from the same original onto similar embeddings – and all others onto very different ones. Finally, local-descriptor-based methods such as SIFT-VLAD perform relatively poorly in this specific case and dataset (although they are known to work well in many applications): they probably cannot capture all the types of manipulations one can apply to images (e.g., beyond rotation and scaling).
In addition, while multiple image post-processing steps are explored in the literature, we found they add only a little extra performance in this case. For example, we explored (not reported here) geometric verification using RANSAC [15], which only slightly improved performance at the cost of additional computation time. Of course, future research may improve the impact of such steps.
[1] The Image Similarity Challenge and data set for detecting image manipulation
[2] SIFT Meets CNN: A Decade Survey of Instance Retrieval
https://arxiv.org/abs/1608.01807
[3] The 2021 Image Similarity Dataset and Challenge
https://arxiv.org/abs/2106.09672
[4] A Self-Supervised Descriptor for Image Copy Detection (SSCD)
https://github.com/facebookresearch/sscd-copy-detection
[5] Contrastive Learning with Large Memory Bank and Negative Embedding Subtraction for Accurate Copy Detection
https://arxiv.org/abs/2112.04323
[6] DISC2021 dataset
https://sites.google.com/view/isc2021/dataset
[7] Object Recognition from Local Scale-Invariant Features
https://www.cs.ubc.ca/~lowe/papers/iccv99.pdf
[8] BiT-ResNet
https://github.com/google-research/big_transfer
[9] FAISS library
https://github.com/facebookresearch/faiss
[10] Aggregating local descriptors into a compact image representation
[11] Contrastive Loss Explained
https://towardsdatascience.com/contrastive-loss-explaned-159f2d4a87ec
[12] From Points to Images: Bag-of-Words and VLAD Representations
[13] What is an interest point? SIFT Detector
[14] Introduction to SIFT (Scale Invariant Feature Transform)
https://medium.com/data-breach/introduction-to-sift-scale-invariant-feature-transform-65d7f3a72d40
[15] Geometric Verification with RANSAC
Online content moderation has become an increasingly important and debated topic, with new regulations, such as the EU’s Digital Services Act (DSA), expected to further reinforce this trend. Regulations will create more legally binding obligations for online platforms with respect to content moderation, in order to improve users’ online well-being and the functioning of the online world.
However, while millions of posts appear every day on social platforms, only a few hundred thousand people work in the content moderation industry today. Despite platforms’ plans to recruit more moderators, the workload of each moderator remains very large: they often have to review thousands of posts every day, leaving them a very narrow (and stressful) window to decide whether or not an online post should be removed. This raises issues around the accuracy, consistency, and fairness of a company’s content moderation, and its impact on free speech.
In addition to the very limited time available for moderation decisions, the quality of moderation can also be affected by the AI tools deployed by platforms, the highly contextual nature of many online posts, and the large quantity of online content falling in the grey zone between harmful and safe. Potential biases of content moderators further exacerbate the issue. For example, some moderators might be too lenient or too strict relative to company guidelines, and can also be affected by how long they have been working that day; others may be accurate on some categories of content but lack the expertise or training on others; and some may be biased specifically towards certain categories of content (e.g., culturally or politically).
Ensuring the quality of content moderation is a challenge that has important implications for the proper functioning of social media and freedom of expression online. Quality assurance (QA) for content moderation is essential to ensure that the right balance between safety and freedom of expression is met in a fair and effective manner. Poor content moderation can also raise reputation, regulatory, and other business risks for online platforms, including a possible loss of users. QA becomes even more challenging and important as companies outsource content moderation to external providers – whose quality also needs to be continuously monitored. In this context, online platforms are looking for ways to monitor and improve the quality of their moderation processes. Quality can be measured using metrics such as accuracy, consistency and fairness (e.g. similar cases get similar decisions). Consistency is critical both over time for each moderator and across moderators.
The typical quality assurance process for online content moderation is based on regular (for example, weekly) controlled evaluations: after carefully labelling a number of content items (e.g., users’ posts), managers give them to multiple moderators, which allows a score to be computed for each moderator based on how they perform relative to each other as well as relative to the desired labels the company selected for these items.
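As a toy illustration of such a controlled evaluation, accuracy against the gold labels and agreement between moderators can be computed with scikit-learn; all labels below are made up.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Toy QA round: two moderators label the same 8 test items that managers have
# already given "gold" labels (1 = remove, 0 = keep). All labels are made up.
gold        = [1, 0, 1, 1, 0, 0, 1, 0]
moderator_a = [1, 0, 1, 1, 0, 1, 1, 0]
moderator_b = [1, 1, 0, 1, 0, 0, 0, 0]

# Accuracy against the desired labels...
print(accuracy_score(gold, moderator_a))    # 0.875
print(accuracy_score(gold, moderator_b))    # 0.625

# ...and agreement between the two moderators (consistency), corrected for chance.
print(cohen_kappa_score(moderator_a, moderator_b))
```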
However, this common QA practice does not leverage all the data available, and since evaluations are run only once in a while, potential QA issues cannot be detected in real time – for example, when a moderator’s performance drifts, even temporarily. An important challenge for quality and consistency evaluation is the ability to use many, if not all, past decisions from all moderators, so as not to be limited by a small number of weekly test items. Very importantly, this could eliminate the need for separate evaluation processes entirely, while improving the reliability of the evaluation and ensuring continuous monitoring.
In our study, we discuss approaches for managing content moderation quality in real time, without the need to perform regular (and costly!) tests or to have multiple moderators handle the same cases. We develop a new method for comparing content moderators’ performance even when there is no overlap in the content they handle (i.e., each item is handled by a single moderator), using data from the moderators’ previous decisions. To this end, we also discuss how to adapt crowd-labelling algorithms for QA in content moderation – an approach that we believe is promising to explore further.
To find out more about building an accurate and efficient content moderation system, contact us at info@tremau.com.
To download Improving Quality and Consistency in Single Label Content Moderation, please fill out the form below.
Tremau Policy Research Team
Content moderators have become indispensable to online platforms’ everyday operations. However, major platforms that outsource their content moderation to contractors around the world face an increasingly pressing challenge: employee turnover at these sites is high, with most moderators staying in the role for less than two years on average.
Poor mental health is one of the main reasons moderators leave their positions, as the job requires them to review large volumes of text, pictures, and videos containing highly disturbing content involving violence, extremism, drugs, child sexual abuse material (CSAM), self-harm, and more. Long-term exposure to such harmful content has triggered serious mental health issues among moderators, including depression and anxiety. As mental health deteriorates, more severe problems such as PTSD and drug and alcohol addiction have also been reported.
Disturbing content does not only cost content moderators their mental health; it also has a financial impact on platforms. For example, the San Mateo Superior Court required Facebook to pay millions to content moderators who had developed PTSD on the job. Moreover, as Non-Disclosure Agreements (NDAs) have become common practice, content moderators often find themselves unable to talk to trusted friends or family members about their work. This leads to a lack of support for moderators, misunderstanding of their precarious conditions, and a growing unwillingness to voice their difficulties.
The intensity of the job is another major problem. While millions of posts appear on social platforms every day, only about 100,000 people work in the content moderation industry. Despite mega-platforms’ promises in recent years to recruit more moderators, the amount of work distributed to each moderator remains enormous: they have to review thousands of posts each day, leaving only a very tight window to decide whether a post should be removed. This creates new issues around the accuracy and consistency of a company’s content moderation and its impact on freedom of expression.
Indeed, ensuring the quality of content moderation is a challenge with important implications for the proper functioning of social media, freedom of expression, and fairness. Besides the very limited time frame for making moderation decisions, the quality of moderation can also be affected by individual biases, the AI tools deployed by platforms, and the highly contextual nature of many posts, not to mention the large amount of online content that sits in the grey area between harmful and harmless. Beyond the content itself, the complex constellation of laws, policies, platform terms and conditions, and internal instructions makes it even harder for moderators to respond quickly and accurately.
The tech industry has already acknowledged these challenges. Several solutions exist to address them, but they still have considerable limitations. AI is widely used in content moderation, both to remove content that is explicitly illegal and to flag suspicious content for human moderators to investigate. However, one salient drawback is that AI works well only on “straightforward” cases covering broad categories, such as “nudity” or “blood”: for anything more nuanced, current AI tools have proven prone to mistakes. For example, Thomas Jefferson’s words in the Declaration of Independence were once taken down automatically as “hate speech” because the phrase “Indian Savages” was flagged as inappropriate by an AI tool.
Another problem with current AI tools in content moderation is that most only work on text- and image-based content, while tools tailored to audio-based content or to more interactive settings, such as live chat and live streaming, are still in development. Furthermore, it is well established that AI tools often reflect the biases of their creators, and for tools powered by “black box” models, opaque decision-making may even create new problems for transparency auditing and quality assurance.
Providing mental health care for moderators is another common practice across companies. Wellness coaches and counselors are a regular presence at content moderation sites, along with occasional employee support programs, but many moderators consider these inadequate and call for professional intervention from clinical psychiatrists and psychologists. “Wellness breaks” included in daily working hours are another intended buffer against deteriorating mental health, but they are also criticized as too short compared to the hours of exposure to traumatizing content.
There is still much to be done to protect those who protect us from the worst aspects of the Internet, and improvements should be pursued in both technological and organizational solutions. Industry and academia have been working to improve the accuracy and efficiency of AI for the automated detection and removal of harmful content. Beyond training smarter AI for more efficient automation, AI can also help prevent or reduce exposure to disturbing content, for example by interactively blurring it for human moderators.
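As a simple illustration of the blurring idea, the sketch below uses the Pillow imaging library to blur items whose harm score from an upstream classifier exceeds a threshold, so that moderators reveal the original only when a decision requires it. The scoring function, threshold, and file name are assumptions for illustration, not part of any specific platform’s tooling.

```python
# Illustrative sketch: blur likely-disturbing images before showing them to a
# moderator. The harm_score would come from an upstream classifier (assumed
# here); only the blurring step uses a real library call (Pillow).
from PIL import Image, ImageFilter

BLUR_THRESHOLD = 0.7   # assumed operating point
BLUR_RADIUS = 15       # larger radius = heavier blur

def prepare_for_review(image_path: str, harm_score: float) -> Image.Image:
    """Return a blurred preview when the predicted harm score is high."""
    image = Image.open(image_path)
    if harm_score >= BLUR_THRESHOLD:
        return image.filter(ImageFilter.GaussianBlur(radius=BLUR_RADIUS))
    return image

# Example: a post flagged at 0.92 is shown blurred by default; the moderator
# can load the unblurred original only if the decision requires it.
# preview = prepare_for_review("flagged_post.jpg", harm_score=0.92)
```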
Technology can also help through tools designed to assist content moderators in their daily routines: better task distribution across moderators, smoother internal communication on more complicated moderation decisions, and more streamlined quality assurance of content moderation.
Companies should also take on more responsibility for proactively protecting their workers’ mental wellness. The tech industry can learn a great deal from other high-risk professions, such as police officers, journalists, and child exploitation investigators. A first critical practice in these fields is to clearly inform employees, and prospective hires, of the inherent risks of reviewing harmful content. Companies should also invest in regular, long-term resilience training programs and in-house clinical mental health care teams.
Just as importantly, these fields enforce strict maximum exposure times, especially for those working in environments containing hazardous substances, and similar maximum-exposure standards should apply to content moderation. Finally, across the nascent content moderation industry, building meaningful interpersonal networks among moderators can be valuable, fostering mutual support among “insiders” and eventually bringing the interests of content moderators, who are crucial stakeholders in the regulation of the digital space, onto future agendas.
On 8 February 2022, Safer Internet Day, e-Enfance – a French NGO fighting bullying and online harassment of children – launched a nationwide app, Application 3018, to facilitate the reporting of cyber harassment. The application is combined with a dedicated online trust & safety platform that enables faster victim support and more efficient removal of harmful content by online platforms.
The Internet Watch Foundation reported 2021 as the worst year on record for child sexual abuse online, with a surge in cases of online grooming during lockdown. A recent study conducted by the WeProtect Global Alliance found that almost 70% of respondents aged 18 to 20 had experienced online sexual harm in their childhood.
These statistics paint a disturbing picture and underscore the importance of interventions to protect children online.
The launch of Application 3018 comes in the wake of French President E. Macron’s call to governments and other relevant providers and organisations to “stand up for children’s rights in the digital environment”.
The application was created to specifically allow children to report instances of cyberbullying, ranging from inappropriate text messaging to sexual abuse material – all through an easy-to-use interface.
Last year, e-Enfance noted a 60% rise in cyber-harassment cases and reported receiving approximately 18,000 calls to its hotline from victims. Complaints were filed in only 34% of those cases.
Within the first week of the new system’s introduction, e-Enfance saw a significant increase (+30%) in the number of cases treated, while reducing the time needed to assess each case and notify the relevant online platform. Given that 62% of children exposed to sexually explicit content receive it on their phones, the app provides a better reporting interface for them than traditional webpage forms.
As we continue to monitor the tool’s performance, we expect a further increase in notices sent through the app and faster processing, directly increasing the amount of content reported to platforms.
As the digital space increasingly permeates every aspect of our lives, from socialisation to e-commerce to education, we are faced with an abundance of opportunities and risks. Both the EU Code of Conduct and the Digital Services Act will have a significant impact on moulding this space to better protect citizens’ interests, especially those of children.
To protect children online, responsibility and compliance will be needed from every actor and stakeholder involved. Upcoming regulations impose new reporting obligations on trusted flaggers, requiring them to update their existing report management processes to enable transparency reporting and the production of audit trails. The platform used by e-Enfance addresses all of these requirements while leveraging innovative and highly secure digital and AI solutions to achieve a compliant and safe environment for all.
We invite everyone, civil society, online platforms, regulators and governments, to join us in building a digital world that is safe and beneficial for all, especially children.
by Louis-Victor de Franssu, Toshali Sengupta