Science and Technology

HATE COVID-19.PT: a project that will automatically detect online hate speech in Portuguese

Direct verbal insults. Offensive and insulting words intended to demean. Physical threats to the lives of others or their families. Statistics show that Covid-19 has led to an increase in hate speech.

Several European and international institutions have issued recommendations on combating racism, xenophobia and homophobia, while also warning of low levels of reporting and participation. Covid-19 has led to an increase in hate speech towards already vulnerable communities. In response to this problem, FCT (Fundação para a Ciência e a Tecnologia) created a special support scheme, “Impact of the COVID-19 pandemic in the crimes of incitement to hatred and violence and in hate speech”. One of the selected projects includes a Técnico/INESC-ID research team, led by professor Paula Carvalho (Department of Computer Science and Engineering – DEI).

The main objective of the project “HATE COVID-19.PT - Detecting Overt and Covert Hate Speech in Social Media” is to automatically detect online hate speech in Portuguese. “On the one hand, it is a priority to invest in languages such as Portuguese, whose linguistic resources for mapping hate speech are still quite scarce”, explains professor Paula Carvalho. “On the other hand, we intend to analyse this phenomenon in the Portuguese online community, taking into account the temporal dimension, in order to understand the impact of the pandemic on hate speech on social networks”, adds the professor.

The research team intends to study the dynamics of this phenomenon in a specific context, which is why the analysis of hate speech will focus on the pandemic period. “This project will be crucial to assess whether, in fact, the current circumstances have significantly boosted this phenomenon”, and to identify the main targets of hate speech in Portugal during the pandemic.

The project, which started on 1st May, will last 10 months and results from a partnership between INESC-ID (proponent entity), Lusa News Agency and the Portuguese National Cybersecurity Center (CNCS). The project has received total funding of €35,892.

This project involves the creation of a large annotated corpus from social media, covering the Covid-19 pandemic, which will support the development of a machine learning prototype to detect hate speech and assess its explicitness and intensity, considering the time period and geolocation data. “This prototype will be available to the community, and it can be explored by linguists, communication and social sciences experts, media professionals, among others, in order to monitor, analyse and assess the evolution of hate speech on social media”, says the DEI professor.

The importance of enhancing the Portuguese language in this field

The research team involved in “HATE COVID-19.PT” has been focusing on issues that are directly or indirectly related to hate speech, more precisely “irony detection in the media” and, more recently, “the identification of strategies for detecting misinformation, in general, and on social media, in particular”, says professor Paula Carvalho.

According to the INESC-ID researcher, “most studies have focused on direct hate speech, usually including insulting words or expressions”. However, “users often express themselves, in an indirect or underhanded way, through linguistic and rhetorical strategies that make the speech less explicit”, stresses the Técnico professor.

“Our team brings together researchers from several disciplines: natural language processing, artificial intelligence and communication sciences”, says professor Paula Carvalho.

In addition to professor Paula Carvalho, who is a linguist and works with natural language processing, the team includes other INESC-ID scientists working in natural language processing and artificial intelligence, as well as a social scientist and DEI professor with an interest in digital media and human-machine interaction. Some of these researchers are currently involved in the Contrafake project.

Knowledge is a major weapon against hate speech

According to the ECRI 2020 annual report, racism, racial discrimination and intolerance are growing. “I believe that knowledge is the most effective response to prevent these phenomena”, highlights the INESC-ID researcher. “Therefore, these initiatives promoting research and the advancement of knowledge in such crucial areas are essential to stop this type of crime”, she adds.

Although the project is exploratory in nature, the team is confident “the results will allow the academic community to monitor the evolution of hate speech and deepen its understanding of it, thus promoting the development of research in the various areas that focus on these issues”. “In our opinion, the analysis of subsequent studies may guide policy makers and help protect the most affected groups”, adds the professor.