Using only character ngrams for hate speech and offensive content identification in five low-ressource languages

Bestgen, Yves

DIAL.pr - BOREAL



Full text: 51_final_crs.pdf (open access, PDF, 389.36 KB)


This paper describes the system proposed by the SATLab for hate speech and offensive content identification in five low-resource languages. This language-agnostic system applies classical supervised learning to character n-grams and uses no data other than the training material. After a series of parameters was optimized, it ranked first in the Bodo task and second in the Gujarati task, for which the training material contained only 200 tweets. It also performed well in the Sinhala and Assamese tasks, but was outperformed by several systems in the Bengali task.
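The abstract names only the general approach (character n-grams fed to a classical supervised learner); it does not say which learner, which n-gram lengths, or which parameter values were used. As an illustration, here is a minimal sketch, assuming n-grams of 1 to 4 characters and a multinomial naive Bayes classifier as a stand-in learner. The names `char_ngrams` and `CharNgramNB`, the default parameters, and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=4):
    """Count every character n-gram of length n_min..n_max in `text`."""
    padded = f" {text.lower()} "  # spaces let n-grams mark word boundaries
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


class CharNgramNB:
    """Hypothetical stand-in learner: multinomial naive Bayes on char n-grams."""

    def __init__(self, n_min=1, n_max=4, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.gram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(g.values()) for c, g in self.gram_counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n_min, self.n_max)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log prior + additively smoothed log likelihood of each n-gram
            score = math.log(n_label / n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for gram, k in grams.items():
                score += k * math.log((self.gram_counts[label][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (not from the shared task).
train_texts = ["you idiot", "stupid idiot", "have a nice day", "thanks, nice work"]
train_labels = ["offensive", "offensive", "not", "not"]
model = CharNgramNB().fit(train_texts, train_labels)
print(model.predict("idiot"))  # -> offensive
```

Because the features are raw character sequences, nothing in the sketch depends on a tokenizer or on the script of the language, which is what makes this kind of system language-agnostic. The n-gram range (`n_min`, `n_max`) and the smoothing constant `alpha` are the sort of parameters one would tune on the training material.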

Metadata

Document type	Conference paper
Access type	Open access
Publication date	2023
Language	English
Conference	"Forum for Information Retrieval Evaluation", Goa, India (15/12/2023 to 18/12/2023)
Peer reviewed	yes
Host document	Ghosh, K., Mandl, T., Majumder, P. & Mitra, M.; "Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation (FIRE-WN 2023)", pp. 411-417
Publication status	Published
Affiliation	UCL - SSH/IPSY - Psychological Sciences Research Institute
Links	http://hdl.handle.net/2078.1/280150[Handle]

Bibliographic reference	Bestgen, Yves. Using only character ngrams for hate speech and offensive content identification in five low-ressource languages. Forum for Information Retrieval Evaluation (Goa, India, 15/12/2023 to 18/12/2023). In: Ghosh, K., Mandl, T., Majumder, P. & Mitra, M., Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation (FIRE-WN 2023), 2023, pp. 411-417
Permanent URL	http://hdl.handle.net/2078.1/280150
