Bestgen, Yves
[UCL]
This paper describes the system proposed by the SATLab for hate speech and offensive content identification in five low-resource languages. This language-agnostic system applies classical supervised learning to character n-grams and uses no data other than the provided learning material. After a series of parameters was optimized, it ranked first in the Bodo task and second in the Gujarati task, whose learning material contained only 200 tweets. It also performed well in the Sinhala and Assamese tasks, but several systems outperformed it in the Bengali task.
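The abstract names character n-grams and "classical supervised learning" but not the specific learner, n-gram range, or parameters. The sketch below only illustrates the general idea and is not the author's system. The n-gram range (1 to 4), the lowercasing and space padding, and the choice of multinomial Naive Bayes with Laplace smoothing are all assumptions made for this example. The names `char_ngrams` and `CharNgramNB` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=4):
    """Count character n-grams of length n_min..n_max.

    Assumption: text is lowercased and padded with one space on each side,
    so n-grams can capture word boundaries.
    """
    padded = f" {text.lower()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts.

    A stand-in learner chosen for illustration; the abstract does not name
    the classifier actually used.
    """

    def __init__(self, n_min=1, n_max=4, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        # Total n-gram occurrences per class, used in the smoothed denominator.
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n_min, self.n_max)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus smoothed log likelihood of each known n-gram.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * v
            for g, k in grams.items():
                if g in self.vocab:  # n-grams unseen in training are ignored
                    score += k * math.log((self.feature_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best
```

In this toy usage, the English strings and the `HOF`/`NOT` labels are invented placeholders, not task data: `CharNgramNB().fit(["you idiot", "stupid idiot", "have a nice day", "thank you friend"], ["HOF", "HOF", "NOT", "NOT"]).predict("idiot")` should favour `HOF`. Character n-grams need no tokenizer or language-specific resources, which is consistent with the language-agnostic design the abstract describes.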


Bibliographic reference
Bestgen, Yves. Using only character ngrams for hate speech and offensive content identification in five low-ressource languages. Forum for Information Retrieval Evaluation (Goa, India, 15 to 18 December 2023). In: Ghosh, K., Mandl, T., Majumder, P. & Mitra, M., Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation (FIRE-WN 2023), 2023, p. 411-417
Permanent URL
http://hdl.handle.net/2078.1/280150