Have you tried multilingual BERT? In a quick test, the number of tokens I get for racialized names differs from what you found. Very interesting and important study!

Riccardo Di Sipio

Written by Riccardo Di Sipio

Senior Machine Learning developer at Dayforce. NLP, LLMs, graph neural networks. Formerly physicist at U Toronto, Bologna, CERN LHC/ATLAS.
