NLP Research Project Mentee
Published in Burlington, Vermont, 2024
Oracle Labs
Tackled the lack of labeled data in low-resource languages by generating a shared multilingual representation for cross-lingual NLP tasks.
Used word embeddings to borrow distributional statistics from context words in higher-resourced languages, and applied Artificial Code-Switching (ACS) to generate language-agnostic NLP models for any task.
Analyzed the effect of ACS on different aspects of a language, and how the number of languages used in embedding training and the similarity between those languages in the multilingual corpus affect overall performance.