NLP Research Project Mentee

Published in Burlington, Vermont, 2024

Oracle Labs

  • Tackled the lack of labeled data in lower-resourced languages by generating a shared multilingual representation for cross-language NLP tasks.

  • Harnessed word embeddings to borrow distributional statistics from context words in data from higher-resourced languages, and applied Artificial Code-Switching (ACS) to generate language-agnostic NLP models for any task.

  • Analyzed the effect of ACS on different aspects of a language, and how varying the number of languages used in embedding training and the similarity between those languages in the multilingual corpus improves overall performance.

  • Presentation
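
As a rough illustration of the code-switching idea described above, the sketch below randomly swaps words in a sentence for their translations in other languages, so that words from several languages end up sharing the same context windows before embeddings are trained. It is a minimal sketch under stated assumptions, not the project's actual implementation: the lexicon `TOY_LEXICON`, the function name `artificial_code_switch`, and the `switch_prob` parameter are all hypothetical and chosen only for this example.

```python
import random

# Hypothetical toy lexicon (an assumption for illustration only):
# source word -> {language code: translation}.
TOY_LEXICON = {
    "cat": {"fr": "chat", "es": "gato"},
    "black": {"fr": "noir", "es": "negro"},
    "sleeps": {"fr": "dort", "es": "duerme"},
}


def artificial_code_switch(tokens, lexicon, switch_prob=0.5, rng=None):
    """Return a copy of `tokens` in which some words are replaced by a translation.

    Each token that has an entry in `lexicon` is swapped, with probability
    `switch_prob`, for its translation in a randomly chosen language.
    Tokens without a lexicon entry are left unchanged.
    """
    rng = rng or random.Random()
    switched = []
    for token in tokens:
        translations = lexicon.get(token)
        if translations and rng.random() < switch_prob:
            # Sort the language codes so the choice is reproducible for a seeded rng.
            lang = rng.choice(sorted(translations))
            switched.append(translations[lang])
        else:
            switched.append(token)
    return switched


if __name__ == "__main__":
    sentence = "the black cat sleeps".split()
    mixed = artificial_code_switch(sentence, TOY_LEXICON, rng=random.Random(0))
    print(" ".join(mixed))
```

A corpus mixed this way could then be passed to any standard word-embedding trainer. How the switching probability, the number of languages, and the lexicon were actually chosen in the project is not stated here.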