Machine translation for low-resource languages

Unlock insight into the unheard, yet strategically important voices

VoxCroft has pioneered the world’s first accurate domain-adapted translation models for Ethiopian languages Tigrinya and Oromo, as well as Filipino Ilocano, Indonesian Javanese, and Taiwanese Hokkien, tailored specifically for operations in these information environments. Specializing in the socio-political domain, our models provide both text-to-text and speech-to-text capabilities across a growing range of global languages.

World first to niche languages

World first to niche languages

VoxCroft delivered the world’s first translation models for Ethiopian Tigrinya and Oromo and Filipino Ilocano, Indonesian Javanese, and Taiwanese Hokkien models for operations in these information environments.

Machine Translation Tools

Navigating the challenges of collecting quality data for low-resource languages is often difficult and costly. Natural Language Processing (NLP) technologies require good quality training data to enhance the accuracy of their models. VoxCroft is amassing valuable data for use in various NLP initiatives through its population-centric intelligence work in austere media environments.

VoxCroft also works with organizations to manage, expand, and supplement their datasets that can be used in various intelligence and machine learning applications. Our proprietary Proteus micro-tasking and data collection platform, coupled with our network of global linguists and team of MT and AI experts, allows us to provide scalable solutions for NLP projects in the toughest operating environments.

Bilingual Corpora for Low Resource Languages

High-quality bilingual sentence pairs, speech-to-text data and data sets containing audiovisual and text content in low-resource languages by using domain-specific corpora tailored to customer requirements.

Crowd-Sourced Language Experts

An expansive network of over 400 mother-tongue language and subject area experts from austere language environments who undergo rigorous testing and training to work on a range of language tasks that meet our in-house quality assurance (QA) criteria.

Proteus Micro-Tasking Platform

A proprietary data-agnostic online and mobile platform that supports micro-tasks including translation, transcription, image identification, audio recording, and QA tasks, and can be adapted to specific customer requirements.

Machine Translations Models

Machine translation (MT) models for languages for which commercial MT tools currently do not exist or perform poorly. Our models, trained on domain-specific corpora often outperform commercial MT models trained on larger corpora.

Keyword Spotting for Broadcast Monitoring

VoxCroft provides real-time analysis of audio content in languages, including Amharic, Tamashek, Wolof, and Zarma, and we build tailored keyword spotting models for specific customer use cases.

Technical and Operational Consultancy Services

Customized language services, particularly in austere language environments, and work with customers to develop tailored NLP solutions for seamless integration with their existing systems and processes.