Audio QA Lead - Part Time Contractor
United States
Python
Machine Learning
Design
UI/UX
Cybersecurity
SQL
Video
Voiceover
$25 - $45 / hourly
Voice data for AI
Tech description:
# Technology & Hard Problems
## **Product Surface**
Besimple generates **task-specific annotation interfaces and guidelines on the fly**, runs **human-in-the-loop (HITL) workflows** at scale, and trains **AI judges** that learn from human decisions to triage easy cases and flag ambiguous ones. We support **multimodal data** (text, chat, audio, video, traces) and **enterprise needs** like **on-prem deployment** and **fine-grained access control**. Under the hood, we optimize for **latency**, **correctness**, and **adaptability** simultaneously.
## **Hard Technical Problems We're Tackling**
* **Generative UI for Any Data Shape**
Turn arbitrary inputs (**JSON logs, multi-turn dialogs, code diffs, speech transcripts, video frames**) into ergonomic, versioned UIs with validation and assistive affordances (schema inference, promptable components, live preview with safe defaults).
* **Human-in-the-Loop Orchestration**
Route tasks to the right experts, enforce **calibration and quality gates**, measure **inter-rater reliability (IRR)**, and run **adjudication** when disagreement is informative, not noise.
* **AI-Judge Training & Control**
Distill human rubrics into model-based evaluators that **score live traffic**, self-update with new human decisions, and stay inside **guardrails** (confidence thresholds, policy constraints, auditability).
* **Production-Grade Eval**
Build **gating suites and regression tests** aligned to product KPIs and safety constraints; snapshot datasets; track drift; and **plumb production signals** back into evaluation and training.
* **Enterprise Delivery**
**On-prem optional installs**, isolation-by-tenant, **SSO/RBAC**, and **audit trails** that satisfy infosec without slowing iteration.
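
As a rough illustration of the IRR measurement mentioned above, the sketch below computes Cohen's kappa for two reviewers labeling the same items. The posting does not say which agreement metric Besimple uses, so the choice of kappa, the function name `cohens_kappa`, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items.

    Hypothetical helper for illustration; not taken from the posting.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two reviewers judging four clips as "ok" or "bad" (made-up labels).
reviewer_1 = ["ok", "ok", "bad", "bad"]
reviewer_2 = ["ok", "ok", "bad", "ok"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Here the reviewers agree on three of four clips (0.75 observed) against 0.5 expected by chance, so kappa comes out at 0.5. A calibration gate could compare this value against a threshold chosen by the team.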
## **What You'll Own**
End-to-end slices of the product, for example building a new **multimodal interface**, designing a **calibration workflow** that improves IRR, shipping a **rubric-aware AI judge** for a new domain, or tightening **dataset lineage** so a customer can trace a production decision back to ground truth.
## **Why This Is a Great Fit for Builders**
This work sits at the intersection of **product engineering, systems design, and applied AI**. You'll ship tangible interfaces, shape **evaluation science**, and see your work **block real regressions**. The feedback loop is measured in **better models in production**, not vanity benchmarks.
Job description:
### About the role
We are hiring an Audio QA Lead to support the development of high-quality training datasets for next-generation voice AI models.
In this role, you will work hands-on to improve the quality, consistency, and usability of speech datasets across applications such as text-to-speech, transcription, speech-to-speech, ASR, and conversational voice systems. Your work will directly influence how data is collected, reviewed, and delivered for real-world model training.
You will work across three core areas: defining and applying audio quality standards, recording high-quality speech on demand, and performing annotation and QA across speech datasets. This is not a generic audio production role. The work focuses on making audio usable for model training and requires a strong understanding of how data quality affects model performance.
**This is a part-time contractor role that can turn into a full-time role.**
## What you'll do
* Develop, refine, and apply audio quality guidelines for speech and voice datasets.
* Review audio files against technical, linguistic, and task-specific standards, making clear approval, rejection, or revision decisions.
* Identify audio and annotation issues such as background noise, clipping, distortion, plosives, echo, low signal, segmentation errors, transcript mismatches, and speaker-label inconsistencies.
* Perform annotation and QA tasks, including transcription, timestamp validation, VAD/segmentation, diarization, pronunciation checks, and metadata review.
* Record speech based on provided scripts and performance guidelines, delivering natural, high-quality, specification-compliant audio.
* Document edge cases, update review rubrics, and improve internal SOPs and quality standards.
* Collaborate with research, ML, and operations teams to translate model requirements into data specifications and evaluation criteria.
* Ensure consistency and integrity across audio files, transcripts, annotations, and associated metadata.
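
To give a flavor of the timestamp validation and segmentation QA listed above, here is a minimal sketch of an automated pre-check that could run before human review. The segment schema (`start`, `end`, `text` keys, times in seconds), the function name `validate_segments`, and the rule that segments must not overlap are assumptions for illustration; a real project would follow its own data specification, and diarized data with overlapping speech would need a different overlap rule.

```python
def validate_segments(segments, audio_duration):
    """Return a list of (segment_index, problem) tuples.

    Assumes segments are dicts with "start", "end" (seconds) and "text",
    sorted by start time, from a single non-overlapping track.
    Hypothetical schema, not taken from the posting.
    """
    problems = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        start, end = seg["start"], seg["end"]

        # Timestamps must fall inside the audio file.
        if start < 0 or end > audio_duration:
            problems.append((i, "out of bounds"))

        # A segment must have positive length.
        if end <= start:
            problems.append((i, "non-positive duration"))

        # Under the stated assumption, a segment may not start before
        # the latest end time seen so far.
        if start < prev_end:
            problems.append((i, "overlaps previous segment"))

        # Every segment should carry a transcript.
        if not seg.get("text", "").strip():
            problems.append((i, "empty transcript"))

        prev_end = max(prev_end, end)
    return problems


# Made-up annotations for a 5-second clip.
example = [
    {"start": 0.0, "end": 1.5, "text": "hello"},
    {"start": 1.2, "end": 2.0, "text": "world"},
    {"start": 2.0, "end": 1.9, "text": ""},
    {"start": 3.0, "end": 5.5, "text": "late"},
]
issues = validate_segments(example, audio_duration=5.0)
```

A check like this only catches structural errors. Whether a boundary actually lands on the right word onset still requires listening to the audio.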
## Who we're looking for
The ideal candidate has direct experience working with audio AI datasets and understands what makes speech data effective for model training. You have a strong ear for audio quality, are comfortable applying annotation standards, and can consistently produce and evaluate high-quality recordings.
* Direct experience working with audio AI training datasets or evaluation workflows.
* Hands-on experience with TTS, ASR, transcription, speech-to-speech, or related voice AI systems.
* Experience developing or applying audio quality standards in production environments.
* Experience with speech annotation tasks such as transcription, timestamp QA, VAD/segmentation, and diarization.
* Strong auditory judgment with the ability to consistently identify subtle audio quality issues.
* Ability to produce high-quality recordings in a controlled, quiet environment using professional or near-professional equipment.
* Strong written communication skills with the ability to provide clear, actionable feedback.
* High attention to detail and sound judgment when evaluating edge cases.
* Comfort working with structured data formats such as spreadsheets, CSV, or JSON.
### Bonus qualifications
* Experience with audio tools such as Audacity, Praat, or similar.
* Basic scripting skills in Python, Bash, or SQL for QA or dataset analysis.
* Background in linguistics, phonetics, speech research, or voiceover work.
* Experience evaluating both real and synthetic audio.
* Multilingual experience or familiarity with accents and dialect variation.
* Familiarity with compliant handling of consented and licensed voice data.










