Audio QA Lead - Part-Time Contractor

🇺🇸 United States

Python

Machine Learning

Design

UI/UX

Cybersecurity

SQL

Video

Voiceover

$25 - $45 / hourly

About Besimple AI

Voice data for AI


Tech description:

# Technology & Hard Problems

## **Product Surface**

Besimple generates **task-specific annotation interfaces and guidelines on the fly**, runs **human-in-the-loop (HITL) workflows** at scale, and trains **AI judges** that learn from human decisions to triage easy cases and flag ambiguous ones. We support **multimodal data** (text, chat, audio, video, traces) and **enterprise needs** like **on-prem deployment** and **fine-grained access control**. Under the hood, we optimize for **latency**, **correctness**, and **adaptability**—simultaneously.
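The "triage easy cases and flag ambiguous ones" behavior can be pictured as a confidence-threshold router. The sketch below is a minimal illustration under stated assumptions, not Besimple's implementation. The `Verdict` type, the `route` function, the 0.9 default threshold, and the route labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Literal

# Possible outcomes for one item; these labels are illustrative only.
Route = Literal["auto_accept", "auto_reject", "human_review"]


@dataclass(frozen=True)
class Verdict:
    """Hypothetical AI-judge output for a single item."""

    item_id: str
    passed: bool       # the judge's pass/fail decision
    confidence: float  # the judge's confidence in that decision, in [0, 1]


def route(verdict: Verdict, threshold: float = 0.9) -> Route:
    """Let confident verdicts through automatically; send the rest to human reviewers."""
    if not 0.0 <= verdict.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {verdict.confidence}")
    if verdict.confidence < threshold:
        return "human_review"
    return "auto_accept" if verdict.passed else "auto_reject"


if __name__ == "__main__":
    batch = [
        Verdict("clip-001", passed=True, confidence=0.97),
        Verdict("clip-002", passed=False, confidence=0.95),
        Verdict("clip-003", passed=True, confidence=0.55),
    ]
    for v in batch:
        print(v.item_id, route(v))
```

Raising `threshold` sends more items to humans and fewer through automatically. The human decisions on flagged items are the signal a judge could later learn from.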

## **Hard Technical Problems We’re Tackling**

* **Generative UI for Any Data Shape**
Turn arbitrary inputs—**JSON logs, multi-turn dialogs, code diffs, speech transcripts, video frames**—into ergonomic, versioned UIs with validation and assistive affordances (schema inference, promptable components, live preview with safe defaults).
* **Human-in-the-Loop Orchestration**
Route tasks to the right experts, enforce **calibration and quality gates**, measure **inter-rater reliability (IRR)**, and run **adjudication** when disagreement is informative rather than noise.
* **AI-Judge Training & Control**
Distill human rubrics into model-based evaluators that **score live traffic**, self-update with new human decisions, and stay inside **guardrails** (confidence thresholds, policy constraints, auditability).
* **Production-Grade Eval**
Build **gating suites and regression tests** aligned to product KPIs and safety constraints; snapshot datasets; track drift; and **plumb production signals** back into evaluation and training.
* **Enterprise Delivery**
Optional **on-prem installs**, tenant isolation, **SSO/RBAC**, and **audit trails** that satisfy infosec without slowing iteration.
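When two annotators label the same items, inter-rater reliability (IRR) is commonly reported as Cohen's kappa: kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The posting does not say which statistic Besimple uses, so treat this as one standard option. Fleiss' kappa or Krippendorff's alpha are typical when there are more than two raters. The function name below is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both pick the same label if each labels
    # independently at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    a = ["pass", "pass", "fail", "pass", "fail", "pass"]
    b = ["pass", "fail", "fail", "pass", "fail", "pass"]
    print(round(cohens_kappa(a, b), 3))  # 0.667
```

In this example the raters agree on 5 of 6 items (p_o ≈ 0.833), but their label frequencies alone would produce 50% agreement (p_e = 0.5), so kappa is 2/3 rather than 0.833.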

## **What You’ll Own**

End-to-end slices of the product—e.g., building a new **multimodal interface**, designing a **calibration workflow** that improves IRR, shipping a **rubric-aware AI judge** for a new domain, or tightening **dataset lineage** so a customer can trace a production decision back to ground truth.

## **Why This Is a Great Fit for Builders**

This work sits at the intersection of **product engineering, systems design, and applied AI**. You’ll ship tangible interfaces, shape **evaluation science**, and see your work **block real regressions**. The feedback loop is measured in **better models in production**, not vanity benchmarks.


Job description:

## About the role

We are hiring an Audio QA Lead to support the development of high-quality training datasets for next-generation voice AI models.

In this role, you will work hands-on to improve the quality, consistency, and usability of speech datasets across applications such as text-to-speech, transcription, speech-to-speech, ASR, and conversational voice systems. Your work will directly influence how data is collected, reviewed, and delivered for real-world model training.

You will work across three core areas: defining and applying audio quality standards, recording high-quality speech on demand, and performing annotation and QA across speech datasets. This is not a generic audio production role. The work focuses on making audio usable for model training and requires a strong understanding of how data quality affects model performance.

**This is a part-time contractor role that may convert to a full-time position.**

## What you'll do

* Develop, refine, and apply audio quality guidelines for speech and voice datasets.
* Review audio files against technical, linguistic, and task-specific standards, making clear approval, rejection, or revision decisions.
* Identify audio and annotation issues such as background noise, clipping, distortion, plosives, echo, low signal, segmentation errors, transcript mismatches, and speaker-label inconsistencies.
* Perform annotation and QA tasks, including transcription, timestamp validation, voice activity detection (VAD)/segmentation, diarization, pronunciation checks, and metadata review.
* Record speech based on provided scripts and performance guidelines, delivering natural, high-quality, specification-compliant audio.
* Document edge cases, update review rubrics, and improve internal SOPs and quality standards.
* Collaborate with research, ML, and operations teams to translate model requirements into data specifications and evaluation criteria.
* Ensure consistency and integrity across audio files, transcripts, annotations, and associated metadata.

## Who we're looking for

The ideal candidate has direct experience working with audio AI datasets and understands what makes speech data effective for model training. You have a strong ear for audio quality, are comfortable applying annotation standards, and can consistently produce and evaluate high-quality recordings.

* Direct experience working with audio AI training datasets or evaluation workflows.
* Hands-on experience with TTS, ASR, transcription, speech-to-speech, or related voice AI systems.
* Experience developing or applying audio quality standards in production environments.
* Experience with speech annotation tasks such as transcription, timestamp QA, VAD/segmentation, and diarization.
* Strong auditory judgment with the ability to consistently identify subtle audio quality issues.
* Ability to produce high-quality recordings in a controlled, quiet environment using professional or near-professional equipment.
* Strong written communication skills with the ability to provide clear, actionable feedback.
* High attention to detail and sound judgment when evaluating edge cases.
* Comfort working with structured data formats such as spreadsheets, CSV, or JSON.
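Timestamp QA on structured transcripts lends itself to small scripts like the one sketched below. The sketch assumes a hypothetical JSON layout: a top-level `duration` in seconds plus a `segments` list with `start`, `end`, `speaker`, and `text` fields. It also assumes a single non-overlapping segmentation. The function name and schema are illustrative and do not describe any specific tool.

```python
import json
from typing import Iterable, List, Optional


def validate_segments(doc: dict, speakers: Optional[Iterable[str]] = None) -> List[str]:
    """Return readable issues for one transcript document in a hypothetical JSON layout.

    Assumes a single, non-overlapping segmentation: each segment must start at or
    after the previous one ends. Diarized data with overlapping speech needs a
    different rule.
    """
    issues: List[str] = []
    duration = float(doc["duration"])
    allowed = set(speakers) if speakers is not None else None
    prev_end = 0.0
    for i, seg in enumerate(doc["segments"]):
        start, end = float(seg["start"]), float(seg["end"])
        if start >= end:
            issues.append(f"segment {i}: start {start} is not before end {end}")
        if start < 0 or end > duration:
            issues.append(f"segment {i}: [{start}, {end}] falls outside audio [0, {duration}]")
        if start < prev_end:
            issues.append(f"segment {i}: starts before previous segment ends at {prev_end}")
        if not str(seg.get("text", "")).strip():
            issues.append(f"segment {i}: empty transcript")
        if allowed is not None and seg.get("speaker") not in allowed:
            issues.append(f"segment {i}: unknown speaker label {seg.get('speaker')!r}")
        prev_end = max(prev_end, end)
    return issues


if __name__ == "__main__":
    doc = json.loads(
        """
        {"duration": 5.0,
         "segments": [
           {"start": 0.0, "end": 2.0, "speaker": "A", "text": "hello there"},
           {"start": 1.5, "end": 3.0, "speaker": "B", "text": "hi"},
           {"start": 3.0, "end": 5.4, "speaker": "C", "text": " "}
         ]}
        """
    )
    for issue in validate_segments(doc, speakers={"A", "B"}):
        print(issue)
```

The sample document triggers four issues: an overlap in segment 1, and an out-of-bounds end, an empty transcript, and an unknown speaker label in segment 2.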

### Bonus qualifications

* Experience with audio tools such as Audacity, Praat, or similar.
* Basic scripting skills in Python, Bash, or SQL for QA or dataset analysis.
* Background in linguistics, phonetics, speech research, or voiceover work.
* Experience evaluating both real and synthetic audio.
* Multilingual experience or familiarity with accents and dialect variation.
* Familiarity with compliant handling of consented and licensed voice data.
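Word error rate (WER) is one common calculation in Python scripting for transcript QA. It compares a reference transcript against a second transcript, such as an ASR output or an independent re-transcription, and can help surface transcript mismatches. The sketch below is minimal and uses word-level Levenshtein distance. Its lowercase-and-whitespace normalization is a simplifying assumption; production pipelines usually also normalize punctuation and numbers. The function name is hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # At the start of row i, prev[j] is the edit distance between the first i-1
    # reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) plus one deletion (the), over 6 reference words.
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

WER is normalized by reference length, so it can exceed 1.0 when the hypothesis contains many inserted words.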



by @maxrusakovic