Software Engineer - Infrastructure

🇺🇸 United States

RabbitMQ · Management · Python · Kubernetes · AWS · GCP · PostgreSQL · MySQL · MongoDB · Terraform · DynamoDB · GitHub · Machine Learning · Amazon · Redis · NoSQL · Backend · DevOps · SQL · Testing

$175K - $225K

About Emergent

AI app builder that turns your ideas into monetizable software.


Job description:

Emergent builds autonomous coding agents that replace traditional software development by generating, testing, and deploying production applications directly from plain-language intent. Our systems run in production at global scale and are used to build millions of real applications.

Since public launch, Emergent has reached **$100M ARR in 8 months**. **6M+ users across 190+ countries** have built **6.5M+ applications** on Emergent. We've raised **$100M+**, backed by **Khosla Ventures, SoftBank, Google, Lightspeed, Prosus, Together, and Y Combinator**.

We're solving the hard part of AI-driven software creation: correctness, reliability, security, and scale in real production systems. The team is built by **repeat founders, Olympiad medalists, IIT & IIM alumni,** and leaders from **Google, Amazon, and Dropbox.**

We're hiring builders who want ownership, speed, and impact at global scale.

**What You'll Be Responsible For**

**Platform & Infrastructure**

- Maintain stability of our platform consisting of distributed microservices closely interacting with Kubernetes and cloud providers (GCP, AWS)
- Manage Kubernetes workloads with  **ArgoCD**  (GitOps) — deploy, monitor, and troubleshoot application syncs, resource trees, and rollouts
- Debug and resolve complex Kubernetes issues across clusters
- Manage  **CDN and edge infrastructure**  (Cloudflare) for performance, caching, and traffic management
- Automate infrastructure lifecycle operations and workflows

**Observability & Incident Response**

- Own the observability stack: **Grafana** (dashboards, Loki logs, Prometheus metrics) and **New Relic** (APM, golden metrics, transaction analysis)
- Enhance monitoring, alerting, and distributed tracing across services
- Participate in the on-call rotation via **PagerDuty**, handle incident response, and perform root cause analysis
- Proactively identify reliability risks before they become incidents

**AI Agent Infrastructure**

- Support the platform that runs AI agent workloads: job scheduling, trajectory tracking, environment provisioning, deployments, and cost attribution
- Develop Kubernetes controllers and operators to extend platform capabilities for agent orchestration

**Collaboration & Internal Tooling**

- Work closely with product and backend teams to ensure platform scalability and reliability
- Build internal tools, automate workflows, and integrate systems to improve team productivity
- Stay current with Kubernetes releases, CNCF ecosystem updates, and cloud-native best practices

**What We're Looking For**

**Core Requirements**

- 4+ years of software/platform engineering experience with production systems
- Strong proficiency in **Go** or **Python**: you write production code in at least one of them daily
- Hands-on experience  **building and deploying services on Kubernetes**  — not just YAML, you've developed something that runs on K8s
- Experience with GitOps tooling (ArgoCD, Flux, or similar)

**Systems Fundamentals**

- Strong  **networking and DNS fundamentals**  — TCP/IP, HTTP, load balancing, DNS resolution, TLS, and debugging connectivity issues
- Solid **Linux/OS fundamentals**: process management, filesystems, memory, and systemd, plus comfort debugging with tools like strace, tcpdump, and netstat

**Data & Messaging Infrastructure**

- **Relational databases**  — experience with PostgreSQL, MySQL, or similar; indexing, query optimization, replication, and backup/restore procedures
- **NoSQL databases**  — familiarity with MongoDB, DynamoDB, Redis, or similar for document/key-value workloads
- **Caching**  — experience with Redis, Memcached, or similar for application and infrastructure-level caching
- **Message queues & streaming**  — hands-on with Kafka, SQS, RabbitMQ, or similar for event-driven architectures
- Strong SQL skills for debugging and operational queries

**Infrastructure & Observability**

- Comfortable with the  **CNCF ecosystem**  — Helm, Kustomize, cert-manager, Ingress controllers, CNI/CSI interfaces
- Hands-on with at least one observability stack (Grafana/Prometheus/Loki, New Relic, Datadog, or similar)
- Familiarity with  **GCP**  and/or  **AWS**  — managed Kubernetes (GKE/EKS), networking, IAM, storage, and cloud-native services (SES, SQS, S3, etc.)
- Experience with  **CDN/edge platforms**  (Cloudflare, CloudFront, or similar)

**Nice to Have**

- Experience building  **Kubernetes Operators**  (kubebuilder, operator-sdk, or controller-runtime)
- Experience tuning Kubernetes core components (API server, kubelet, scheduler)
- Familiarity with AI/LLM infrastructure — token management, cost tracking, agent orchestration
- Experience with CI/CD pipelines (GitHub Actions, automated testing, deployment pipelines)
- Infrastructure as Code experience (Terraform, Pulumi, or similar)
- Previous work on large-scale distributed systems or platform-as-a-service
- Startup experience — you thrive in fast-paced, ambiguous environments

**What You're Like**

- You're a  **generalist**  who can context-switch between debugging a K8s deployment, setting up a Grafana alert, and configuring CDN rules — all in the same day
- You enjoy solving complex infrastructure challenges and automating away toil
- You dig deep — when something breaks, you find the root cause, not just the workaround
- You communicate clearly and can collaborate effectively in a fast-moving, distributed team

**Tech Stack**

We don't require previous experience with our entire stack, but enthusiasm for learning is key.

Go · Python · Kubernetes · ArgoCD · Helm · GCP · AWS · Cloudflare · Grafana · Prometheus · Loki · New Relic · PagerDuty · PostgreSQL · MongoDB · Redis · Kafka · GitHub

**Why Emergent Labs**

- **YC S24**-backed, with strong investor support
- Building at the frontier of AI-powered software creation
- Small team, high ownership, real impact from day one

**Benefits and Perks:**

- 401(k)
- Health, dental, and vision insurance
- Unlimited Paid Time Off: take the time you need to recharge and come back refreshed
- Flexible Working Hours: work arrangements that fit your life and commitments

Let's build the future of software together.



by @maxrusakovic