Site Reliability Engineer

from 🇺🇸 United States

The Site Reliability Engineer (SRE) is responsible for ensuring the availability, scalability, performance, and resiliency of enterprise cloud platforms across Azure and AWS environments.

This role combines software engineering, automation, and infrastructure expertise to operationalize reliability engineering practices, drive cloud-native resiliency patterns, and enable business-critical applications to meet defined SLAs, SLOs, and compliance requirements.

The SRE partners with engineering, security, and operations teams to implement observability, incident response frameworks, and reliability automation, aligning with enterprise architecture standards and regulatory expectations.

Key Accountabilities/Deliverables:

Design and implement highly available, fault-tolerant architectures using cloud-native services (microservices, containers, serverless)
Define and operationalize SLOs, SLIs, and error budgets for critical applications and platforms
Build and maintain Infrastructure as Code (IaC) with Terraform to ensure repeatable and compliant deployments
Develop automated remediation and self-healing capabilities to reduce MTTR and improve system resilience
Establish enterprise-level monitoring, logging, and observability frameworks (Datadog, Azure Monitor, CloudWatch, OpenTelemetry, Azure Application Insights)
Drive cost optimization (FinOps) initiatives, including resource utilization tracking and rightsizing recommendations
Support DR/BCP strategy execution, including failover testing and regional isolation validation
Collaborate with application teams to embed reliability engineering practices into CI/CD pipelines

Technical Knowledge and Understanding:

Strong expertise in cloud platforms (Azure, AWS)
Deep understanding of cloud-native architecture patterns: microservices, containers (Azure Container Apps, AKS, EKS), and serverless (Azure Functions, AWS Lambda)
Proficiency in Infrastructure as Code (Terraform, ARM/Bicep)
Experience with observability platforms (Datadog, Azure Monitor, Azure Application Insights)
Knowledge of CI/CD pipelines and GitOps practices
Expertise in system reliability concepts:
- SLI / SLO / SLA management
- Chaos engineering
- High availability & fault isolation
Familiarity with security, compliance, and regulatory controls (SOC, ISO, cloud security frameworks)

Experience:

5+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
Proven experience supporting mission-critical production systems at scale
Hands-on experience with incident management and on-call operations
Experience implementing automated monitoring, alerting, and remediation frameworks
Exposure to regulated environments (insurance, financial services) preferred
Demonstrated ability to work across cross-functional architecture, engineering, and operations teams

Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over work authorization sponsorship now or in the future for this position.

At Core Specialty, you will receive a competitive salary and opportunities for professional development and advancement. We offer medical, dental, vision, and life insurance; short- and long-term disability; a 401(k) plan with a company match of 100% of a 6% contribution; an Employee Assistance Plan; a Health Savings Account, Flexible Spending Account, and Health Reimbursement Account; and a wellness program.
