About the role

We’re currently seeking a Senior Site Reliability Engineer to join our extraordinary team and own the reliability, scalability, and performance of Transcend’s privacy infrastructure. You’ll act as a seasoned technical leader who partners closely with Product Engineering, Security, and Developer Experience to design, operate, and continuously improve the systems our customers depend on every day. You’ll help define our SRE practices, lead cross-team reliability initiatives, and turn incidents and risk analyses into durable improvements that keep our platform resilient as we grow.

This is a remote, Exempt, Full-Time position based in the United States. The successful candidate must have valid work authorization, as visa sponsorship is not available. This full-time role reports directly to the Director of Information Systems and Head of Security.

What you'll do

Lead reliability-focused design and readiness reviews for new and existing services, ensuring production readiness, clear rollout and rollback strategies, and strong observability for every launch.
Build, operate, and continuously improve our observability stack (e.g., logging, metrics, tracing) to provide meaningful dashboards, alerts, and runbooks that enable fast, high-quality incident response across engineering teams.
Own and evolve incident management practices, including on-call participation, incident response processes, and post-incident reviews that drive long-term remediation and learning across teams.
Plan and execute disaster recovery exercises and game days to validate our resilience posture, test failover and backup strategies, and systematically reduce single points of failure.
Perform capacity planning and cost optimization for our cloud infrastructure, helping ensure we run a cost-effective environment that meets performance and availability goals as usage grows.
Identify and drive down systemic reliability risks across application, infrastructure, and process layers—owning cross-team projects that significantly reduce incident frequency and severity over time.
Collaborate closely with Developer Experience, Security, and product engineering to embed reliability best practices—testing, rollout patterns, guardrails, and “golden paths”—into shared tools and CI/CD pipelines.
Participate in and help continuously improve the on-call rotation, using real incidents and near-misses to prioritize automation, better alerting, and clearer documentation.

Who you are

Required

5+ years of experience in Site Reliability Engineering, Production Engineering, Infrastructure Engineering, or a closely related role, including hands-on ownership of production systems.
Strong experience operating modern cloud infrastructure, ideally on AWS, including core services for compute, networking, storage, and security primitives.
Proficiency with at least one programming language used at Transcend (e.g., JavaScript, TypeScript, or Python), and comfort reading and reviewing application code for reliability and performance concerns.
Hands-on experience with infrastructure-as-code and CI/CD tooling (e.g., Terraform, CloudFormation, or similar; modern build/deploy pipelines) to reliably provision and change infrastructure.
Deep familiarity with observability and monitoring systems (e.g., Datadog or equivalent), including designing alerts that balance coverage and noise to avoid alert fatigue while protecting customer experience.
Proven track record running incident response and post-incident analysis, including root cause identification, clear documentation, and driving follow-through on remediation work.
Excellent communication and collaboration skills, with experience working across multiple engineering teams to align on reliability goals, share context, and influence technical direction without formal authority.
Comfort participating in an on-call rotation, and experience helping to design or improve on-call processes, runbooks, and escalation paths.
Minimum level of education: Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field, or equivalent practical experience.
Demonstrated ability to thrive in a remote-first, high-autonomy environment, managing priorities, communicating asynchronously, and driving projects to completion with limited oversight.

Preferred

Experience working in a high-growth B2B SaaS environment, ideally on security, data, or privacy-focused products.
Experience designing, operating, and tuning Docker-based or serverless architectures on AWS or another major cloud provider.
Familiarity with data privacy regulations and practices (e.g., GDPR, CPRA) and how they inform system design, reliability expectations, and incident response requirements.
Experience defining and rolling out SRE frameworks such as SLOs/SLIs, error budgets, incident management processes, and production-readiness checklists across multiple teams.
Experience working closely with Developer Experience / Platform teams to create paved roads, tooling, and documentation that make it easy for product teams to build reliable services by default.
Relevant technical certifications (e.g., AWS Certified Solutions Architect or DevOps Engineer, CKA/CKAD, or equivalent SRE training) are a plus.

Compensation Information

Our comprehensive compensation packages play a big part in how we recognize you for the impact you have on our path to bringing data rights to everyone.
The compensation pay range represents our reasonable expectation for this role. Individual pay is determined by multiple factors, including, but not limited to, experience, education, skillset, and geographic location.
This specific range applies to Tier 1 labor markets like the SF Bay Area and New York City; it may be adjusted based on the labor market in other geographic areas and the individual qualifications objectively assessed during the interview process.

USA Pay Range

$170,000–$185,000 USD

About Transcend

Transcend is the compliance layer for customer data, enabling enterprises to activate AI responsibly and at scale. We're building the platform that makes data governance, privacy compliance, and AI oversight seamlessly integrated across your entire tech stack.

We are driven by the belief that building robust, accessible infrastructure for responsible AI and data rights is one of the most impactful ways to spend our time. To achieve this, we're assembling an ambitious and passionate team that enjoys tackling important, future-focused problems at the intersection of AI, privacy, and compliance.

We're growing quickly, backed by top-tier investors including Accel, Index, 01A, StepStone Group, and HighlandX, and we are proud to serve some of the world's most iconic brands. Learn more on our Customer Stories Page.

Why Join Us?

Impactful Work: We believe that enabling responsible AI adoption while protecting individual privacy rights is one of the most impactful ways to spend our time. You'll be at the forefront of building the compliance layer that helps enterprises navigate the rapidly evolving landscape of AI regulation, data privacy, and customer data governance.
Autonomy and Growth: You will have the trust and autonomy to drive initiatives from the start. As an early hire in a fast-growing startup defining a new category, you'll have significant opportunities to shape the organization's direction and work on a diverse array of challenging projects.
Dynamic Environment: As the best-in-class solution in an emerging market, Transcend operates at the intersection of AI innovation and compliance. Our product evolves quickly to meet new client needs and adapt to advancing privacy law, AI regulations, and enterprise requirements.
Supportive Culture: The people at Transcend are driven, kind, and know how to balance work, life, and memes. We learn from each other and maintain a strong support system while having fun solving important problems that matter.
Commitment to Diversity and Equal Opportunity: We celebrate a diverse and inclusive workforce and consider all forms of diversity, including race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, and veteran status. Our commitment ensures equitable employment opportunities, non-discrimination in all practices, and a workplace where every employee feels valued and respected. We also consider all qualified applicants with arrest and conviction records, as legally required. If you are enthusiastic about this role but feel your experience doesn't perfectly match every qualification, we strongly encourage you to apply.
Benefits & Perks: Transcend employees enjoy a comprehensive benefits program that includes flexible PTO, parental leave, a 401(k) match, and competitive compensation packages that include employee equity.

By applying for this position, your data will be processed per Transcend's Privacy Policy.

Originally posted on Himalayas

Senior Site Reliability Engineer