Site Reliability Engineer

Design, build, and operate a world-class real-time streaming cloud platform

We are building a real-time streaming engine for modern applications, from the enterprise to the solo dev prototyping a React application on her laptop. We go beyond the Kafka protocol, into the future of streaming with inline WASM transforms and geo-replicated hierarchical storage. It is a new platform that scales with you from the smallest projects to petabytes of data distributed across the globe.

We are on a mission to enable every developer to supercharge their real-time applications, and we are looking for engineers to help bring our SaaS offering to customers across the globe.

You Will

You will be a part of our cloud team, working with all of engineering on building new services, automating infrastructure lifecycle on Kubernetes, and monitoring our services with the goal of offering a reliable, scalable and high-performance SaaS.

  • Design & build Vectorized’s cloud infrastructure with reliability and performance in mind

  • Build tools & services to allow automated infrastructure management and self-healing

  • Be in charge of end-to-end monitoring of our cloud

  • Participate in on-call rotations, working to keep customer workloads running and incident-free

You Have

  • 3+ years of experience in an SRE-like role

  • Comfort working with a 100% distributed engineering team, collaborating in the open on GitHub

  • Strong experience with public cloud providers

  • Experience running highly scalable production workloads reliably on Kubernetes

  • Experience with monitoring at scale

  • Experience managing infrastructure predictably through GitOps and IaC

  • Solid programming skills

  • Willingness to participate in an on-call rotation

  • Excellent written communication skills

  • A BS in Computer Science or equivalent experience

Nice to have

  • Strong understanding of Go and Kubernetes

  • Experience operating a SaaS platform

  • Fluency in a couple of programming languages (for example, Go and Python)

  • Experience with streaming platforms, either as a user or as an operator

  • Experience with the Prometheus monitoring stack
