Site Reliability Engineer at Chipcolate
Full-Stack Programming
Worldwide
$75k-$100k
Full-Time
DevOps
SQL
PostgreSQL
Docker
Infrastructure Orchestration
Join Our Team: Site Reliability Engineer (SRE)
Chipcolate seeks a skilled Site Reliability Engineer (SRE) to ensure the stability, scalability, and performance of our agent-driven financial platform. You'll build and automate resilient cloud infrastructure, optimize latency and uptime, and empower our product team with the tools to deploy confidently at scale.
- Salary: €65,000 – €85,000 gross/year
- Location: Fully remote (±4 hours CET)
- Employment type: Full-time, permanent
Who We Are
Chipcolate is a tight-knit Italian team of engineer-craftsmen dedicated to building elegant, robust software. Our expertise spans embedded systems, web applications, and 3D printing. Currently, we're developing high-performance financial services for autonomous agents, working in a dynamic, flexible environment that prioritizes speed and quality.
- Mission: Ensure our customers’ success from day one
- Learn more: chipcolate.com · LinkedIn (@chipcolate) · GitHub (github.com/chipcolate)
Your Role
- Design and maintain a distributed, multi-cloud infrastructure for high availability and low latency.
- Build scalable solutions to support thousands of concurrent agents.
- Optimize Postgres performance, including OLAP workloads.
- Develop observability stacks (Grafana/OpenTelemetry) with meaningful SLOs.
- Automate reliability measures: blue/green deployments, canary releases, chaos testing.
- Collaborate with backend teams to enhance service performance and scalability.
- Implement cost-efficient capacity planning and security best practices.
What You Bring
Essential:
- 3+ years in SRE, DevOps, or Production Engineering
- Expertise in Linux, containers, and Postgres
- Proficiency in Node, Python, Go, or Rust
- Infrastructure-as-Code (Ansible, Terraform)
- Strong monitoring/alerting skills (RED/USE metrics)
Bonus:
- Grafana observability experience
- Agent-based or event-driven architectures
- Multi-region, active-active setups
- Supabase or DuckDB familiarity
- Kubernetes at scale
No degree required: prove your expertise through the real-world systems you've kept resilient.
Why Join Us?
- Remote-first, flexible hours
- High-growth, innovative domain
- 20 days PTO + local holidays
- Competitive compensation
How to Apply
- Submit your CV/GitHub and a brief note on the most impactful incident you've resolved.
- Technical assessment.
- Interview with our CTO (system design + problem-solving).
- Offer or final discussion within 7 business days.
Ready to engineer reliability into high-stakes systems? Apply now and let’s build the future together.