Senior Database Reliability Engineer
Lattice’s Engineering team is continuously working to improve both our product and our craft. We use a modern, cutting-edge tech stack and love experimenting with new technologies. We strive for maintainable, robust, and performant code. We’re highly collaborative, iterate continuously, and work closely with designers and product managers. We prioritize not only great technical architecture but also an amazing product experience.
Lattice is hiring a reliability engineer to ensure the high reliability and performance of our PostgreSQL databases. We run on Amazon Web Services, using both RDS and Aurora PostgreSQL instances. We’re looking for someone who is comfortable configuring and tuning database clusters as well as partnering with engineering teams to review and improve data modeling and query optimization. You’re also a software developer and will build tools, libraries, and other code to improve Lattice’s use of our databases. You’ll use Terraform to provision and manage the database infrastructure, and Datadog and other tools for database observability.
What You Will Do
- Work on database reliability and performance as a member of the SRE team.
- Analyze solutions and implement best practices for operating our PostgreSQL databases.
- Work on observability of relevant database metrics and make sure we reach our database objectives.
- Work with other reliability engineers to roll out changes to our production environment and help mitigate database-related production incidents.
- Participate in on-call support rotation with the team.
- Provide database expertise to engineering teams (for example, through reviews of database migrations, queries, and performance optimizations).
- Work on automation of database infrastructure and help engineering succeed by providing self-service tools.
- Plan the growth and manage the capacity of Lattice's database infrastructure.
- Support and debug database production issues across services and levels of the stack.
- Document your work so that your learnings become repeatable processes and, eventually, automation.
- Cross-train other reliability engineers on aspects of database reliability.
What You Will Bring To The Table
Experience that’s important for you to have, at least to some degree:
- At least 5 years of experience running PostgreSQL in large production environments.
- Experience operating systems in cloud environments such as AWS or GCP.
- Deep knowledge of SQL and data modeling for relational databases (RDBMS).
- Good knowledge of PostgreSQL internals.
- Experience deploying and using proxy and optimization solutions such as RDS Proxy, PgBouncer, pganalyze, OtterTune, etc.
- Several years of experience programming in a software engineering role.
- Understanding of SRE concepts such as SLAs, SLIs, and SLOs, and of incident management processes.
- Strong desire to automate away toil.
- Strong interest in collaborating with and mentoring product engineers about SQL and database topics.
Experience That Would Be Helpful
- Experience with infrastructure automation and configuration management using tools like Terraform, Chef, Ansible, Puppet, etc.
- Experience with observability tooling for database monitoring and troubleshooting, such as Datadog, Percona, EverSQL, etc.
- Familiarity with distributed systems and networking concepts as they apply to applications and database utilization.