Staff Software Engineer, AI Reliability Engineering

London

£255,000 - £390,000 per year

Posted 2 weeks ago
  • Company

    Anthropic
  • Location

    London
  • Company Size

    501-1,000 employees
  • Salary

    £255,000 - £390,000 per year

About the job

Anthropic is seeking a skilled Staff Software Engineer to join its AI Reliability Engineering team in London. The role is central to defining and achieving reliability metrics for Anthropic’s internal and external products, with a focus on large-scale AI infrastructure. The engineer will develop service level objectives (SLOs) for language model serving and training systems, balancing performance metrics such as availability and latency against development velocity. Key responsibilities include designing and implementing monitoring systems for high-availability language model serving infrastructure, developing automated failover and recovery systems, and leading incident response for critical AI services. The role also involves building and maintaining cost optimization systems that ensure efficient use of AI hardware accelerators such as GPUs and TPUs. The position plays a critical part in keeping Anthropic’s AI systems reliable, scalable, and beneficial while optimizing performance and cost efficiency.

