Company
Anthropic
Location
London
Company Size
501-1,000 employees
Salary
£255,000 - £390,000 per year
About the job
Anthropic is seeking a skilled Staff Software Engineer to join their AI Reliability Engineering team in London. This role is crucial to defining and achieving reliability metrics for Anthropic’s internal and external products, focusing on large-scale AI infrastructure. The engineer will develop service level objectives (SLOs) for language model serving and training systems, balancing performance metrics like availability and latency with development velocity. Key responsibilities include designing and implementing monitoring systems for high-availability language model serving infrastructure, developing automated failover and recovery systems, and leading incident response for critical AI services. The role also involves building and maintaining cost optimization systems, ensuring efficient use of AI hardware accelerators such as GPUs and TPUs. This position plays a critical part in ensuring Anthropic’s AI systems remain reliable, scalable, and beneficial while optimizing performance and cost efficiency.