Infrastructure Engineer
Infrastructure Engineers in this role design and operate cloud platforms that power AI workloads, translating technical standards into production systems that are secure, observable, and resilient by default. They differ from pure IT operations staff in that they own end-to-end infrastructure architecture decisions, from Kubernetes and storage systems to identity and networking, while maintaining the operational discipline to troubleshoot incidents and tune performance at scale. These engineers typically sit within platform or infrastructure teams at AI-focused companies, partnering closely with data science, product, and SRE teams to keep GPU clusters, distributed training pipelines, and high-performance data systems reliable as the organization scales. Their work connects strategic infrastructure choices to everyday operations: they implement infrastructure-as-code patterns and CI/CD automation that let product teams ship safely without sacrificing uptime or security.
Skills
What companies are looking for in this role.
Diagnosing and troubleshooting complex hardware failures across server, storage, and GPU systems
Managing end-to-end lifecycle of storage and infrastructure systems from deployment to optimization
Performing system performance analysis, benchmarking, and optimization across diverse workloads
Conducting root cause analysis and implementing systemic failure prevention measures
Executing firmware and BIOS upgrades and component-level repairs on enterprise hardware
Creating and maintaining technical documentation, standard operating procedures, and runbooks
Managing vendor relationships, RFPs, and warranty replacement processes with OEM partners
Managing hardware problem resolution workflows and escalation processes
Designing and executing proof of concept exercises to validate new technologies and solutions
Architecting and operating large-scale distributed storage systems for high-performance workloads
Implementing infrastructure-as-code and automation solutions for operational efficiency
Defining technical strategy and architecture decisions for multi-region and globally distributed systems
Optimizing infrastructure for emerging AI and machine learning workload patterns
Influencing vendor product roadmaps through technical feedback and strategic partnerships
Supporting sustainable and energy-efficient data center operations
Building relationships with cross-functional teams and influencing without direct authority
Participating in on-call schedules and responding to production incidents with urgency
Taking ownership and demonstrating initiative in problem-solving and process improvement
Operating effectively in ambiguous environments and making informed decisions with incomplete information
Communicating complex technical concepts to both technical and non-technical stakeholders
Technology
The tools and technologies that define this role.
Other Infrastructure & IT roles
Provides end-user technical support including hardware, software, and account troubleshooting.
Designs, deploys, and maintains enterprise IT systems including identity management, SaaS platforms, device management, and business applications. This is the IT-facing systems engineer who manages corporate technology.
Designs, implements, and maintains network infrastructure including LAN, WAN, backbone, and edge networks.
Remotely manages servers, operating systems, hypervisors, and software within data center environments. Focuses on systems administration, monitoring, patching, and troubleshooting at the OS and application layer, not on physical hardware installation.
Implements and manages security infrastructure including IAM, endpoint security, SIEM, and security tooling. Operates within IT or infrastructure teams to protect the corporate environment.