Infrastructure & IT
Managing internal technology systems, networks, devices, and tooling. Covers IT management, systems administration, network engineering, internal IT support/help desk, cloud infrastructure (internal), database administration, IAM, enterprise architecture, and SaaS management.
Roles
The canonical roles within Infrastructure & IT.
Systems Engineer
Systems Engineers in AI companies design and operate the enterprise technology platforms that enable researchers and product teams to work efficiently—managing identity systems like Okta, collaboration tools such as Google Workspace and Slack, and endpoint infrastructure while ensuring security and scalability. What distinguishes this role from general IT administration is the emphasis on automation-first problem solving: rather than simply maintaining systems, these engineers architect scalable workflows using APIs, infrastructure-as-code, and integration platforms to eliminate manual processes and reduce operational friction. They typically sit within IT Engineering or Enterprise Systems teams, partnering closely with Security and Infrastructure groups to support rapid company growth. Increasingly, they are also asked to bridge traditional IT operations with emerging AI workflows and autonomous systems.
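The automation-first pattern described above can be sketched in a few lines. This is an illustrative example only: the `IdentityClient` class and app names are invented stand-ins for an identity provider's API, not any real vendor SDK.

```python
from dataclasses import dataclass, field

@dataclass
class IdentityClient:
    """Stand-in for an identity provider's API client; purely illustrative."""
    assignments: dict = field(default_factory=dict)  # username -> set of app names
    audit_log: list = field(default_factory=list)

    def revoke_all(self, user: str) -> list:
        """Revoke every app assignment for a departing user, logging each action."""
        revoked = sorted(self.assignments.pop(user, set()))
        for app in revoked:
            self.audit_log.append(f"revoked {app} for {user}")
        return revoked

client = IdentityClient(assignments={"jdoe": {"slack", "workspace", "vpn"}})
print(client.revoke_all("jdoe"))  # ['slack', 'vpn', 'workspace']
```

The point of wrapping offboarding in a single auditable call, rather than clicking through admin consoles, is that it scales with headcount and leaves a compliance trail by default.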
IT Leadership
IT leaders in AI companies manage the internal technology infrastructure that enables engineering teams and operations to function at scale. They oversee corporate systems—identity, endpoints, cloud infrastructure, collaboration tools, and networking—while maintaining enterprise security standards and compliance requirements. What distinguishes this role from other operations leadership is the emphasis on treating infrastructure as code with measurable SLOs and automated remediation, rather than reactive troubleshooting, and the need to support highly technical engineering teams with little patience for friction. These leaders typically sit within larger organizations building AI infrastructure or AI products, partnering closely with security, networking, and engineering teams to balance rapid innovation with operational maturity and governance.
Business Applications Administrator
Administrators in this role configure, maintain, and optimize business-critical SaaS platforms—from HR systems like Workday and HiBob to financial platforms like NetSuite and Coupa, as well as collaboration tools and support systems. They spend their days troubleshooting user issues, managing system integrations, designing workflows that scale across global operations, and ensuring data accuracy and compliance as the company grows. What sets this role apart is the strategic ownership of entire system landscapes rather than single-tool support; these professionals act as trusted partners to finance, HR, and operations teams, translating complex business needs into system configurations while balancing tactical maintenance with roadmap planning. They typically sit within centralized IT or Operations teams in high-growth AI and enterprise software companies, where rapid scaling demands reliable, automated, and compliant systems infrastructure.
Data Center IT Technician
This role involves hands-on troubleshooting and maintenance of high-performance GPU infrastructure and server hardware in AI-scale data centers. Technicians diagnose and resolve complex hardware incidents, manage fiber and network connectivity, and ensure continuous uptime of critical systems supporting large-scale AI model training and inference workloads. They work in shift-based operations within distributed data center teams, collaborating with Tier 3 (L3) engineers and infrastructure specialists to optimize system reliability and reduce mean time to repair—directly impacting the performance of AI clusters that power customer applications.
IT Support Specialist
IT Support Specialists at AI companies serve as the frontline troubleshooters for hardware, software, and connectivity issues across distributed teams building large language models and AI infrastructure. These roles distinguish themselves through deeper technical ownership—specialists often automate repetitive support workflows using Python scripting and API orchestration platforms like Okta Workflows, manage device fleets at scale across multiple regions, and ensure compliance frameworks like SOC2 are operationalized in practice. They typically sit within lean IT operations teams that report to infrastructure leadership and collaborate closely with security and people operations to balance user productivity with governance requirements as the company scales.
Infrastructure Engineer
Infrastructure Engineers in this role design and operate cloud platforms that power AI workloads, translating technical standards into production systems that are secure, observable, and resilient by default. They distinguish themselves from pure IT operations roles by owning end-to-end infrastructure architecture decisions—from Kubernetes and storage systems to identity and networking—while maintaining the operational discipline to troubleshoot incidents and tune performance at scale. These engineers typically sit within platform or infrastructure teams at AI-focused companies, partnering closely with data science, product, and SRE teams to ensure that GPU clusters, distributed training pipelines, and high-performance data systems remain reliable as the organization scales. Their work bridges the gap between strategic infrastructure choices and everyday operational reality, implementing infrastructure-as-code patterns and CI/CD automation that let product teams ship safely without trading off uptime or security.
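The infrastructure-as-code discipline mentioned above rests on one core idea: declare desired state, then compute the actions needed to reconcile actual state toward it. A toy sketch of that reconciliation loop follows; the resource names are invented and this mimics, but is not, any real IaC tool's plan step.

```python
# Toy "plan" step in the declarative infrastructure-as-code style:
# diff desired state against actual state to get create/update/delete actions.

def plan(desired, actual):
    """Compute the actions needed to make actual state match desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions

desired = {"gpu-pool": {"nodes": 8}, "vpc-main": {"cidr": "10.0.0.0/16"}}
actual = {"gpu-pool": {"nodes": 4}, "legacy-vm": {"nodes": 1}}
print(plan(desired, actual))
# [('update', 'gpu-pool'), ('create', 'vpc-main'), ('delete', 'legacy-vm')]
```

Because the plan is computed rather than hand-written, it can be reviewed in CI before anything is applied—which is how product teams ship safely without trading off uptime.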
Network Engineer
Network Engineers in AI companies design and operate the critical infrastructure that keeps training clusters, inference pipelines, and global services running at scale. Day-to-day work spans configuring high-performance backbone networks, automating deployments across distributed data centers, and troubleshooting latency-sensitive systems that power large language models and generative AI products. What distinguishes this role from traditional IT networking is the focus on extreme scale—managing 100k+ GPU clusters, optimizing RoCEv2 and NCCL performance, and balancing hyper-growth infrastructure needs with automation-first operations using Python and Ansible. These engineers typically sit within infrastructure or platform teams that report to VP-level engineering leadership, working closely with ML systems engineers, cloud platform teams, and security groups to ensure networks meet both performance and compliance requirements for real-time AI inference and training workloads.
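Automation-first network operations often means generating configuration deterministically rather than assigning it by hand. As a hedged illustration—switch names and the reserved supernet below are invented—here is how point-to-point /31 addressing for a small leaf-spine fabric might be carved up with Python's standard library:

```python
# Sketch: deterministically assign /31 point-to-point subnets to every
# spine-leaf link in a fabric, using only the stdlib ipaddress module.
import ipaddress
from itertools import product

def p2p_links(spines, leaves, supernet="10.255.0.0/24"):
    """Assign one /31 per spine-leaf link from a reserved supernet."""
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=31)
    return {(s, l): str(next(subnets)) for s, l in product(spines, leaves)}

links = p2p_links(["spine1", "spine2"], ["leaf1", "leaf2"])
print(links[("spine1", "leaf1")])  # 10.255.0.0/31
print(links[("spine2", "leaf2")])  # 10.255.0.6/31
```

Feeding generated data like this into templating or an orchestration tool is what lets a small team manage link addressing that would be error-prone at 100k-GPU scale if done manually.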
Security Infrastructure Engineer
This role designs, builds, and operates identity and access management systems that scale across cloud infrastructure, SaaS platforms, and internal services at AI companies. Engineers here balance automation with compliance, implementing SSO consolidation, RBAC models, and lifecycle management while reducing access sprawl and supporting rapid business growth. They work at the intersection of security governance and operational efficiency, partnering with infrastructure, IT, and compliance teams to embed least-privilege access into AI development workflows and multi-cloud environments. The role sits within security or infrastructure teams and demands expertise in identity platforms like Okta, cloud IAM services, and scripting automation to protect critical assets while enabling researchers and engineers to move quickly.
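The RBAC and least-privilege work described above can be reduced to a small, testable core: resolve a user's effective permissions from their role grants, then diff against a baseline to surface access sprawl. The role and permission names in this sketch are invented for illustration.

```python
# Minimal RBAC sketch: roles grant permission sets; a user's effective
# permissions are the union over their roles; anything beyond the
# least-privilege baseline is flagged as sprawl.

ROLE_PERMS = {
    "researcher": {"read:datasets", "submit:training-job"},
    "it-admin": {"read:datasets", "manage:devices", "manage:sso"},
}

def effective_perms(roles):
    """Union the permissions granted by each of a user's roles."""
    return set().union(*(ROLE_PERMS[r] for r in roles)) if roles else set()

def excess(roles, baseline):
    """Permissions held beyond the least-privilege baseline (access sprawl)."""
    return effective_perms(roles) - baseline

print(sorted(excess(["researcher", "it-admin"],
                    baseline={"read:datasets", "submit:training-job"})))
# ['manage:devices', 'manage:sso']
```

Running a diff like this on every access review cycle is one way teams keep lifecycle management honest as role assignments accumulate.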
Recent Jobs
The latest Infrastructure & IT openings across the AI industry.