Software Engineer- Kubernetes

Cadre5
On-site
Knoxville, Tennessee, United States

Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) to recruit Kubernetes Engineers for the American Science Cloud (AmSC) initiative.

AmSC is a first-of-its-kind, federally funded cloud infrastructure and API platform designed to accelerate AI model development, data sharing, and large-scale computational science across the U.S. Department of Energy (DOE). ORNL is a premier research institution delivering breakthroughs in energy, national security, and advanced computing. Located near Knoxville, TN, the lab provides world-class resources to solve some of the nation’s most complex scientific challenges.

This is a rare opportunity to be part of a groundbreaking project that will help shape the future of U.S. scientific computing. If you’re passionate about cloud engineering, DevOps, and enabling large-scale science, we’d love to hear from you!

**Please note: The first step in the interview process requires candidates to join a Microsoft Teams meeting with the video turned on.**

This is a full-time position with the option to telecommute. Occasional travel to the Oak Ridge facility may be required.

Why Cadre5?
  • Working with highly talented team members
  • 3 weeks’ vacation
  • Excellent medical insurance, including employer-paid benefits  

The Project:

The American Science Cloud (AmSC) will deliver secure, containerized workflows, GPU-enabled training environments, and integration with DOE’s high-performance computing facilities (ALCF, OLCF, NERSC, ESnet, HPDF). This collaborative effort will bring together cloud engineering, HPC, cybersecurity, data science, and program management expertise to build the next generation of scientific computing infrastructure.

We are seeking Software Engineers with deep Kubernetes expertise to design and develop custom Kubernetes Operators that extend the orchestration of high-performance workloads and secure data workflows at scale. These roles are central to enabling AmSC’s AI and HPC platforms, ensuring that containerized research applications run seamlessly across heterogeneous compute and data environments.

Key Responsibilities:
 
  • Custom Kubernetes operator development
      • Design, implement, maintain, modify, and test custom Kubernetes operators written in Go and/or Ansible.
      • Enhance existing software development processes, practices, and standards. Build test environments to evaluate tooling based on performance, feature set, and maintainability, especially for components that must work reliably with on-premise hardware and OS requirements.
      • Support the use and understanding of in-house Kubernetes operators and serve as a maintainer for those controllers.
  • Architecture & Infrastructure as Code and Tooling
      • Develop and implement an Architecture as Code process for the Slate platform.
      • Write and maintain infrastructure and deployment code using tools such as ArgoCD (GitOps), Puppet (OS management), Go, Python, Bash, Ansible, Terraform, and GitLab CI.
      • Engage with development teams to understand platform needs and tailor the cluster experience to meet evolving requirements.
  • Technical Leadership for Software Engineering
      • Provide software development guidance, code reviews, and pair-programming support to a team of 11 engineers.
      • Contribute to onboarding, team documentation, and process improvement initiatives.
      • Act as a go-to technical expert for all Kubernetes custom operator questions across the engineering organization.
  • Collaboration
      • Partner closely with internal cybersecurity and development teams to ensure the platform's custom operators meet security, compliance, and usability expectations.
      • Participate in cross-functional projects related to platform enhancements, cluster lifecycle automation, and infrastructure provisioning.

Basic Qualifications:
 
  • Experience with the following key technologies and tools:
      • Languages: Go, Python, Bash
      • CI/CD: GitLab CI, ArgoCD
      • IaC/Config Management: Puppet, Helm, Ansible
      • Kubernetes & Ecosystem: On-prem K8s, Custom Operators, Service Mesh, K8s architecture
      • Operating Systems: Linux-based OS management at the hardware level, strong Linux sysadmin skills
  • The ability to obtain and maintain a Department of Energy "Q" clearance may be required. This requires US Citizenship.

Preferred Qualifications:
  • Prior Istio operator development or service mesh integration experience.
  • Familiarity with WebAssembly plugin development for Istio or Kubernetes.
  • Background in HPC platforms, GPU-based AI training environments, or large-scale distributed systems.
  • Exposure to DOE computing ecosystems (ALCF, OLCF, NERSC, ESnet, HPDF).
  • Experience with containerized scientific workflows and secure data-sharing architectures.

Benefits

Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage, a 401(k) match, 15 days of PTO, and 10 holidays.

Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.
