Software Engineer - Kubernetes

Cadre5
On-site
Knoxville, Tennessee, United States

 
Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) to recruit Cloud Engineers for the American Science Cloud (AmSC) initiative.
 
AmSC is a first-of-its-kind, federally funded cloud infrastructure and API platform designed to accelerate AI model development, data sharing, and large-scale computational science across the U.S. Department of Energy (DOE).
 
ORNL is a premier research institution delivering breakthroughs in energy, national security, and advanced computing. Located near Knoxville, TN, the lab provides world-class resources to solve some of the nation’s most complex scientific challenges.
 
We are recruiting multiple Cloud Engineers with complementary skill sets: some focused on AWS cloud infrastructure, others on DevOps/MLOps and application deployment.
 
This is a rare opportunity to be part of a groundbreaking project that will help shape the future of U.S. scientific computing. If you’re passionate about cloud engineering, DevOps, and enabling large-scale science, we’d love to hear from you!
 
**Please note: The first step in the interview process requires candidates to join a Microsoft Teams meeting with the video turned on.**
 
This is a full-time position with the option to telecommute. Occasional travel to the Oak Ridge facility may be required.
 

Why Cadre5?

  • Working with highly talented team members
  • 3 weeks’ vacation
  • Excellent medical insurance, up to 100% paid by employer
 

The Project:

 
The American Science Cloud (AmSC) will deliver secure, containerized workflows, GPU-enabled training environments, and integration with DOE’s high-performance computing facilities (ALCF, OLCF, NERSC, ESnet, HPDF). This collaborative effort will bring together cloud engineering, HPC, cybersecurity, data science, and program management expertise to build the next generation of scientific computing infrastructure.
 
We are seeking Software Engineers with deep Kubernetes expertise to design and develop custom Kubernetes Operators that extend the orchestration of high-performance workloads and secure data workflows at scale. These roles are central to enabling AmSC’s AI and HPC platforms, ensuring that containerized research applications run seamlessly across heterogeneous compute and data environments.
 

Key Responsibilities:

 
  • Architect, implement, and maintain custom Kubernetes Operators for HPC, AI model training, and data-sharing workflows.
  • Build and extend REST-based APIs (GraphQL experience preferred) for integration with scientific applications and DOE facilities.
  • Develop operator logic using Go and the controller-runtime library, implementing efficient reconciliation loops.
  • Manage Custom Resource Definitions (CRDs) and admission webhooks to enforce policy, security, and resource lifecycle automation.
  • Package, deploy, and validate operators using Operator SDK and/or Kubebuilder.
  • Collaborate with cross-functional teams (cloud, HPC, cybersecurity, and data science) to integrate operators with GPU-heavy environments and containerized AI workflows.
  • Implement observability and telemetry for operators (Prometheus, OpenTelemetry, Grafana) to support performance monitoring, reliability, and debugging.
  • Work with service meshes, particularly Istio, to simplify and secure operator-managed services.
  • Research and apply WebAssembly (WASM) plugin development for advanced extensions in Istio.
 

Basic Qualifications:

 
  • Strong understanding of Kubernetes internals (API server, controllers, scheduler, reconciliation loop).
  • Proven experience developing Kubernetes Operators with Operator SDK and/or Kubebuilder.
  • Proficiency in the Go programming language.
  • Experience with CRDs, RBAC, and admission controllers.
  • Demonstrated background in API development (REST essential; GraphQL nice-to-have).
  • Strong Git-based software development practices and testing experience (unit, integration, e2e).
  • The ability to obtain and maintain a Department of Energy "Q" clearance is required. This requires U.S. citizenship.
 

Preferred Qualifications:

 
  • Prior Istio operator development or service mesh integration experience.
  • Familiarity with WebAssembly plugin development for Istio or Kubernetes.
  • Background in HPC platforms, GPU-based AI training environments, or large-scale distributed systems.
  • Exposure to DOE computing ecosystems (ALCF, OLCF, NERSC, ESnet, HPDF).
  • Experience with containerized scientific workflows and secure data-sharing architectures.
 

Benefits

 
Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage, a 401(k) match, 15 days of PTO, and 10 holidays.
Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.