Senior Kubernetes Platforms System Engineer
Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to our customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) to recruit a Senior Kubernetes Platforms System Engineer. In this role, you will work on the Infrastructure team within the HPC Infrastructure and Networking group to support all activities of our supercomputer center.
ORNL delivers scientific discoveries and technical breakthroughs needed to realize solutions in energy and national security and provides economic benefit to the nation. This premier research institution located near Knoxville in Oak Ridge, TN, addresses national needs through impactful research and world-leading research centers.
**Please note: The first step in the interview process requires candidates to join a Microsoft Teams meeting with the video turned on.**
This is a full-time, permanent position that is eligible for telecommuting. Occasional travel to the Oak Ridge facility may be required.
Why Cadre5?
- Working with highly talented team members
- 3 weeks’ vacation
- Excellent medical insurance, including employer-paid benefits
Job Responsibilities:
- Work with the team to define and implement best practices and standards within the organization
- Keep the Kubernetes platform reliable, available, and fast
- Architect solutions to problems that improve the reliability, scalability, performance, and efficiency of our services
- Respond to, investigate, and fix service issues all the way from bare metal through the OS to the application code
- Coordinate with vendors to resolve hardware and software problems
- Participate in an on-call rotation providing 24-hour, 7-day support and off-hours maintenance windows
- Work with users to help them use Kubernetes
Basic Qualifications:
- Bachelor’s degree in a scientific field and a minimum of 8 years of relevant experience. An equivalent combination of education and experience will be considered.
- Excellent interpersonal and communication skills and the ability to work as part of a team
- Experience with Kubernetes
- Experience with Red Hat OpenShift, OpenShift Data Foundation, Advanced Cluster Management for Kubernetes, and Advanced Cluster Security for Kubernetes
- Experience managing image registries such as Quay or Harbor
- Solid understanding of networked computing environment concepts
- Strong working knowledge of Unix systems fundamentals and common network protocols
- Ability to develop and maintain programs and scripts that aid in the operation and automation of tasks using various shell and scripting languages (primarily bash, Python, and Go)
- Ability to identify requirements and to define, plan, and implement requisite solutions
- Experience using tools such as Prometheus, Nagios, and Grafana to monitor systems and metrics and to create dashboards
- Experience designing and implementing highly-available systems/services
- Experience with Infrastructure-as-Code tooling such as Terraform, Helm, and Puppet
- Experience with CI/CD tooling and GitOps
- Experience with code review and familiarity with tools like Git, GitHub, and GitLab
- Experience implementing systems-level security technologies like SELinux and following security best practices
- The ability to obtain and maintain a Department of Energy "Q" clearance is required. This requires US Citizenship.
Benefits
Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage, a 401(k) match, 15 days of PTO, and 10 holidays.
Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.