Caltech is a world-renowned science and engineering institute that marshals some of the world's brightest minds and most innovative tools to address fundamental scientific questions. We thrive on finding and cultivating talented people who are passionate about what they do. Join us and be a part of the diverse Caltech community.
Job Summary
As the High-Performance Computing (HPC) Systems Senior Administrator, you will architect, deploy, administer, and update large-scale research systems, related infrastructure services, grid software stacks, and operating systems. You will also be responsible for the components of Caltech's computational research environment and will work closely with researchers, systems administrators, engineers, and developers throughout the Institute and partner institutions. You will also consult with multiple IT colleagues, external research groups, and grant-funded initiatives. Additionally, you will administer existing cluster and grid infrastructure technologies and research and prototype new systems and technologies. Finally, you will participate in national computing activities by attending workshops and conferences and potentially presenting research.
Essential Job Duties
- Contribute to the evolution of Caltech's high-performance computing infrastructure design, which leverages cloud and HPC technologies.
- Contribute to technical systems management, administration, and support for the on-premises and cloud-based high-performance computing (HPC) cluster environments. This includes configuration, authentication, networking, storage, interconnects, software usage, and installation of the HPC cluster(s).
- Install, configure, patch, and upgrade software, and tune, optimize, proactively monitor, and secure services.
- Deploy, troubleshoot, and maintain Linux systems in a scientific or research computing environment, advise the HPC team on best practices, and document procedures and processes.
Basic Qualifications
- 3+ years of experience deploying and managing HPC applications and services.
- Experience with system management frameworks (e.g., Foreman, Puppet, Salt).
- Experience with programming in at least one of the following: Perl, Python, or UNIX shell.
- Familiarity with high-performance interconnects (e.g., RDMA, high-speed Ethernet, InfiniBand), high-performance storage, and/or distributed storage systems.
- Extensive understanding of UNIX-based operating systems.
- Proficiency in systems administration and automation, TCP/IP networking, and system troubleshooting.
- Familiarity with modern large-scale scientific computing systems.
- Experience with capacity planning for large-scale systems.
- Ability to work in a collaborative, team-based environment.
- Excellent troubleshooting, debugging, and diagnostic skills.
- Strong networking knowledge and skills.
- Experience working on technical proposals, and supporting active research projects.
- Experience with Linux cluster resource allocation, job scheduling, InfiniBand networks, MPI communications, and cluster monitoring.
- Familiarity with Slurm job scheduling software including installation, maintenance, and usage.
- Familiarity with building and installing environment modules.
- Familiarity with Linux kernel internals, computation accelerators (e.g., GPU computing, CUDA), MPI, and OpenMP.
- Highly resourceful and adept at juggling multiple simultaneous projects.
- Must demonstrate ability to work effectively on independent, self-directed projects.
- Strong written and verbal communication skills, and a desire to learn new technology and techniques.
Preferred Qualifications
- VAST Data storage platform.
- IBM GPFS ESS storage platform.
- Understanding of the academic workplace and culture.
- AWS and Google Cloud experience.
Required Documents
Hiring Range
$147,700 to $161,500 per year
The salary of the finalist(s) selected for this role will be set based on a variety of factors, including but not limited to, internal equity, experience, education, specialty and training.
As one of the largest employers in Pasadena, CA, Caltech is committed to providing comprehensive benefits to eligible employees and their eligible dependents. Our benefits package includes competitive compensation, health, dental, and vision insurance, retirement savings plans, generous paid time off (vacation, holidays, sick time, parental leave, bereavement, etc.), tuition reimbursement, and more. Non-benefit eligible employees will have access to some benefits such as onsite counseling and sick time. Learn more about our benefits and staff perks.
EEO Statement
We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, protected veteran status, or any other characteristic protected by law.
Caltech is a VEVRAA Federal Contractor. To read more about Equal Employment Opportunity (EEO), go to eeoc_self_print_poster.pdf.
Disability Accommodations
If you would like to request an accommodation in completing this application, interviewing, or otherwise participating in the employee selection process, please direct your inquiries to Caltech Recruiting at employment@caltech.edu.