GTRI/ICL seeks qualified candidates for a full-time Senior Systems Operations Engineer in Atlanta, GA. The successful candidate will have 6+ years of experience in monitoring, maintaining, and administering deployed cloud-based enterprise system components. The successful candidate will also be responsible for maintaining the reliability and availability of all deployed system components, including backup and disaster recovery procedures. In addition to excellent technical qualifications, the ideal candidate will have excellent collaboration skills and a passion for lean processes, and will be innovative and creative, a quick thinker, a fast learner, and a team player with a strong customer service focus. The project is based in the Centergy Building at 75 Fifth Street NW, Atlanta, on the Georgia Tech campus.
• Use and enhance Site Reliability Engineering processes to manage application and infrastructure changes
• Maintain and ensure reliability of an AWS cloud infrastructure and services to meet operational requirements. Maintain the principle of least privilege across all aspects of the cloud environment.
• Administer and enhance the cloud-based extended network including transit containerization, VPCs, firewalls, load balancers, proxies, connectivity with clients and third parties, and internal zone-to-zone communications. Focus on security, performance, high availability, and disaster recovery.
• Act as a senior-level escalation point for production system monitoring, support, and incident response.
• Develop technical improvements and fixes by leading collaboration with internal teams and vendors.
• Work with team members to ensure processes are fully documented by maintaining technical references including articles, diagrams, spreadsheets, and checklists.
• Manage the response to incidents and use a variety of incident management tools, communications, and facilitation methods to ensure issues are identified and the right stakeholders are involved in resolution.
• Coordinate and supervise cross-functional teams throughout the entire incident management process.
• Conduct after-action reviews once an incident is closed to ensure key lessons learned are gathered and mitigations are put in place.
Education & Length of Experience
Research Engineer/Scientist/Technologist/Associate II
- A Master’s degree in Computer Science or Engineering, or related technical field and three (3) years of relevant full-time experience after completion of that degree,
- A Master’s degree in Computer Science or Engineering, or related technical field and five (5) years of relevant full-time experience after completion of a Bachelor’s degree, or
- A Doctoral degree in Computer Science or Engineering, or related technical field.
Senior Research Engineer/Scientist/Technologist/Associate
- A Master’s degree in Computer Science or Engineering, or related technical field and seven (7) years of relevant full-time experience after completion of that degree,
- A Master’s degree in Computer Science or Engineering, or related technical field and nine (9) years of relevant full-time experience after completion of a Bachelor’s degree, or
- A Doctoral degree in Computer Science or Engineering, or related technical field and four (4) years of relevant full-time experience after completion of a Bachelor's degree.
Required Minimum Qualifications
- Experience with AWS services such as CloudWatch, CloudTrail, EC2, and IAM
- Ability to support CI/CD deployments
- Experience in incident management
- Disaster recovery experience
- Experience with backups and recovery of data and systems
- Experience in reporting on various aspects of operations SLAs
- Understanding of system monitoring
- Understanding of scalability and reliability aspects of systems
- Ability to triage issues
- Ability to work independently as well as part of a project team, with multiple communication channels
- Ability to maintain the highest levels of confidentiality when dealing with proprietary or private information
- Ability to exercise sound judgment and make decisions in a manner consistent with the essential job functions
- Excellent communication skills including meeting facilitation, technical writing, and conveying complex technical information in common language, appropriate for all levels of technological understanding.
- Experience with ServiceNow Incident Management processes
- Ability to analyze documentation on relevant regulatory programs (either from the federal register or summary documentation)
- Background in Healthcare
U.S. Citizenship Requirements
Due to our research contracts with the US federal government, candidates for this position are required to be US Citizens.
Clearance Type Required
Ability to obtain Secret Clearance upon hire
Diversity & Inclusion
Diversity & Inclusion (D&I) at Georgia Tech Research Institute aims to enhance the Institute’s mission of solving the world’s most complex technical problems by creating a workforce with a shared appreciation for diversity, raising awareness around inclusiveness, and fostering a sense of belonging and appreciation for all members of our community.
Equal Employment Opportunity
Georgia Tech Research Institute is an Equal Opportunity Employer of individuals with disabilities and protected veterans and actively seeks diversity among its employees. Equal Employment Opportunity is the Law.