JOB DESCRIPTION

Description
The Resiliency Engineer engages in all facets and activities of the Resiliency program, with specific focus on developing a push-button DR strategy, automated infrastructure component failover, integration with application startup and shutdown, and end-to-end failover. They will be responsible for recovery plan development, testing, and compliance. A solid understanding of infrastructure automation and recovery capabilities is required. The Senior Resiliency Engineer will coordinate and participate in regular DR activities supporting various business capabilities.

Responsibilities
The Senior Resiliency Engineer engages in all facets and activities of the Resiliency Program with specific focus on:
• Establishing and advocating Resiliency Program strategy while executing activities with IT teams.
• Push-button DR automation of applications and business capabilities.
• Implementing policies and procedures for disaster recovery planning.
• Managing recovery plan development and maintenance activities to ensure plans are current and compliant.
• Managing the test lifecycle, supporting and facilitating testing with infrastructure and application teams in alignment with DR strategy.
• Highly experienced in engaging infrastructure and application teams to collaborate on discussions and requirements for disaster recovery.
• Solid understanding of IT infrastructure, application, and operations environment structure.
• Able to work successfully as a member of a team through collaboration and support.
• Proven ability to influence and collaborate with internal and external resources.
• Makes decisions on moderately complex to complex issues; able to solve problems and address obstacles.
• Capable of managing multiple assignments and priorities in a fast-paced environment using discernment, organization, and time and objectives management.
• Exercises considerable latitude in determining objectives and approaches to assignments.
• Provides project leadership, when needed, for the purpose of planning, directing, controlling, developing, and maintaining the Resiliency program.
• Knowledge of Crisis Management, Business Continuity, and Event Management practices.
• Understands that Resiliency professionals can be called in to support test exercises or unplanned outages outside regular business hours as necessary.
• Strong verbal and written communication skills; able to report progress, status, and other pertinent information to partners and leaders.

Minimum Qualifications
• Bachelor's degree in Computer Science, Mathematics, or a related technical field.
• Five (5) years of professional experience in technical engineering.
• Five (5) years of automation experience.
• Three (3) years of professional experience with cloud computing (IaC, Microsoft Azure, AWS).

Primary Skills (Must Have)
• Provide automated solutions for various infrastructure components using Ansible.
• Develop custom Ansible modules and roles as per business requirements.
• Ansible AWX: configure job templates, workflows, schedules, etc.
• Manage dynamic and static inventory and notifications.
• Strong knowledge of Linux and Windows platforms and of shell and PowerShell scripting.
• Analyze logs, metrics, and events to identify and resolve errors.
• Integration with external systems such as CyberArk, ServiceNow, etc.
• Strong knowledge of Python and Python scripting.
• Expertise in REST APIs, Ansible collections, and Ansible Galaxy.
• Strong knowledge of source control management tools such as Git, GitHub, or Bitbucket.
• Collaborate with the infrastructure teams to identify new scope or enhance existing automations.
• Good knowledge of disaster recovery.
• Familiar with YAML, JSON, and Jinja templates.
• Excellent communication and documentation skills leveraging the Atlassian stack (JIRA, Confluence).

Secondary Skills (Good to Have)
• Expertise in VMware automation, including Code Stream pipelines and Service Broker.
• Integrate automation with any DR tool.
• Knowledge of cloud services such as AWS or Azure.