Cloud Operations Engineer
Remote | United States
Cloud Operations Engineer (Linux, Web-Based, Azure)
Position Summary
Bamboo Rose is seeking a Cloud Operations Engineer (CloudOps). This person will be responsible for supporting the production web and application servers that serve our customer base, as well as applications for internal and external-facing systems. The candidate will work to deliver high-quality, customer-focused solutions and support services that provide substantial value to the business.
Responsibilities Include
- Deployment, administration, and operational support of (PROD, QA, DEV) environments for multiple projects
- Proactively ensure the health of the cloud platform through regular maintenance activities, audit reviews, and the development of continuous improvement goals
- Monitor and manage applications, databases, and web servers to predict and prevent impending issues and maintain 24/7/365 service availability
- Develop and mature CloudOps capabilities within the team and identify simplification opportunities for operational processes
- Triage and resolve support tickets related to the functionality, performance, and availability of the services we provide to our customers
- Participate in performance analysis, monitoring, backups and security certification of all environments
- Provide technical expertise in the diagnosis and resolution of issues, including the determination and provisioning of workaround solutions
- Create and deliver presentations and documentation to the Technology team
- Create and maintain technical standards and documentation for software solutions and processes
- Identify, develop and implement best practices to increase system reliability and performance
- Manage project timelines and workloads
Requirements
- 2+ years of experience with a tier-1 cloud provider like AWS, Google Cloud or Azure
- Experience in managing 24/7/365 cloud applications
- Expertise in administering Linux/Unix systems in business-critical production environments
- Experience with Apache Tomcat and Solr administration and performance tuning
- Experience deploying and managing databases such as Postgres, Oracle, and SQL Server
- Experience with tools such as Terraform and Jenkins
- Experience with monitoring applications such as New Relic
- Ability to function effectively in a fast-paced environment, handle multiple tasks simultaneously, prioritize and meet deadlines
- Ability to document complex technical issues clearly
- Self-directed with the initiative and project management skills to be productive with little direction
- Excellent writing, communication and presentation skills
- Strong, proven diagnostic and troubleshooting skills
Desired Skills
- Highly motivated with a passion for improvement and technical evolution
- Excellent collaboration and team building skills
- Ability to work in an interrupt-driven environment with co-workers across several continents and time zones
- Recent experience with Microsoft Azure PaaS & IaaS offerings in both an infrastructure and application context
- Solid experience as either a Cloud Engineer or Site Reliability Engineer in a global enterprise
- Ability to program in an administrative language (Shell scripts, etc.) to automate basic processes
- Knowledge of Disaster Recovery replication and failover solutions
- Experience with automation using Puppet, Ansible, Terraform, or an equivalent technology
- Familiarity with CI/CD philosophy, processes, and toolsets
- Familiarity with orchestration tools such as Jenkins, Puppet, Chef, Ansible, or other similar products
- Familiarity with infrastructure-as-code frameworks such as CloudFormation, Terraform, or other similar products
- Familiarity with infrastructure tools such as AWS, VMware, Docker, Kubernetes, and Swarm
- Solid understanding of JSON, XML, and related notational data representations
- Knowledge of cloud architecture, networking, system administration, and security best practices
- Knowledge of APIs and RESTful services and how they can be used to integrate with Azure