Lenovo is seeking a Cloud Operations Engineer to join the Cloud Software (CSW) team at our Global Innovation Center (GIC). This role involves designing, developing, and overseeing Lenovo's global cloud products and ensuring their operational availability in production. The engineer will work with Lenovo teams worldwide to improve the reliability of cloud products and services in line with Lenovo's security and operational reliability standards. The engineer will use modern DevOps practices to ensure efficient integration and delivery of software throughout the secure software development lifecycle (S-SDLC).
Responsibilities:
- Address S-SDLC blockers
- Handle access requests
- Provide build/release support
- Manage environment promotion and deployments
- Oversee cloud resource management
- Conduct infrastructure tests and reviews
- Maintain cloud infrastructure and environments
- Offer source control support
- Architect and design cloud solutions
- Research and participate in proofs of concept (POCs)
- Optimize cloud resources (compute, storage, network)
- Automate manual tasks
- Develop infrastructure as code and automation scripts
- Build and monitor health dashboards
- Debug cloud infrastructure and applications
- Continuously improve and streamline the processes behind all of the above
- Optimize costs across all of the above areas
Minimum Qualifications:
- Bachelor's degree in Computer Science or a related technical field
- 5+ years of experience in AWS cloud infrastructure roles (e.g., DevOps, SRE, or PaaS/IaaS engineer)
- 3+ years of experience operating microservice and serverless SaaS applications (AWS ECS/EKS, Docker containerization, AWS Lambda)
- Familiarity with modern DevOps practices and strong skills in CI/CD pipelines
- AWS Solutions Architect certification or equivalent
- An automation mindset and experience supporting software development teams through self-service methods
- Proficiency in infrastructure as code (Terraform HCL), GitHub Actions/Jenkins pipelines, and Python, plus experience maintaining site reliability
- Ability to build production environment health-monitoring dashboards using tools such as Datadog, Splunk, or Sumo Logic
Preferred Qualifications:
- Experience with cloud and application monitoring for health, availability, and uptime
- Experience interpreting cloud resource telemetry data to prevent service degradation
- Ability to debug and troubleshoot application issues using application and cloud logs
- Experience with environment promotion workflows and testing before production deployment