We are seeking an experienced AWS Platform Operations Engineer (L2 / L3) to support and manage our AWS-based Data Lake Platform and associated cloud infrastructure. This role is responsible for day-to-day platform operations, monitoring, incident resolution, infrastructure support, access and entitlement management, security remediation, CI/CD support, disaster recovery activities, and continuous improvement of platform stability, security, and operational efficiency.
The ideal candidate should have strong hands-on experience in AWS cloud operations, production support, and platform governance, along with the ability to troubleshoot complex issues and support enterprise-grade cloud environments.
Key Responsibilities
- Provide L2/L3 operational support for AWS platform services and the AWS Data Lake environment.
- Manage and support AWS services including S3, Athena, Systems Manager (SSM), IAM, EC2, Lambda, Glue (Data Catalog, jobs, crawlers), CloudWatch, VPC, CloudFormation, Lake Formation, ECR, and DynamoDB.
- Monitor platform health, investigate alerts, troubleshoot failures, and ensure timely resolution of incidents and service requests.
- Handle incident, change, and problem management tickets in line with enterprise support processes and SLAs.
- Manage IAM users, roles, policies, entitlements, and resource permissions in accordance with security and governance standards.
- Support Lake Formation permissions, data access controls, and secure onboarding/offboarding of users and applications.
- Administer and troubleshoot Glue jobs, crawlers, catalog metadata, Athena query issues, and S3-based data lake access.
- Support infrastructure provisioning, deployment, and change implementation using CloudFormation and CI/CD pipelines.
- Contribute to CI/CD enablement, release support, deployment validation, and code promotion across environments.
- Enforce code repository standards, including branching and version control practices, in alignment with engineering and operational standards.
- Participate in Disaster Recovery (DR) activities, including DR readiness checks, backup/restore validation, failover/failback support, and documentation updates.
- Perform root cause analysis (RCA) for recurring or critical issues and recommend preventive actions.
- Support CloudWatch dashboards, logging, alerting, and operational reporting.
- Assist with AWS cost monitoring and optimization, tagging compliance, and resource usage governance.
- Perform or support security vulnerability scans, review findings, coordinate remediation, and drive closure of security-related tickets in collaboration with security and engineering teams.
- Maintain runbooks, SOPs, operational documentation, and knowledge articles.
- Collaborate with data engineers, developers, cloud architects, security, governance, and DevOps teams for platform support and continuous improvements.
Required Skills
- Strong hands-on experience in AWS platform operations, cloud support, or production support.
- Working knowledge of:
  - Amazon S3
  - Amazon Athena
  - AWS IAM
  - AWS Systems Manager (SSM)
  - Amazon EC2
  - AWS Lambda
  - AWS Glue (Catalog, Jobs, Crawlers)
  - Amazon CloudWatch
  - Amazon VPC
  - AWS CloudFormation
  - AWS Lake Formation
  - Amazon ECR
  - Amazon DynamoDB
- Experience in access provisioning, entitlement management, resource permissions, and governance controls.
- Strong troubleshooting skills across cloud infrastructure, platform services, and data workflows.
- Experience in CI/CD support, deployment coordination, and operational readiness for releases.
- Understanding of code repository management and standardization practices.
- Exposure to DR processes, backup/recovery validation, and resiliency support.
- Experience with security vulnerability remediation and closure of security/compliance-related tickets.
- Good understanding of incident, change, and problem management processes.
- Knowledge of AWS security best practices, governance, and operational controls.
Preferred Skills
- Experience with Python or shell scripting for automation and operational tasks.
- Exposure to DevOps practices, infrastructure automation, and release management.
- Familiarity with cloud cost optimization and governance frameworks.
- AWS certification(s) such as Cloud Practitioner, SysOps Administrator, or Solutions Architect Associate.
Experience & Qualifications
- 7–10 years of overall IT experience with 5+ years in AWS cloud/platform operations.
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or related field.