Job Description
Core Job Responsibilities:

1. Monitor & Optimize Redshift Clusters
a. Monitor Amazon Redshift clusters, identify long-running queries, and optimize them to maintain cluster performance and a healthy operational state.

2. Monitor Data Pipelines & ETL Jobs
a. Continuously monitor Glue, Airflow, Lambda, Redshift, Spark, EMR, and Kinesis jobs.
b. Identify failures, performance degradation, and bottlenecks in real time.

3. Troubleshoot Data Pipeline Failures
a. Diagnose issues in extraction, transformation, loading, schema mismatches, and data quality.
b. Perform impact analysis and apply immediate fixes.

4. Support & Extend Existing Data Engineering Solutions
a. Provide continuous support for the products, tools, platforms, and solutions that Data Engineering has built, and extend them as new use cases are onboarded.

5. Handle On-Call / Incident Response
a. Own the end-to-end on-call rotation, respond to PagerDuty alerts, and restore systems within SLA.
b. Work directly with data engineering teams to resolve critical incidents.

6. Conduct Root Cause Analysis (RCA)
a. Perform RCA for every major incident.
b. Document findings and propose long-term preventive solutions.

7. Manage Data Quality & Validation
a. Validate accuracy, completeness, freshness, lineage, and schema consistency.

8. Optimize Queries & Performance
a. Optimize inefficient SQL (Athena/Redshift/Presto/Spark).
b. Tune warehouse performance, resolve WLM queue issues, and reduce compute cost.

9. Maintain Metadata, Catalogs & Schemas
a. Manage the Glue Catalog, partition refreshes, schema evolution, table permissions, and lineage.
b. Ensure smooth integration between S3, Glue, Athena, Redshift, and Lake Formation.

10. Support Deployments & Release Management
a. Assist in promoting ETL jobs, model code, and pipeline configurations through CI/CD.
b. Validate deployments and perform rollbacks when necessary.

11. Collaborate with BI, Product & Stakeholders
a. Work with BI teams, analysts, PMs, and upstream/downstream owners.
b. Provide data accessibility support and answer data troubleshooting queries.

12. Maintain Documentation & SOPs
a. Maintain playbooks, runbooks, troubleshooting guides, and data dictionaries.
b. Ensure knowledge transfer and training for new team members.

Requirements:
2+ years of scripting language experience
Strong SQL and debugging skills
AWS (S3, Glue, EMR, Lambda, Redshift, Athena)
Strong Python and PySpark skills
Understanding of data modelling, ETL, and batch/streaming pipelines
Experience with version control and CI/CD (Git, CodePipeline)
Good communication skills for stakeholder-facing troubleshooting
Experience with AWS, networks, and operating systems
Good to have: GenAI skill set (not mandatory)
Job Classification
Industry: Internet
Functional Area / Department: Engineering - Software & QA
Role Category: DBA / Data warehousing
Role: Database Administrator
Employment Type: Full time
Contact Details:
Company: Amazon
Location(s): Hyderabad
Keyskills:
RCA
Metadata
Schema
Debugging
Data quality
Operations
Release management
Downstream
SQL
Python