
Senior Data Engineer @ Idexcel


Job Description

Databricks (Spark)

  • Develop scalable ETL/ELT pipelines using PySpark (RDD/DataFrame APIs), Delta Lake, Auto Loader (cloudFiles), and Structured Streaming.
  • Optimize jobs: partitioning, bucketing, Z-Ordering, OPTIMIZE + VACUUM, broadcast joins, AQE, checkpointing.
  • Manage Unity Catalog: catalogs/schemas/tables, data lineage, permissions, secrets, tokens, and cluster policies.
  • CI/CD for Databricks assets: notebooks, Jobs, Repos, and MLflow.
  • Build Medallion Architecture (Bronze/Silver/Gold) with Delta Live Tables (DLT) and expectations for data quality.
  • Event-driven ingestion: Kafka/Kinesis into Databricks Structured Streaming.
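
For illustration, a minimal sketch of the Auto Loader (cloudFiles) pattern from the first bullet above, assuming a Databricks runtime where spark is already defined; the bucket paths and Unity Catalog table name are placeholders:

```python
# Incrementally ingest raw JSON from S3 into a Bronze Delta table with Auto Loader.
from pyspark.sql import functions as F

raw_path = "s3://example-bucket/raw/orders/"                # hypothetical landing zone
schema_path = "s3://example-bucket/_schemas/orders"         # schema tracking/evolution
checkpoint_path = "s3://example-bucket/_checkpoints/orders" # stream checkpoint

bronze_stream = (
    spark.readStream
    .format("cloudFiles")                                   # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schema_path)
    .load(raw_path)
    .withColumn("_ingested_at", F.current_timestamp())      # ingestion audit column
)

(bronze_stream.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                             # process available files, then stop
    .toTable("main.bronze.orders"))                         # three-level Unity Catalog name
```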

Snowflake (DW & ELT)

  • Model and implement star/snowflake schemas, data marts, and secure views.
  • Performance tuning: clustering keys, micro-partitions, result caching, warehouse sizing, and Query Profile analysis.
  • Implement Task/Stream patterns for CDC; external tables for data lakes (S3); Snowpipe for near-real-time ingestion.
  • Python/Snowpark for transformations and UDFs; SQL best practices (CTEs, window functions).
  • Security: Row Level Security (RLS), Column Masking, OAuth/SCIM, network policies, data sharing (reader accounts).
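
As an illustration of the Snowpark and window-function items above, a sketch that deduplicates a staging table and publishes a mart table; the connection parameters and object names are placeholders, and it assumes the snowflake-snowpark-python package:

```python
# Keep the latest row per ORDER_ID and publish the result as a mart table.
from snowflake.snowpark import Session
from snowflake.snowpark import functions as F
from snowflake.snowpark.window import Window

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "TRANSFORM_WH", "database": "ANALYTICS", "schema": "STAGING",
}).create()

orders = session.table("STG_ORDERS")

# Window-function dedup: rank rows per key by recency, keep rank 1.
latest_first = Window.partition_by("ORDER_ID").order_by(F.col("UPDATED_AT").desc())
deduped = (orders
    .with_column("RN", F.row_number().over(latest_first))
    .filter(F.col("RN") == 1)
    .drop("RN"))

deduped.write.save_as_table("ANALYTICS.MARTS.DIM_ORDERS", mode="overwrite")
```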

AWS Data Engineering

  • Storage & compute: S3 (lifecycle, encryption, partitioning), EMR (if needed), Lambda, Glue (ETL/Schema registry), Athena, Kinesis (Data Streams/Firehose), RDS/Aurora, Step Functions.
  • Orchestration: MWAA/Airflow or Step Functions (error handling, retries, backfills, SLA alerts).
  • Infra-as-code: Terraform/CloudFormation for reproducible environments (Databricks workspace, IAM, S3, networking).
  • Security/compliance: IAM least privilege, KMS, VPC endpoints/private links, Secrets Manager, CloudTrail/CloudWatch, GuardDuty.
  • Observability: CloudWatch metrics/logs, structured logging, Datadog/Prometheus (optional), cost monitoring (tags/budgets).
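
A sketch of the orchestration item above: an Airflow (MWAA) DAG with retries and an SLA that triggers a Databricks job run. The job_id, connection id, and schedule are placeholders, and it assumes the apache-airflow-providers-databricks package is installed:

```python
# Nightly ELT DAG with retries and an SLA, handing off to a Databricks job.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

default_args = {
    "owner": "data-eng",
    "retries": 3,                           # retry transient failures
    "retry_delay": timedelta(minutes=10),
    "sla": timedelta(hours=2),              # alert if a task run exceeds 2 hours
}

with DAG(
    dag_id="orders_elt_nightly",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",          # 02:00 UTC daily
    catchup=False,                          # backfills are triggered explicitly instead
    default_args=default_args,
) as dag:
    run_bronze_to_gold = DatabricksRunNowOperator(
        task_id="run_bronze_to_gold",
        databricks_conn_id="databricks_default",
        job_id=12345,                       # hypothetical Databricks Jobs ID
    )
```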

Data Quality, Governance & Security

  • Implement unit/integration tests for pipelines (e.g., pytest + Great Expectations + DLT expectations).
  • Data contracts and schema evolution; monitor SLA/SLO; DQ dashboards (missingness, drift, freshness, completeness).
  • PII handling: tokenization/pseudonymization, field-level encryption, adherence to KYB/KYC data-flow requirements; audit trails.
  • Cataloging & lineage through Unity Catalog and/or OpenLineage/Purview (if applicable).
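
A sketch of pipeline data-quality tests in the spirit of the first bullet above, written as plain pytest assertions over a pandas sample; in practice these rules might live in a Great Expectations suite or DLT expectations, and the fixture path and column names are placeholders:

```python
# Null, uniqueness, and freshness checks over a sample of a Silver table.
from datetime import datetime, timedelta

import pandas as pd
import pytest


@pytest.fixture
def silver_orders() -> pd.DataFrame:
    # Hypothetical small extract of the table under test.
    return pd.read_parquet("tests/fixtures/silver_orders_sample.parquet")


def test_primary_key_is_unique_and_non_null(silver_orders):
    assert silver_orders["order_id"].notna().all(), "order_id contains nulls"
    assert not silver_orders["order_id"].duplicated().any(), "duplicate order_id values"


def test_data_is_fresh(silver_orders):
    # Assumes updated_at is stored as naive UTC timestamps.
    latest = pd.to_datetime(silver_orders["updated_at"]).max().to_pydatetime()
    assert datetime.utcnow() - latest < timedelta(hours=24), "Silver data is stale"
```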

DevOps & CI/CD

  • Git workflows (branching, PR reviews), Databricks CLI/Terraform modules for jobs/clusters/UC, Snowflake DevOps (object versioning via schemachange or SQL-based migration).
  • Automated testing in pipelines; feature flags, canary releases for data jobs; rollback strategies.
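
A sketch of the SQL-based migration idea (schemachange-style) from the first bullet above, suitable as a CI step; credentials come from environment variables, all names are placeholders, and it assumes snowflake-connector-python:

```python
# Apply versioned .sql migrations (V001__create_tables.sql, V002__..., ...) in order.
import os
from pathlib import Path

import snowflake.connector  # assumes snowflake-connector-python is installed

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="DEPLOY_ROLE",
    warehouse="DEPLOY_WH",
)
cur = conn.cursor()
try:
    for script in sorted(Path("migrations").glob("V*.sql")):
        print(f"applying {script.name}")
        # Naive statement split; adequate for simple DDL files in a sketch.
        for statement in script.read_text().split(";"):
            if statement.strip():
                cur.execute(statement)
finally:
    cur.close()
    conn.close()
```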

Client-Facing PoCs & Delivery

  • Rapid PoC builds: define clear success metrics, benchmark cost/performance, and produce a transition plan to production.
  • Present architectural decisions, trade-offs (Spark vs Snowflake ELT), and cost projections (Databricks DBU, Snowflake credits, storage egress).
  • Produce runbooks, operational playbooks, and knowledge transfer documents for client teams.
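
A back-of-envelope sketch of the kind of cost projection mentioned above; every rate and usage figure is a placeholder, since actual DBU and credit prices vary by cloud, region, and tier:

```python
# Rough monthly run-cost estimate combining Databricks, Snowflake, and egress.
dbu_per_hour = 8                 # hypothetical job-cluster DBU consumption
dbu_rate_usd = 0.30              # hypothetical $/DBU for the chosen compute tier
databricks_hours_month = 120

snowflake_credits_month = 200    # hypothetical warehouse usage
credit_rate_usd = 3.00           # hypothetical $/credit for the account edition

egress_tb_month = 0.5
egress_rate_usd_per_tb = 90      # hypothetical cross-region egress rate

monthly_cost = (
    dbu_per_hour * dbu_rate_usd * databricks_hours_month
    + snowflake_credits_month * credit_rate_usd
    + egress_tb_month * egress_rate_usd_per_tb
)
print(f"Projected monthly run cost: ${monthly_cost:,.2f}")
```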

Required Technical Skillset

  • Databricks: PySpark, Delta Lake, Auto Loader, DLT, Jobs, Unity Catalog, MLflow basics.
  • Snowflake: SQL, Snowpipe, Tasks/Streams, Snowpark (Python), warehouse sizing, performance tuning, security policies.
  • Python: strong with core DE packages (pandas, pyarrow, pytest), robust error handling, typing, and packaging.
  • Orchestration: Airflow DAGs (Sensors, Operators, XCom), Step Functions state machines.
  • Streaming & CDC: Kafka/Kinesis, Debezium (nice-to-have), CDC patterns to Delta/Snowflake.
  • AWS: S3, Glue, Lambda, Kinesis, IAM/KMS, VPC, CloudWatch; Terraform/CloudFormation.
  • Data Modeling: 3NF/dimensional modeling, slowly changing dimensions (SCD Type 2), surrogate keys, and surrogate vs. natural key trade-offs.
  • Security & Compliance: encryption at rest/in transit, tokenization, key rotation, audit logging, governance controls.
  • Performance & Cost: Spark job tuning, Snowflake warehouse right-sizing, partitioning/clustering, object storage best practices.
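
A sketch of the SCD Type 2 pattern from the Data Modeling item above, written against Delta Lake; it assumes a Databricks/Spark session where spark is defined, and the table, key, and column names (customer_id, address, change_ts) are illustrative:

```python
# Two-step SCD Type 2 upsert: expire changed "current" rows, then append new versions.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

dim_name = "gold.dim_customer"
updates = spark.table("silver.customers_changes")   # incoming CDC batch
dim = DeltaTable.forName(spark, dim_name)

current = dim.toDF().filter("is_current = true").select("customer_id", "address")

# Rows that are brand new or whose tracked attribute changed.
changed = (updates.alias("u")
    .join(current.alias("c"),
          F.col("u.customer_id") == F.col("c.customer_id"), "left")
    .filter(F.col("c.customer_id").isNull() |
            (F.col("u.address") != F.col("c.address")))
    .select("u.*"))

# Step 1: close out the previous "current" versions for changed keys.
(dim.alias("d")
    .merge(changed.alias("u"), "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false", "valid_to": "u.change_ts"})
    .execute())

# Step 2: append the new current versions with SCD2 tracking columns.
(changed
    .select(
        "customer_id", "address",
        F.col("change_ts").alias("valid_from"),
        F.lit(None).cast("timestamp").alias("valid_to"),
        F.lit(True).alias("is_current"))
    .write.format("delta").mode("append").saveAsTable(dim_name))
```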

Nice-to-Have:

  • dbt (Snowflake) with tests & exposures; Great Expectations.
  • Databricks SQL Warehouses and BI connectivity; Photon engine awareness.
  • Lakehouse Federation (UC external locations); Delta Sharing; Iceberg.
  • Kafka Connect/Debezium, NiFi or MuleSoft (for data integrations).
  • Experience in financial services.
  • Exposure to ISO/IEC 27001 controls in data platforms.

Education & Certifications

  • Bachelors/Masters in CS/IT/EE or related.
  • Certifications (plus): Databricks Data Engineer Associate/Professional, Snowflake SnowPro Core/Advanced, AWS Solutions Architect/Big Data/DP.

Job Classification

Industry: Recruitment / Staffing
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time

Contact Details:

Company: Idexcel
Location(s): Hyderabad



Keyskills: Data Engineering, PySpark, Auto Loader, Data Quality, DevOps, Snowflake, Delta Lake, CI/CD, Data Modeling, Databricks, ETL, AWS, Data Governance, Python


Salary: ₹ Not Disclosed


Idexcel

Idexcel Technologies Private Limited. Idexcel is a Professional Services and Technology Solutions provider specializing in Cloud Services, Application Modernization, and Data Analytics. Idexcel is proud that for more than 21 years it has provided services that implement complex technologies that ...