Key Responsibilities
1. Data Quality Framework Design & Leadership
- Define and implement enterprise-wide data quality frameworks and governance standards.
- Architect automated DQ pipelines using Databricks (Delta Lake), PySpark, and Ataccama ONE.
- Design the DQ monitoring architecture, including profiling, lineage integration, and alerting mechanisms.
- Establish KPIs and DQ scorecards to measure and communicate data trust metrics across domains.
2. Advanced Data Quality Development & Automation
- Build and optimize complex validation, reconciliation, and anomaly detection workflows using PySpark and Python (an illustrative sketch follows this list).
- Implement rule-based and ML-based DQ checks, leveraging Ataccama workflows and open-source frameworks.
- Integrate DQ rules into CI/CD and orchestration platforms (Airflow, ADF, or Databricks Workflows).
- Partner with data engineers to embed DQ checks into ingestion and transformation pipelines.
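For illustration only, the sketch below shows the kind of rule-based DQ check referenced in this section, written in PySpark. The table, column names, and rules are hypothetical placeholders; a production implementation would typically read from Delta tables on Databricks and feed results into Ataccama workflows or the DQ scorecards described above.

```python
# Illustrative sketch of a rule-based DQ check in PySpark.
# Table and column names ("orders", "order_id", "amount") are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-rule-check").getOrCreate()

# Hypothetical source data; in practice this would be a Delta table.
orders = spark.createDataFrame(
    [(1, 120.50), (2, None), (3, -15.00), (None, 42.00)],
    ["order_id", "amount"],
)

# Declarative rule set: each rule is a name plus a Boolean column expression
# that evaluates to True when a row passes the check.
rules = {
    "order_id_not_null": F.col("order_id").isNotNull(),
    "amount_not_null": F.col("amount").isNotNull(),
    "amount_non_negative": F.col("amount") >= 0,
}

# Flag each row per rule (1 = pass, 0 = fail), then aggregate pass rates
# into a simple per-rule scorecard.
flagged = orders.select(
    *[F.when(expr, 1).otherwise(0).alias(name) for name, expr in rules.items()]
)
scorecard = flagged.agg(
    *[F.avg(F.col(name)).alias(f"{name}_pass_rate") for name in rules]
)
scorecard.show(truncate=False)
```

The same pattern extends to reconciliation and anomaly-detection checks by swapping in different rule expressions or source comparisons.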
3. Root Cause Analysis & Continuous Improvement
- Lead root-cause investigations for recurring DQ issues and drive long-term remediation solutions.
- Create and enforce best practices for rule versioning, DQ exception handling, and reporting.
- Own the playbook for DQ incident response and continuous optimization.
4. Stakeholder Management & Governance
- Act as the primary liaison between business data owners, IT, and governance teams.
- Translate business DQ requirements into technical implementation strategies.
- Drive executive-level reporting on DQ KPIs, SLAs, and issue trends.
- Contribute to metadata management, lineage documentation, and master data alignment.
5. Mentorship & Leadership
- Guide junior analysts and data engineers in developing robust DQ solutions.
- Lead cross-functional squads to implement new data quality capabilities or upgrades.
- Contribute to capability uplift by training peers on DQ best practices, tools, and technologies.
Core Technical Skills
- Data Engineering & Quality: Databricks (Delta Lake), PySpark, SQL, Python
- DQ Platforms: Ataccama ONE / Studio (rule authoring, workflow automation, profiling)
- Orchestration & CI/CD: Apache Airflow, Azure Data Factory, Databricks Workflows, GitHub Actions
- Data Warehouses: Databricks Lakehouse
- Cloud & Infrastructure: Azure (preferred), AWS, or GCP; familiarity with Terraform or IaC concepts
- Version Control & CI/CD: Git, GitHub Actions, Azure DevOps
- Metadata & Governance: Collibra, Alation, Ataccama Catalog, OpenLineage
- Monitoring & Observability: Grafana, Datadog, Prometheus for DQ metrics and alerts
Qualifications & Experience
- Bachelor's or Master's degree in Computer Science, Information Systems, Statistics, or a related field.
- 9–12 years of experience in data quality, data engineering, or governance-focused roles.
- Proven experience designing and deploying enterprise DQ frameworks and automated checks.
- Strong expertise in Databricks, PySpark, and Ataccama for data profiling and rule execution.
- Advanced proficiency in SQL and Python for large-scale data analysis and validation.
- Solid understanding of data models, lineage, reconciliation, and governance frameworks.
- Experience integrating DQ checks into CI/CD pipelines and orchestrated data flows.
Soft Skills & Leadership Attributes
- Strong analytical thinking and systems-level problem solving.
- Excellent communication and presentation skills for senior stakeholders.
- Ability to balance detail orientation with strategic vision.
- Influencer with a proactive, ownership-driven mindset.
- Comfortable leading cross-functional teams in fast-paced, cloud-native environments.
Preferred / Nice to Have
- Experience in financial, manufacturing, or large enterprise data environments.
- Familiarity with MDM, reference data, and data stewardship processes.
- Exposure to machine learning-driven anomaly detection or predictive data quality.
- Certifications in Databricks, Ataccama, or cloud data engineering (Azure/AWS).
Success Indicators
- Increased DQ rule coverage and automation across key data domains.
- Reduced manual DQ exceptions and faster remediation cycle times.
- Measurable improvement in data trust metrics and reporting accuracy.
- High stakeholder satisfaction with data availability and reliability.