Data Engineer Lead
Job Description
The Lead Data Engineer is a strategic and technical leadership role responsible for architecting, scaling, and evolving enterprise-grade data platforms that enable advanced analytics, AI/ML, and data-driven decision-making. Reporting to the Senior Director of Data Platforms, this role will lead the design and governance of modern data architectures, drive innovation in AI orchestration, and ensure the delivery of secure, compliant, and high-performing data solutions.
This position combines hands-on engineering expertise with architectural vision and cross-functional leadership. The Lead Data Engineer will guide engineering teams, influence platform strategy, and establish best practices across the organization's data ecosystem.
Basic Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
8+ years of experience in data engineering and architecture, with a proven track record of leading large-scale data initiatives.
Deep expertise in Python and PySpark.
Strong hands-on experience with Databricks (Spark, Delta Lake, Workflows).
Strong experience with AWS (S3, IAM, Textract, Bedrock, or equivalent).
Demonstrated success in architecting and deploying AI/ML pipelines in production.
Experience designing and implementing scalable document ingestion pipelines using Databricks Auto Loader and AWS S3.
Understanding of vector embeddings and semantic search.
Strong understanding of data governance, privacy, and compliance in regulated industries (healthcare, life sciences).
Preferred Qualifications:
Advanced knowledge of data modeling, lakehouse/lake/warehouse design, and performance optimization.
Familiarity with generative AI platforms and use cases.
Contributions to open-source projects or thought leadership in data engineering/architecture.
Experience with Agile methodologies, CI/CD, and DevOps practices.
Exposure to FastAPI or other API-based ML services.
Experience evaluating LLM output quality.
Key Responsibilities:
Architect Scalable Data Platforms: Design and oversee implementation of robust data lakehouse, data mesh, and real-time streaming architectures to support enterprise analytics and AI/ML initiatives.
Lead Engineering Teams: Provide technical leadership and mentorship to data engineers, fostering a culture of excellence, innovation, and continuous improvement.
AI/ML Enablement: Collaborate with Data Science and ML Engineering teams to operationalize models, implement AI orchestration frameworks (e.g., MLflow, Airflow), and ensure scalable deployment pipelines.
Platform Strategy & Governance: Define and enforce architectural standards, data governance policies, and compliance frameworks (HIPAA, SOC 2, GDPR, etc.) across the data platform.
Performance & Reliability Optimization: Drive observability, automation, and performance tuning across data pipelines and infrastructure to ensure reliability at scale.
Cross-Functional Collaboration: Partner with product, analytics, compliance, and infrastructure teams to align data architecture with business goals and regulatory requirements.
Innovation & Thought Leadership: Stay ahead of industry trends, evaluate emerging technologies, and contribute to strategic decisions on platform evolution, including generative AI integration and event-driven systems.
Mandatory Competencies
Cloud - AWS - AWS S3, S3 Glacier, AWS EBS
Development Tools and Management - CI/CD
Big Data - PySpark
Data Science and Machine Learning - Python
Data Science and Machine Learning - AI/ML
Big Data - Azure Databricks

Key skills: Data Engineering, Databricks, Python, Azure Data Factory, PySpark, Azure Databricks, Azure Data Lake, SQL