Job Description
In this Senior Data Engineer role, you will be responsible for designing and implementing complex data pipelines, optimizing data transformations, and ensuring the scalability and performance of our data infrastructure.
The ideal candidate will have a deep technical background and a proven track record in managing data engineering projects and collaborating with cross-functional teams to drive data-driven decisions. This opportunity will provide you with experience working with cloud data platforms such as Snowflake and data transformation tools like dbt. You will gain exposure to AWS infrastructure for building innovative and efficient data solutions and work with various data sources, including Oracle, MS SQL Server, and Postgres.
Basic Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
6+ years of experience in data engineering, with a proven track record of working on large-scale data initiatives.
Deep expertise in Python and PySpark.
Strong hands-on experience with Databricks (Spark, Delta Lake, Workflows).
Strong experience with AWS (S3, IAM, Textract, Bedrock, or equivalent).
Experience designing and implementing scalable document ingestion pipelines using Databricks Auto Loader and AWS S3.
Understanding of vector embeddings and semantic search.
Strong understanding of data governance, privacy, and compliance in regulated industries (healthcare, life sciences).
Good to Have
Advanced knowledge of data modeling, lakehouse/lake/warehouse design, and performance optimization.
Familiarity with generative AI platforms and use cases.
Contributions to open-source projects or thought leadership in data engineering/architecture.
Experience with Agile methodologies, CI/CD, and DevOps practices.
Exposure to FastAPI or API-based ML services.
Experience evaluating LLM output quality.
Key Responsibilities
Design, develop, and optimize complex data pipelines and transformation processes using Snowflake, dbt, and AWS services.
Develop and maintain scalable data models and schemas in Snowflake that meet performance and business requirements.
Monitor and fine-tune the performance of data pipelines, queries, and data models for efficiency and cost-effectiveness.
Use Snowflake features such as Time Travel, Zero-Copy Cloning, and Data Sharing to improve data management and performance.
Leverage AWS services such as AWS Lambda, S3, and Glue to build and manage serverless data processing workflows and data storage solutions.
Implement data security measures and ensure compliance with data privacy regulations and organizational policies.
Troubleshoot and resolve complex data issues, including data sync errors, performance bottlenecks, and integration challenges.
Support data-related incidents and ensure effective resolution of production issues.
Collaborate with data analysts and other stakeholders to understand data needs and deliver effective solutions.
Document data processes, models, and workflows to support clear communication and knowledge sharing across teams.
Independently assess situations, apply sound judgment and discretion, and make decisions on matters of significant impact without direct supervision.
Mandatory Competencies
Cloud - AWS - AWS S3, S3 Glacier, AWS EBS
Behavioral - Communication
Big Data - Big Data - Spark
Big Data - Big Data - PySpark
Tech - Agile Methodology
Development Tools and Management - Development Tools and Management - CI/CD
Data Science and Machine Learning - Data Science and Machine Learning - Databricks
Programming Language - Python - Flask
Programming Language - Python - OOPS Concepts
Programming Language - Python - Python Shell
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time
Contact Details:
Company: Iris Software
Location(s): Noida, Gurugram
Keyskills:
data engineer
logistics
sql
lambda
data science
iam
spark
use cases
devops
oops
big data
postgres
architecture
s3
snowflake
python
ai
databricks
llm
machine learning
sql server
compliance
data governance
agile
flask
aws