Job Description
We are looking for a highly skilled Data Engineer with 5 to 8 years of experience in data engineering, specializing in PySpark, Python, GCP (IAM, CS, Dataproc, BigQuery), SQL, and Airflow, with experience building data pipelines and handling terabyte-scale data processing. The ideal candidate will have a strong background in designing, developing, and maintaining scalable data pipelines and architectures.
Key Responsibilities
Design, develop, and maintain scalable data pipelines using PySpark, Python, GCP/AWS, and Airflow
Implement data processing workflows and ETL processes to extract, transform, and load data from various sources into data lakes and data warehouses
Manage and optimize data storage solutions using GCP services
Terabyte-Scale Data Processing: Develop and optimize PySpark code to handle terabytes of data efficiently; implement performance tuning techniques to reduce processing time and improve resource utilization
Data Lake Implementation: Build a scalable data lake on GCP CS to store and manage structured and unstructured data
Data Quality Framework: Develop a data quality framework using PySpark and GCP to perform automated data validation and anomaly detection, improving data accuracy and reliability for downstream analytics
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver high-quality data solutions
Perform data quality checks and validation to ensure data accuracy and consistency
Monitor and troubleshoot data pipelines to ensure smooth and efficient data processing
Stay updated with the latest industry trends and technologies in data
Job Classification
Industry: Recruitment / Staffing
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time
Contact Details:
Company: Apptad
Location(s): Mumbai
Keyskills:
design development
pyspark
bigquery
python
data engineering
hive
data validation
scala
anomaly detection
sql
iam
gcp
spark
hadoop
etl
big data
data lake
performance tuning
airflow
data processing
dataproc
data quality
sqoop
aws
etl process