Role & responsibilities
Design and build a scalable data platform that can ingest, store, manage, and stream massive volumes of data, while simplifying analysis and processing to enable rapid development of high-quality data products and services.
Implement and test robust, low-latency data pipelines for preparing and curating data that will support various data-as-a-service products.
You will work with cross-functional teams to explore data, determine the wrangling needed to clean and curate it, and build end-to-end data pipelines to requirements.
You will design the necessary preventive and detective privacy and security controls into the CI/CD pipeline of each data pipeline.
You will have a strong bias for operational excellence, ensuring that error handling, restartability with data consistency, logging, monitoring, and alerting are built into each pipeline.
You will drive continual improvements in reliability, performance, scalability, and quality, and own the associated KPIs.

Key skills: Kafka, Databricks, AWS, Streaming, PySpark, Data Engineering, Apache Flink, Hadoop, Big Data, Spark Streaming, SQL, AWS Kinesis, Low Latency, Scala, Data Lake, Spark, Data Warehousing, ETL