
Keyskills: Cloudera, Hive, PySpark, Linux, Hadoop, Scala, Amazon Redshift, Data Warehousing, EMR, SQL, Docker, Apache, Java, Spark, GCP, ETL, Big Data, HBase, Data Lake, Python, Oozie, Airflow, Microsoft Azure, Impala, Data Engineering, NoSQL, Amazon EC2, MapReduce, Kafka, Sqoop, AWS