Working knowledge of PySpark (Python and Spark) with Big Data
Worked on big data technologies such as Spark, MapReduce, Kafka, Hive, and HDFS.
Proven track record with HDFS and Unix commands.
Knowledge of extracting data from different sources such as DBMS and NoSQL stores.
Managed Spark on an HDFS cluster.
Very good knowledge of Spark and Scala.
Experience with open source technologies used in Big Data analytics, such as Pig, Hive, HBase, and Kafka.
PySpark knowledge is a must, along with the ability to work with Impala, Hive, and data lakes.
Experience extracting text using OCR; knowledge of Tesseract would also meet this requirement.
Willingness to learn, the ability to think skeptically about problems and results, and curiosity to explore new techniques and domains.
Knowledge of Design Patterns (GoF), which is valuable when developing complex PySpark algorithms, is essential.