Currently working as a Big Data Engineer with 7+ years of programming and software development experience, with skills in data engineering, data analysis, and the design, development, testing, and deployment of software systems from development through production using Big Data and Java technologies.
5+ years of experience with Big Data tools in the Hadoop ecosystem, including Spark, MapReduce, Hive, Sqoop, Oozie, Kafka, and HBase.
In-depth knowledge of distributed computing systems and parallel processing techniques for handling Big Data efficiently.
Firm understanding of Hadoop architecture and its components, including HDFS, YARN, MapReduce, Hive, Pig, HBase, Kafka, and Oozie.
Strong experience building Spark applications in Scala and Python.
Good understanding of Spark internals and the job execution lifecycle of Spark applications.
Experience troubleshooting failures in Spark applications and fine-tuning long-running Spark jobs.
Strong experience using the Spark RDD API, Spark DataFrame/Dataset API, Spark SQL, and Spark ML for building end-to-end data pipelines.
Good experience working with real-time streaming pipelines using Kafka and Spark Streaming.
Strong experience working with Hive for data analysis.
Detailed exposure to Hive concepts such as partitioning, bucketing, join optimizations, SerDes, built-in UDFs, and custom UDFs.
Good experience automating end-to-end data pipelines using the Oozie workflow orchestrator.
Good experience working with Cloudera, Hortonworks, and AWS big data services.
Strong experience using and integrating AWS cloud services such as S3, EMR, Glue Metastore, Athena, and Redshift into data pipelines.
Experience analyzing, designing, and developing ETL strategies and processes, and writing ETL specifications.
Excellent understanding of NoSQL databases such as HBase, Cassandra, and MongoDB.
Proficient knowledge of and hands-on experience in writing shell scripts on Linux.
Experienced in developing REST and SOAP web services using Python.
Hands-on experience with continuous integration and automation using Jenkins, version control tools such as Git and SVN, and the ticket management tool JIRA.
Worked in Agile and Waterfall methodologies, delivering high-quality work on time.
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, HBase, Hue, Cloudera, Hortonworks, Spark, Kafka