Responsibilities:
- Create and maintain data infrastructure that can support large-scale data processing
- Design and build scalable data pipelines and API integrations to support growing data volumes
- Apply appropriate design patterns, including message queues and multi-threading / multiprocessing (see the sketch after the skills list)
- Implement best practices, perform code reviews, and adhere to overall architecture principles
- Make improvement and process recommendations that have an impact on the business
- Solve complex issues involving very large data volumes with minimal supervision
- Design cost-effective pipelines, mindful of infrastructure optimization goals
- Follow, learn, and apply new big data technologies and innovations

Required experience and skills:
- 6+ years of hands-on data engineering development experience in Python
- Advanced knowledge of Python and Linux shell scripting
- Proficiency in SQL and data structures; experience handling huge volumes of data (transformations, reads/writes) using Python (a chunked-processing sketch appears at the end of this posting)
- Experience in improving data pipeline performance and a good understanding of the Volume, Variety, Veracity, Velocity, and Value concepts in Big Data
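To illustrate the message-queue and multi-threading pattern named in the responsibilities, below is a minimal Python sketch of a bounded-queue producer/consumer stage. The record source, the transformation, and the worker count are hypothetical placeholders, not a prescribed design.

# A minimal sketch of the producer/consumer pattern (message queue +
# multi-threading). The record source and transform are placeholders;
# a real pipeline would read from Kafka/SQS and write to a store.
import queue
import threading

NUM_WORKERS = 4
SENTINEL = None  # signals workers to stop

def producer(q: queue.Queue) -> None:
    """Stand-in for a source stage: enqueue records for the workers."""
    for i in range(1000):
        q.put({"id": i, "value": i * 2})
    for _ in range(NUM_WORKERS):
        q.put(SENTINEL)  # one sentinel per worker so every worker exits

def worker(q: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Stand-in for a transform stage: pull, transform, accumulate."""
    while True:
        record = q.get()
        if record is SENTINEL:
            q.task_done()
            break
        transformed = {**record, "value": record["value"] + 1}
        with lock:  # guard the shared results list
            results.append(transformed)
        q.task_done()

if __name__ == "__main__":
    q: queue.Queue = queue.Queue(maxsize=100)  # bounded queue applies backpressure
    results: list = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    producer(q)
    for t in threads:
        t.join()
    print(f"processed {len(results)} records")

The bounded queue gives simple backpressure: when workers fall behind, the producer blocks instead of filling memory.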
Keyskills: Software design, Linux, Cloud services, Data structures, Big data, AWS, Performance improvement, Python, SQL
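As one common reading of "handling huge volumes of data using Python", here is a minimal sketch of chunked transformation with pandas, processing a file in fixed-size chunks so peak memory stays bounded by the chunk, not the whole file. The file paths, column names (id, amount), and the transform itself are hypothetical.

# A minimal sketch of chunked transformation for data too large to fit
# in memory. Paths and columns are hypothetical placeholders.
import pandas as pd

INPUT_PATH = "events.csv"        # hypothetical large source file
OUTPUT_PATH = "events_clean.csv"
CHUNK_SIZE = 100_000             # rows per chunk; tune to available memory

def transform(chunk: pd.DataFrame) -> pd.DataFrame:
    """Example transform: drop rows with null ids, normalise a column."""
    chunk = chunk.dropna(subset=["id"])
    chunk["amount"] = chunk["amount"].astype(float).round(2)
    return chunk

first = True
# read_csv(..., chunksize=N) yields DataFrames of N rows at a time,
# so memory use is proportional to the chunk, not the full file
for chunk in pd.read_csv(INPUT_PATH, chunksize=CHUNK_SIZE):
    transform(chunk).to_csv(OUTPUT_PATH, mode="w" if first else "a",
                            header=first, index=False)
    first = False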