As a GCP Data Engineer, the colleague should be able to design scalable data architectures on Google Cloud Platform using services such as BigQuery and Dataflow. They write and maintain code (Python, Java) to ensure efficient data models and seamless ETL processes, and implement quality checks and governance to keep data accurate and reliable.
Security is a priority: measures are enforced for data storage, transmission, and processing, while compliance with data protection standards is ensured. Collaboration with cross-functional teams is key to understanding diverse data requirements, and comprehensive documentation is maintained for data processes, pipelines, and architectures.
Responsibilities extend to optimizing data pipelines and queries for performance, troubleshooting issues, and proactively monitoring data accuracy. Continuous learning is emphasized to stay updated on GCP features and industry best practices, ensuring a current and effective data engineering approach.
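To illustrate the kind of pipeline work described above, the following is a minimal Apache Beam (Dataflow) sketch that reads CSV files from GCS, applies a simple transformation, and loads the results into BigQuery. The bucket, project, dataset, table, and schema names are hypothetical placeholders, not part of this role description.

```python
# Minimal Apache Beam sketch: read CSV records from GCS, transform them,
# and load them into BigQuery. All resource names are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_order(line: str) -> dict:
    """Turn a CSV line into a BigQuery-ready row (hypothetical schema)."""
    order_id, amount, country = line.split(",")
    return {"order_id": order_id, "amount": float(amount), "country": country}


def run():
    # Swap the runner to "DataflowRunner" and add project/region/temp_location
    # options to execute on Cloud Dataflow instead of locally.
    options = PipelineOptions(runner="DirectRunner")
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromGCS" >> beam.io.ReadFromText(
                "gs://example-bucket/orders/*.csv", skip_header_lines=1)
            | "ParseCSV" >> beam.Map(parse_order)
            | "FilterValid" >> beam.Filter(lambda row: row["amount"] > 0)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.orders",
                schema="order_id:STRING,amount:FLOAT,country:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```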
Experience
- Proficiency in programming languages: Python, PySpark
- Expertise in data processing frameworks: Apache Beam (Dataflow)
- Hands-on experience with GCP tools and technologies such as BigQuery, Dataflow, Cloud Composer, Cloud Spanner, GCS, dbt, etc.
- Data engineering skill set using Python and SQL
- Experience in ETL (Extract, Transform, Load) processes
- Knowledge of DevOps tools such as Jenkins, GitHub, and Terraform is desirable; good knowledge of Kafka (batch/streaming) is expected
- Understanding of data models and experience in ETL design and build, including database replication using message-based CDC
- Familiarity with cloud storage solutions
- Strong problem-solving abilities in data engineering challenges
- Understanding of data security and scalability
- Proficiency in relevant orchestration tools such as Apache Airflow (the basis of Cloud Composer); a minimal DAG sketch follows this list
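As referenced in the Airflow item above, below is a minimal Airflow DAG sketch of the kind that would run on Cloud Composer: a daily job that stages files from GCS into BigQuery and then runs a transformation query. The bucket, project, dataset, and table names are hypothetical placeholders.

```python
# Minimal Airflow DAG sketch for Cloud Composer: GCS -> BigQuery staging,
# then a SQL transformation. All resource names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage raw CSV files from GCS into a BigQuery staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_orders",
        bucket="example-bucket",
        source_objects=["orders/{{ ds }}/*.csv"],
        destination_project_dataset_table="example-project.staging.orders",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform the staged data into a reporting table with a SQL job.
    transform = BigQueryInsertJobOperator(
        task_id="transform_orders",
        configuration={
            "query": {
                "query": """
                    SELECT country, SUM(amount) AS total_amount
                    FROM `example-project.staging.orders`
                    GROUP BY country
                """,
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "analytics",
                    "tableId": "orders_by_country",
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )

    load_raw >> transform
```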
Desirables
Key skills: BigQuery, GCP, GCP Data Engineer, Python, Cloud Composer, Groovy, Dataflow, SQL, Airflow