The Data Lake developer supports the design, development and ongoing support of the Data Lake environment. The role is responsible for defining, deploying and managing next-generation data lake and master data architectures for our clients to enable information transparency and data quality across systems. The individual must have demonstrated experience with agile project delivery of modern Data Lakes to support Business Intelligence and Data Integrations, with increasing responsibility and breadth over time.
- Excellent written and verbal communication skills.
- Hands-on experience working with terabyte- and petabyte-scale data and millions of transactions per day.
- In-depth knowledge and extensive experience building batch and streaming workloads on AWS using EMR, Glue, Athena, DynamoDB, Redshift, RDS and Aurora.
- Experience working with Apache Airflow in recent projects.
- Very strong Python development skills for defining dynamic Airflow pipelines, and comfortable working on the Linux platform.
- Experience working in a hybrid cloud environment.
- Has worked on both development and maintenance projects and followed operational processes.
- Proven practical experience migrating relational databases from on-premises to the cloud using AWS DMS.
- Architect, design and build the Data Lake infrastructure platform, primarily on AWS, ensuring that it is highly available and secure.
- Establish enterprise-scale data governance models including data… versioning, define the enterprise data model and build the enterprise data lake strategy.
- Develop guidelines for Airflow clusters and DAGs.
- Tune the performance of Airflow DAGs and task implementations.
- Analyze and refine requirements and consumption query patterns, and choose the right technology fit, such as NoSQL, RDBMS, Data Lake or Data Warehouse.
- Analyze and document source-to-target data mappings.
- Build scalable data models to support enterprise-level reporting and analytics.