Senior Data Engineer
About the Role
We are seeking a highly skilled and experienced Data Lake Cloud Engineer with a proven track record of designing, implementing, and maintaining large-scale cloud-based data lake platforms. This role requires a professional who can take ownership of our current data lake ecosystem, optimize its performance, and drive future enhancements with minimal oversight. The ideal candidate will have at least 5 years of hands-on experience in building enterprise-grade data lakes, strong cloud architecture expertise, and the ability to work with cutting-edge data ingestion, processing, and analytics tools.
Key Responsibilities
• Take ownership of the existing enterprise data lake platform, ensuring scalability, reliability, and performance.
• Lead the design, architecture, and implementation of cloud-native data lake solutions and integrations.
• Manage and optimize data ingestion pipelines on Oracle OCI, using tools such as Apache NiFi and Kafka, as well as batch processing, data capture, and CSV-based loads.
• Design and implement pipelines for network data ingestion and file formats (e.g., Parquet, Avro, ORC), ensuring efficient storage, processing, and retrieval.
• Build, configure, and tune query engines such as Trino (Presto), Spark, and Hive for efficient analytics and reporting.
• Implement and maintain metadata management, data governance, and security frameworks.
• Monitor and troubleshoot system performance, ensuring SLAs are met for ingestion, processing, and query workloads.
• Automate platform deployment, monitoring, and maintenance with Infrastructure-as-Code (Terraform, CloudFormation, etc.).
• Collaborate with data engineers, analysts, and business teams to understand data requirements and deliver solutions that maximize data accessibility and usability.
• Keep the data platform up to date with the latest open-source and cloud-agnostic technologies, implementing upgrades and enhancements where needed.
Requirements
5+ years of proven, hands-on experience implementing and managing large-scale data lakes in the cloud (OCI). Strong expertise in:
• Data ingestion & orchestration: Apache NiFi, Apache Kafka, CSV, and others
• Data processing frameworks: Apache Spark, PySpark, Trino (Presto), Hive, Flink.
• Storage & lakehouse architectures: Delta Lake, Apache Hudi, Iceberg, and cloud-native object storage (S3).
• Query & analytics tools: Trino/Presto, SparkSQL, Metabase, or Apache Superset.
• Experience with data lake file formats such as Apache Parquet, Avro, ORC, and CSV, including ingestion, parsing, and analytics within a data lake.
• Solid understanding of data governance, lineage, cataloging, and security frameworks (Apache Atlas).
• Experience with CI/CD and IaC (ArgoCD, Terraform, Ansible) for automated deployments.
• Hands-on experience with cloud security best practices, including IAM, encryption, and network security.
• Strong proficiency in Python or Java for data engineering and automation tasks.
• Proven ability to work independently, quickly understand existing environments, and deliver results without extensive training.
Preferred Skills
• Exposure to machine learning workflows integrated with data lakes.
• Experience with real-time streaming data pipelines.
• Familiarity with containerization and orchestration (Docker, Kubernetes).
• Knowledge of cost optimization strategies in cloud-based data platforms.
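To make the batch-ingestion and file-format requirements above concrete, here is a minimal, self-contained Python sketch of the pattern: parse a CSV batch, coerce types, and bucket rows into date partitions. All field names and values are invented for illustration; a production pipeline would write date-partitioned Parquet or ORC via a library such as PyArrow or Spark rather than keep rows in memory.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw network-flow records, standing in for a CSV file
# landed in object storage. Field names are invented for this sketch.
RAW_CSV = """day,src_ip,bytes_sent
2024-01-01,10.0.0.1,1024
2024-01-01,10.0.0.2,2048
2024-01-02,10.0.0.1,512
"""

def ingest(raw: str) -> dict:
    """Parse a CSV batch and bucket rows by day -- a toy stand-in for
    date-partitioned writes (e.g. Parquet files under day=YYYY-MM-DD/)."""
    partitions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw)):
        row["bytes_sent"] = int(row["bytes_sent"])  # basic type coercion
        partitions[row["day"]].append(row)
    return dict(partitions)

parts = ingest(RAW_CSV)
print(sorted(parts))             # ['2024-01-01', '2024-01-02']
print(len(parts["2024-01-01"]))  # 2
```

Partitioning by a date column like this is what makes downstream query engines (Trino, Spark, Hive) able to prune files and scan only the days a query touches.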
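The SLA-monitoring responsibility described in this posting can be sketched as a simple check of observed pipeline latencies against a target. The threshold, target fraction, and latency values below are invented for illustration; real figures would come from the platform's metrics system.

```python
# Hypothetical per-batch ingestion latencies in seconds
# (in practice these would be pulled from pipeline metrics).
latencies = [12.0, 15.5, 14.2, 80.0, 13.1, 16.8, 14.9, 15.0, 13.7, 14.4]

SLA_SECONDS = 60.0       # assumed target: a batch should ingest within a minute
TARGET_FRACTION = 0.95   # assumed SLA: 95% of batches must meet the target

# Fraction of batches that landed within the latency target.
within = sum(1 for t in latencies if t <= SLA_SECONDS) / len(latencies)
breached = within < TARGET_FRACTION

print(within, breached)  # 0.9 True -- the 80 s outlier breaches the 95% SLA
```

A check like this, run on a schedule and wired to alerting, is the minimal form of the "monitor and troubleshoot system performance, ensuring SLAs are met" responsibility.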
Emaratech provides technology solutions and services, primarily for the UAE market. They focus on developing and implementing digital transformation initiatives.