Here is a concise summary of the key points from the Senior Data Engineer job posting at Commit:
1. Key Technical Requirements and Skills:
- 5+ years of data engineering experience
- Strong hands-on expertise with Apache Spark, including Structured Streaming
- Experience building both batch and streaming data pipelines in production
- Proficiency in designing AWS-based data lake architectures using S3, EMR, Glue, and Athena
- Experience with event streaming platforms like Apache Kafka or Amazon Kinesis
- Knowledge of lakehouse formats like Delta Lake
- Proven track record in Spark performance tuning and cost optimization in AWS
2. Team/Project Information:
- Building a greenfield analytics platform supporting batch and real-time data processing
- Role combines hands-on development, architectural decision-making, and platform ownership
Description
We are building a greenfield analytics platform supporting both batch and real-time data processing. We are looking for a Senior Data Engineer who can design, implement, and evolve scalable data systems in AWS.
This role combines hands-on development, architectural decision-making, and platform ownership.
Core Responsibilities:
- Design and implement batch and streaming data pipelines using Apache Spark.
- Build and evolve a scalable AWS-based data lake architecture.
- Develop and maintain real-time data processing systems (event-driven pipelines).
- Own performance tuning and cost optimization of Spark workloads.
- Define best practices for data modeling, partitioning, and schema evolution.
- Implement monitoring, observability, and data quality controls.
- Contribute to infrastructure automation and CI/CD for data workflows.
- Participate in architectural decisions and mentor other engineers.
Requirements
Required Qualifications:
- 5+ years of experience in Data Engineering.
- Strong hands-on experience with Apache Spark (including Structured Streaming).
- Experience building both batch and streaming pipelines in production environments.
- Proven experience designing AWS-based data lake architectures: S3, EMR, Glue, Athena.
- Experience with event streaming platforms such as Apache Kafka or Amazon Kinesis.
- Experience implementing lakehouse formats such as Delta Lake.
- Strong understanding of partitioning strategies and schema evolution.
- Experience using the Spark UI and AWS CloudWatch for profiling and optimization.
- Strong understanding of Spark performance tuning (shuffle, skew, memory, partitioning).
- Proven track record of cost optimization in AWS environments.
- Experience with Docker and CI/CD pipelines.
- Experience with Infrastructure as Code: Terraform, AWS CDK.
- Familiarity with monitoring and observability practices.
- Experience in the Financial domain.
- Experience running Spark workloads on Kubernetes.
- Experience implementing data quality frameworks or metadata/lineage systems.
- Languages: English (B2), Ukrainian (native).
Originally posted on Himalayas