About the job
Summary
This role involves designing, developing, and optimizing data pipelines using PySpark within an AWS ecosystem for a US-based B2B marketplace company. Responsibilities include leveraging AWS services (S3, Glue, EMR, Lambda, Redshift) to build scalable data solutions, optimizing PySpark workflows for performance and cost-efficiency, collaborating with data scientists and analysts, ensuring data quality and integrity, implementing data governance and security best practices, and documenting technical processes. The ideal candidate will have 4+ years of experience in data engineering with a focus on PySpark and AWS, proficiency in Python, a solid understanding of distributed computing, and excellent communication skills.
Flexible working format (remote, office, or hybrid)
Competitive salary and benefits package
Personalized career growth
Professional development opportunities
Education reimbursement
Corporate events and team-building activities
Role Overview: As a Data Engineer on our Development Team, you will design, develop, and optimize data pipelines within an AWS ecosystem for a US-based B2B marketplace company. Your expertise in PySpark will be instrumental in processing large-scale datasets, ensuring the reliability and performance of our data systems. You will collaborate with cross-functional teams, including data scientists and analysts, to deliver high-impact solutions that support business objectives.
Key Responsibilities:
Design, develop, and implement data pipelines using PySpark within AWS environments (see the illustrative sketch after this list)
Leverage AWS services such as S3, Glue, EMR, Lambda, and Redshift for building scalable data solutions
Optimize PySpark workflows for performance, reliability, and cost-efficiency
Collaborate with stakeholders to understand data requirements and translate them into technical solutions
Ensure data quality and integrity through robust testing and monitoring processes
Implement data governance, security, and compliance best practices in all development activities
Document technical designs, processes, and workflows to support ongoing maintenance and team knowledge sharing
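For illustration only, here is a minimal sketch of the kind of PySpark pipeline these responsibilities describe, assuming a simple batch job that reads raw order data from S3, applies a basic data-quality filter and aggregation, and writes partitioned output back to S3 for downstream loading into Redshift. All bucket names, paths, and column names below are hypothetical placeholders, not details taken from the posting.

# Minimal sketch of a PySpark batch pipeline on AWS (hypothetical names throughout)
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders_daily_aggregation")  # hypothetical job name
    .getOrCreate()
)

# Read raw order events from S3 (placeholder path; the s3:// scheme assumes EMR/Glue)
orders = spark.read.parquet("s3://example-raw-bucket/orders/")

# Basic data-quality filter and a daily aggregation per seller
daily_totals = (
    orders
    .filter(F.col("order_total").isNotNull() & (F.col("order_total") > 0))
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("seller_id", "order_date")
    .agg(
        F.count("*").alias("order_count"),
        F.sum("order_total").alias("gross_revenue"),
    )
)

# Write partitioned results back to S3 for downstream loading into Redshift
(
    daily_totals
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/daily_seller_totals/")
)

spark.stop()

In practice a job like this would typically run on AWS Glue or EMR, with the curated S3 output loaded into Redshift via a COPY statement or a Glue connection.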
Requirements:
Bachelor’s degree in Computer Science, Engineering, or a related field
4+ years of experience in data engineering, with a focus on building and optimizing data pipelines using PySpark
Strong experience with AWS services, including S3, Glue, Lambda, EMR, and Redshift
Proficiency in Python programming and familiarity with related frameworks and libraries
Solid understanding of distributed computing and experience with Apache Spark
Hands-on experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation) is a plus
Strong analytical and problem-solving skills, with attention to detail and a proactive approach to troubleshooting
Excellent communication and collaboration skills, with the ability to work in a dynamic, team-oriented environment
Upper-Intermediate level of English
Advanced or higher level of Ukrainian
We offer:
Flexible working format: remote, office-based, or hybrid
Competitive salary and compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing