Data Engineer - Python & Databricks

Posted 1 month ago • 5+ years of experience • Data Analyst

Project description

As a Data Engineer, you will design, develop, and maintain data pipelines using Python and Databricks to process large-scale datasets. You will collaborate with data scientists, analysts, and business stakeholders to gather data requirements and build efficient, scalable solutions that enable advanced analytics and reporting.

Responsibilities

Data Pipeline Development: Design, develop, and implement scalable data pipelines using Python and Databricks for batch and real-time data processing.
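
For illustration, a minimal sketch of what such a batch pipeline can look like in PySpark on Databricks. The storage path, column names, and target table are hypothetical placeholders.

```python
# Minimal batch pipeline sketch (PySpark on Databricks).
# All paths, columns, and table names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided for you in Databricks notebooks

# Extract: read raw CSV files from cloud storage
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("s3://example-bucket/raw/orders/"))

# Transform: deduplicate, derive a date column, drop invalid rows
orders = (raw
          .dropDuplicates(["order_id"])
          .withColumn("order_date", F.to_date("order_ts"))
          .filter(F.col("amount") > 0))

# Load: write to a Delta table partitioned by date
(orders.write
       .format("delta")
       .mode("overwrite")
       .partitionBy("order_date")
       .saveAsTable("analytics.orders_clean"))
```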

ETL Processes: Build and maintain ETL (Extract, Transform, Load) processes to gather, transform, and store data from multiple sources.
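
The real-time counterpart is typically built on Spark Structured Streaming; here is a sketch, assuming a hypothetical Kafka broker, topic, event schema, checkpoint path, and target table.

```python
# Real-time ETL sketch using Spark Structured Streaming.
# Broker, topic, schema, checkpoint path, and target table are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

# Extract: consume JSON events from a Kafka topic
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "payments")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Transform + Load: continuously append valid events to a Delta table
query = (events.filter(F.col("amount").isNotNull())
         .writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/payments")
         .toTable("analytics.payments_stream"))
```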

Data Integration: Integrate structured and unstructured data from various internal and external sources into data lakes or warehouses, ensuring data accuracy and quality.
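
On Databricks, this kind of integration is often implemented as a Delta Lake upsert; a sketch follows, with a hypothetical source extract, table name, and join key.

```python
# Upsert sketch: fold an incoming source extract into a Delta table.
# Source path, table name, and join key are hypothetical.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

incoming = spark.read.parquet("s3://example-bucket/exports/customers/")
target = DeltaTable.forName(spark, "analytics.customers")

# Update rows that already exist, insert the rest
(target.alias("t")
 .merge(incoming.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```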

Collaboration: Work closely with data scientists, analysts, and business teams to understand data needs and deliver efficient solutions.

Performance Optimization: Optimize the performance of data pipelines and workflows to ensure efficient processing of large datasets.
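
A few of the usual Spark levers, sketched against hypothetical tables: broadcasting a small dimension to avoid a shuffle, caching a reused DataFrame, and repartitioning before a write.

```python
# Common Spark tuning moves; table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

facts = spark.table("analytics.orders_clean")
dims = spark.table("analytics.customers")

# Broadcast the small dimension table so the join avoids a full shuffle
joined = facts.join(F.broadcast(dims), "customer_id")

# Cache a DataFrame that several downstream steps will reuse
joined.cache()

# Consolidate partitions before writing to limit small-file overhead
(joined.repartition("order_date")
 .write.format("delta").mode("overwrite")
 .saveAsTable("analytics.orders_enriched"))
```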

Data Validation: Implement data validation and monitoring mechanisms to ensure data quality, consistency, and reliability.
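
A lightweight version of such a check can be as simple as counting rule violations and failing the run loudly; frameworks like Great Expectations or Delta Lake table constraints are common heavier-weight options. The column names and rules below are illustrative.

```python
# Minimal data-quality gate; column names and rules are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("analytics.orders_clean")

total = df.count()
null_keys = df.filter(F.col("order_id").isNull()).count()
dupes = total - df.dropDuplicates(["order_id"]).count()

# Fail the pipeline rather than silently shipping bad data downstream
if null_keys > 0 or dupes > 0:
    raise ValueError(f"Data quality check failed: {null_keys} null keys, {dupes} duplicate ids")
```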

Cloud Integration: Work with cloud platforms like AWS, Azure, or Google Cloud to build and maintain data storage and processing infrastructure.
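
One practical upside of Spark here is that the read code is largely cloud-agnostic; only the storage URI scheme changes. Bucket, container, and account names below are placeholders.

```python
# The same Spark read works across clouds; only the URI scheme differs.
# Bucket, container, and account names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df_aws = spark.read.parquet("s3://example-bucket/landing/")                         # AWS S3
df_azure = spark.read.parquet("abfss://landing@exampleacct.dfs.core.windows.net/")  # Azure ADLS Gen2
df_gcp = spark.read.parquet("gs://example-bucket/landing/")                         # Google Cloud Storage
```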

Automation & Scheduling: Automate data pipelines and implement scheduling mechanisms to ensure timely and reliable data delivery.

Documentation: Maintain comprehensive documentation for data pipelines, processes, and best practices.

Skills

Must have

5+ years of experience as a Data Engineer with strong expertise in Python.

Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience).

Hands-on experience with Databricks or similar big data platforms.

Strong understanding of data pipelines, ETL processes, and data integration techniques.

Experience with cloud-based platforms such as AWS, Azure, or Google Cloud, particularly with storage services such as Amazon S3, Azure Blob Storage, or Azure Data Lake Storage.

Proficiency in SQL and experience with relational and non-relational databases.

Familiarity with big data technologies like Apache Spark, Kafka, or Hadoop.

Strong understanding of data modeling, data warehousing, and database design principles.

Ability to work with large, complex datasets, ensuring data integrity and performance optimization.

Experience with version control tools like Git and CI/CD pipelines for data engineering.

Excellent problem-solving skills, attention to detail, and the ability to work in a collaborative environment.

Nice to have

Experience with Delta Lake, Lakehouse architecture, or other modern data storage solutions.

Familiarity with machine learning and data science workflows.

Experience with DevOps or DataOps practices.

Knowledge of Terraform, Docker, or Kubernetes for cloud infrastructure automation.

Familiarity with data governance, data privacy regulations (e.g., GDPR, CCPA), and data security best practices.

Other

Languages

English: B2 Upper Intermediate

Seniority

Regular


About The Company

Luxoft, a DXC Technology Company (NYSE: DXC), is a digital strategy and software engineering firm providing bespoke technology solutions that drive business change for customers the world over. Acquired by U.S. company DXC Technology in 2019, Luxoft is a global operation in 44 cities and 21 countries with an international, agile workforce of nearly 18,000 people. It combines a unique blend of engineering excellence and deep industry expertise, helping over 425 global clients innovate in the areas of automotive, financial services, travel and hospitality, healthcare, life sciences, media and telecommunications.

DXC Technology is a leading Fortune 500 IT services company which helps global companies run their mission critical systems. Together, DXC and Luxoft offer a differentiated customer-value proposition for digital transformation by combining Luxoft’s front-end digital capabilities with DXC’s expertise in IT modernization and integration. Follow our profile for regular updates and insights into technology and business needs.

