Senior/Lead Engineer - Data Engineering/Databricks
Multiple Cities, India
5 - 8 yrs
We are looking for an experienced Data Engineer with proven expertise in building and optimizing ETL pipelines on Databricks, leveraging Delta Lake and Spark SQL. The ideal candidate will have a strong foundation in Python and SQL, a solid understanding of data storage formats such as Parquet and Delta, and experience in performance optimization, testing, and automated workflows.
Locations
India: Trivandrum, Cochin, Calicut, Koratty, Chennai, Bangalore, Noida
Responsibilities
- ETL Development: Design and implement well-structured Databricks notebooks for ETL workflows, following best practices
- Data Storage: Utilize Delta Lake for data storage, demonstrating understanding of its benefits such as ACID transactions, schema enforcement, and time travel
- Data Transformation: Apply Spark SQL for complex data transformations and aggregations
- Delta Live Tables (DLT): Design and manage declarative, incremental pipelines on top of Delta Lake using Delta Live Tables. Leverage built-in orchestration, dependency management, and data quality checks for reliable ETL workflows
- File Formats: Handle a variety of storage formats, including Parquet, ORC, Avro, and JSON, to ensure versatility across data sources
- Delta Sharing: Configure and manage Delta Sharing for secure, governed data distribution, integrating with Unity Catalog for access control, auditing, and automation as part of the data delivery process
- Data Governance: Leverage Unity Catalog for data lineage, tagging, and access control, enhancing data discoverability and ensuring compliance
- Error Handling & Validation: Implement proper exception handling, logging, and data validation checks to ensure data quality
- Automation: Develop automated triggers and job orchestration for pipeline execution
- Documentation: Maintain comprehensive documentation covering the project, its dependencies, execution steps, and recommendations for stakeholders
- Test cases & Validation: Develop and maintain test cases to validate data transformations, schema consistency, and business rules, ensuring data accuracy and reliability across all pipeline stages
- Performance Optimization: Optimize ETL processes for scalability and reduced processing time
- Collaboration: Work closely with business analysts, data scientists, and stakeholders to deliver actionable insights
- Security Best Practices: Apply encryption, data masking, and role-based access control in Databricks and cloud storage
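The error-handling and validation responsibilities above might look like the following minimal Python sketch. All names (`validate_record`, `split_valid_invalid`, the `order_id`/`amount` fields) are illustrative assumptions, not part of the role description; in a real Databricks pipeline these rules would typically be expressed as Delta Live Tables expectations or Spark-level checks.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl_validation")

# Hypothetical quality rules for illustration only.
REQUIRED_FIELDS = ("order_id", "amount")

def validate_record(record: dict) -> bool:
    """Return True if the record passes basic data quality checks."""
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            logger.warning("Missing field %r in record %r", field, record)
            return False
    if record["amount"] < 0:
        logger.warning("Negative amount in record %r", record)
        return False
    return True

def split_valid_invalid(records):
    """Partition records into (valid, invalid) lists, logging a summary."""
    valid, invalid = [], []
    for rec in records:
        (valid if validate_record(rec) else invalid).append(rec)
    logger.info("Validated %d records: %d valid, %d invalid",
                len(valid) + len(invalid), len(valid), len(invalid))
    return valid, invalid
```

Quarantining invalid records rather than failing the whole job is one common design choice; the right behavior (fail fast vs. dead-letter) depends on the pipeline's SLAs.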
Requirements
- 5+ years in Data Engineering, with strong expertise in Databricks, PySpark, and Python.
- Dynamic, self-motivated engineer with strong logical reasoning and problem-solving skills
- Strong experience in Python and SQL, with extensive debugging skills
- Version control & DevOps: Git/GitHub/GitLab for versioning, integration with CI/CD
- Hands-on experience with Databricks and Delta Lake
- Solid understanding of Spark SQL and distributed computing concepts
- Experience in ETL design, data modeling, and pipeline automation
- Knowledge of error handling, logging, and data validation techniques
- Experience with unit testing and integration testing in data pipelines
- Proven track record in performance tuning of large-scale data processing jobs
- Strong problem-solving and analytical skills
- Excellent written and verbal communication skills
Preferred Skills
- Experience with cloud platforms (Azure, AWS, or GCP) in a data engineering context.
- Knowledge of data governance and compliance best practices.