PySpark Developer with Databricks
Remote
Job Description ::
· Develop and optimize data processing jobs using PySpark to handle complex data transformations and aggregations efficiently.
· Design and implement robust data pipelines on the AWS platform, ensuring scalability and efficiency (Databricks exposure will be an advantage).
· Leverage AWS services such as EC2, S3, etc. for comprehensive data processing and storage solutions.
· Expertly manage SQL database schema design, query optimization, and performance tuning to support data transformation and loading processes.
· Design and maintain scalable and performant data warehouses, employing best practices in data modeling and ETL processes.
· Utilize modern data platforms for collaborative data science, integrating seamlessly with various data sources and types.
· Ensure high data quality and accessibility by maintaining optimal performance of Databricks clusters and Spark jobs.
· Develop and implement security measures, backup procedures, and disaster recovery plans using AWS best practices.
· Manage source code and automate deployment using GitHub along with CI/CD practices tailored for data operations in cloud environments.
· Provide expertise in troubleshooting and optimizing PySpark scripts, Databricks notebooks, SQL queries, and Airflow DAGs.
· Keep abreast of the latest developments in cloud data technologies and advocate for the adoption of new tools and practices that can benefit the team.
· Use Apache Airflow to orchestrate and automate data workflows, ensuring timely and reliable execution of data jobs across various data sources and systems.
· Collaborate closely with data scientists and business analysts to design data models and pipelines that support advanced analytics and machine learning projects.

Qualifications:
· Bachelor’s or Master’s degree in Computer Science, Engineering, Information Technology, or related field.
· Minimum of 5 years of experience as a Data Engineer with extensive expertise in AWS and PySpark.
· Deep knowledge of SQL and experience with data warehouse design and optimization.
· Strong understanding of AWS services and how they integrate with Databricks and other data engineering tools.
· Demonstrated ability to design, build, and maintain end-to-end data pipelines.
· Excellent problem-solving abilities, with a track record of implementing complex data solutions.

Nice to Have ::
· Experience in managing and automating workflows using Apache Airflow.
· Familiarity with Python, Snowflake, and CI/CD processes using GitHub.
· Strong communication skills for effective collaboration across technical teams and stakeholders.