We are looking for a Lead/Senior Big Data Developer.
Required skills
— 5+ years of experience in data engineering, with a focus on big data technologies.
— Advanced skills in Scala and/or Python.
— Proficiency in Apache Spark (Scala, PySpark, Spark SQL DSL).
— Strong experience with AWS (EC2, IAM, S3, Glue, EMR).
— Extensive knowledge of CI/CD processes and tools (Jenkins, GitHub Actions).
— Proficiency in data pipeline and database design and architecture (Lambda, Kappa, Data Lake/Lakehouse architectures).
— Familiarity with Databricks (Jobs, SQL Warehouse, Overwatch, Unity Catalog).
— Strong understanding of data streaming and real-time data processing.
— Experience with message brokers (Kafka) and Change Data Capture (CDC) techniques.
— Proficiency in SQL and NoSQL databases.
— English level: Upper-Intermediate or higher.
As a plus
— Experience with Apache Airflow, including writing plugins and custom operators.
— Experience with Terraform.
— Strong grasp of AWS data platform services and their strengths and weaknesses.
— Strong experience with Jira, Slack, JetBrains IDEs, Git, GitLab, GitHub, Docker, and Jenkins.
Responsibilities
— Lead the design, development, and maintenance of scalable, efficient data pipelines.
— Conduct code reviews and create design documents for new features.
— Provide technical support and mentorship to other teams within the organization.
— Participate in management meetings to discuss priorities, scope, deadlines, and cross-team dependencies.
— Develop, optimize, and tune Apache Spark jobs (Scala, PySpark, Spark SQL).
— Implement CI/CD processes using Jenkins and GitHub Actions.
— Design and implement data processing pipelines using AWS and GCP services.
— Use Apache Airflow to manage and automate workflows, including custom plugin development.
— Use Databricks for job scheduling, SQL warehousing, and data visualization.
We offer
— Compensation commensurate with your technical skills.
— Interesting projects with great customers.
— 5-day working week, 8-hour working day, flexible schedule.
— Democratic management style and a friendly environment.
— Fully remote work.
— Annual paid vacation: 30 business days, plus unpaid vacation.
— Paid sick leave: 6 business days per year.
— Corporate perks: external training, English courses, corporate events and team building.
— Professional and personal growth.
Project description
The client is an American e-book and audiobook subscription service offering one million titles; its open publishing platform hosts 60 million documents.

The platform:
— lets anyone share their ideas with the world;
— provides access to audiobooks;
— provides access to music published by composers worldwide;
— incorporates articles from private publishers and international magazines;
— offers access to exclusive content.

The Core Platform team provides robust, foundational software that raises operational excellence for scaling apps and data. We focus on building, testing, and deploying the apps and infrastructure that help other teams rapidly scale, interoperate, integrate with real-time data, and incorporate machine learning into their products. Working with our customers on the Data Science and Content Engineering teams, and with our peers on the Internal Tools and Infrastructure teams, we bring systems-level visibility and focus to our projects.

The client's goal is not total architectural or design perfection, but choosing the right trade-offs to strike a balance between speed, quality, and cost.