Role Summary - In this role, you’ll work with a passionate and talented team of engineers and data scientists at the leading edge of data science and data automation.
Responsibilities: Our web crawling team is unusual in the industry. While we have many “single-site” crawlers, our core proposition and technical efforts are geared towards building “generic” bots that can crawl and parse data from thousands of websites and documents, all using the same code. This requires a different level of thinking, planning, and coding. Here’s what you’ll do: Build, improve, and run our generic robots to extract data from both the web and documents, handling critical information accurately across a wide variety of structures and formats. Craft highly scalable solutions to improve our web crawling strategies. Derive common patterns from semi-structured data, build code to handle them, and deal with exceptions as well. Be responsible for the live execution of our robots, managing turnaround times, exceptions, QA, and delivery, and build the infrastructure needed to handle volume and scope. Be responsible for end-to-end project automation using Python.
Requirements: A bachelor’s degree in Computer Science/Information Technology engineering is preferred. 2-3 years of experience in web crawling using Python. Must have expertise in scraping social media websites and a strong understanding of how to overcome complex anti-crawling measures. Must have hands-on experience with Python libraries such as Requests, Scrapy, Pandas, urllib, or BeautifulSoup (BS4). Experience with API development would be an added advantage. Must have experience working with at least one standard RDBMS (PostgreSQL, SQL Server, etc.). Must have knowledge of and exposure to AWS, Docker, and Lambda. Must have created and handled fully automated end-to-end project pipelines using Python. Experience with web-based automation tools (Selenium, Puppeteer, Mechanize, Render) would be an added advantage.
Other Infrastructure Requirements: Since this is a completely work-from-home position, you will also require the following: High-speed internet connectivity for video calls and efficient work. A capable business-grade computer (e.g., modern processor, 8 GB+ of RAM, and no other obstacles to uninterrupted, efficient work). Headphones with clear audio quality. A stable power connection, with backups in case of internet or power failure.