Responsibilities:
- Develop and maintain Python scripts for web scraping and data extraction from diverse sources such as websites, APIs, and other online platforms.
- Utilize Python libraries and frameworks (e.g., Beautiful Soup, Scrapy, Selenium) to automate data collection tasks efficiently.
- Analyze target websites and data sources to identify the most suitable and efficient scraping approach for each.
- Build robust and scalable data scraping systems that can handle large volumes of data while ensuring data quality and integrity.
- Collaborate with data engineering and analytics teams to define data requirements, data structures, and storage mechanisms for scraped data.
- Understand LLMs and other ML models.
- Perform data cleaning, preprocessing, and transformation tasks to prepare scraped data for downstream analysis and usage.
- Monitor and troubleshoot scraping processes to identify and resolve issues such as website changes, data format variations, and anti-scraping measures.
- Stay up to date with the latest web scraping trends, tools, and techniques to continually improve the efficiency and effectiveness of data scraping processes.
- Ensure compliance with legal and ethical standards when collecting and utilizing data from online sources.
Requirements:
- Strong experience in Python programming with expertise in web scraping and data extraction.
- In-depth knowledge of Python libraries and frameworks commonly used for web scraping, such as Beautiful Soup, Scrapy, Selenium, and Requests.
- Familiarity with HTML, CSS, XPath, and regular expressions for effective parsing and extraction of data from websites.
- Understanding of the HTTP protocol and web technologies to handle various website structures and data formats (e.g., JSON, XML, CSV).
- Experience with database systems (e.g., SQL, NoSQL) and data storage mechanisms for efficiently storing and managing scraped data.
- Ability to analyze and interpret web page structures, inspect network requests, and troubleshoot scraping issues.
- Strong problem-solving skills with attention to detail and ability to handle complex scraping scenarios.
- Experience with CAPTCHA breaking and with proxies for IP rotation.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
- Proven ability to work independently, manage multiple scraping projects simultaneously, and meet deadlines.
Preferred Qualifications:
- Previous experience in scraping data from diverse domains and sources, including e-commerce websites, social media platforms, and news sites.
- Knowledge of data analysis and visualization tools (e.g., Pandas, NumPy, Matplotlib, Tableau) to perform exploratory data analysis and present insights.
- Familiarity with APIs and data integration techniques to combine scraped data with other data sources.
- Understanding of web scraping legalities, ethical considerations, and best practices.
Join our team and contribute to our data-driven decision-making processes by leveraging your expertise in Python data scraping and extraction. Apply now and help us gather valuable insights from the vast web landscape.