Top trends in web scraping: focus on ethics, data quality and the power of ML

Web scraping, a technology that allows the automatic collection of public web data, has developed rapidly over the last few years as more and more businesses discover its potential. OxyCon, a prominent industry conference taking place online on 25-26 August, will focus on how the technology is changing and which factors are shaping its future.

“As web scraping technology matures, it is constantly adapting to the ever-changing needs of the businesses that use it. From a technological viewpoint, it is becoming more advanced with the help of artificial intelligence and machine learning. From a business point of view, there is an increasing need for clear ethical standards to distinguish reliable web scraping providers in the market. With OxyCon, we aim to provide a comprehensive overview of what is happening in the industry and how we can all benefit from it,” says Julius Černiauskas, CEO of Oxylabs, a leading proxy provider and the organiser of OxyCon.

One of the most important developments in the industry is the growing focus on ethics. Con Conlon, Managing Director of Merit Data & Technology, who will address this topic at OxyCon, argues that web scraping can do much to underpin transparent and efficient markets, but that those who practise it have a duty to do so in a thoughtful, considerate manner that has minimal impact on data sources.

“At the conference we will look at how to draft and implement ethical data collection policies, what benefits this can bring, and how to manage conflicts between client demands, legal frameworks and ethical collection,” Conlon says.

Another aspect that is often overlooked but deserves more attention is the quality of web data. Allen O’Neill, CTO of DataWorks and a Microsoft Regional Director, will put the spotlight on this topic: “Great insights need great data. If your data isn’t of high enough quality, your insights are going to be poor, and they won’t be trustworthy. That’s why we need to talk about it.”

When it comes to technological advancements, artificial intelligence (AI) and machine learning (ML) are driving the latest trends. Recent innovations in web scraping use these technologies to overcome common challenges in the field.

Operations such as content classification, content extraction and CAPTCHA solving can become much more efficient with ML-driven automation. Jurijus Gorskovas, a Machine Learning Engineer at Oxylabs, will present practical applications of ML for augmenting web scraping.

Meanwhile, Pujaa Rajan, a Machine Learning Engineer at Stripe, will share her extensive experience in developing ML infrastructure. Her presentation will cover various aspects of that infrastructure, including data collection, feature extraction, training and retraining, serving, and monitoring.

In total, 15 speakers will share their experience and tips at the conference. The event will include presentations, panel discussions and workshops.

OxyCon 2021 is an annual gathering of the web scraping community, organised by Oxylabs, a leading provider of data gathering solutions. The free online event is intended for everyone interested in web scraping trends and best practices: web scraping professionals, developers, data scientists, analysts, business decision makers and students. Registration is open.