Context
There are several ways to retrieve product data. One of them is the scrapper, a tool that uses a user agent to retrieve information from the source code of a website. Once all settings are configured, it can scrape each product page to establish a list of characteristics such as the name, link, color, and size.
How to find the scrapper?
The scrapper section is located in the instance settings, which you can reach from the homepage.
How to use the scrapper?
Creating a configuration
Context
The scrapper is defined by its configurations. Each configuration describes one piece of information to retrieve from an element of the website.
A configuration is composed of 4 steps: Extract, Transform, Map, and Serialize.
Add a configuration
To add a configuration, click Add config. A new line appears at the end of the list.
Preconfigurations
Clicking Preconfigurations opens a window where you can import configuration presets.
Visual selector
Clicking Visual selector opens a product page of your instance. From there, you can click an element to navigate within the page, add a configuration directly from the selector, or copy the element's path.
Auto-configure
Clicking Auto-configure lets the scrapper suggest configurations that you can add directly.
Setup
When editing a configuration, you need to fill in each step to finalize the line.
Extract
Using a selector and a getter, you can retrieve any existing line from the source code of the website.
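To make the selector and getter idea concrete, here is a minimal sketch using only the Python standard library. It is not the scrapper's own implementation: the `extract` function, its parameters, and the reduced "tag plus optional class" selector are assumptions made for illustration. The selector picks the element, and the getter decides whether to return its text or one of its attributes.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FirstMatch(HTMLParser):
    """Finds the first element matching a tag and an optional class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.attrs = None   # attributes of the matched element, once found
        self.text = ""      # text collected inside the matched element
        self._depth = 0     # > 0 while the parser is inside the matched element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        if self.attrs is not None or tag != self.tag:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            self.attrs = attrs
            if tag not in VOID_TAGS:
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text += data


def extract(html, tag, css_class=None, getter="text"):
    """Hypothetical Extract step: selector = tag/class, getter = 'text' or an attribute name."""
    parser = FirstMatch(tag, css_class)
    parser.feed(html)
    if parser.attrs is None:
        return None  # nothing matched the selector
    if getter == "text":
        return parser.text.strip()
    return parser.attrs.get(getter)


page = '<div><h1 class="product-title"> Blue T-shirt </h1><a class="buy" href="/p/42">Buy</a></div>'
title = extract(page, "h1", "product-title")        # text getter
link = extract(page, "a", "buy", getter="href")     # attribute getter
```

A real selector language such as CSS or XPath supports far more than this sketch, which only shows how the two parts divide the work.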
Transform (optional)
Using a regular expression (regex), you can alter the line you extracted. For example, you can remove some characters or the beginning of the line.
Example: remove special characters at the beginning and the end of the line.
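The two transformations mentioned above can be sketched with Python's `re` module. The `transform` helper and the patterns are illustrative assumptions; the regex flavor accepted by the scrapper may differ slightly from Python's.

```python
import re


def transform(value, pattern, replacement=""):
    """Hypothetical Transform step: replace every match of pattern in value."""
    return re.sub(pattern, replacement, value)


# Runs of non-alphanumeric characters at the very start or very end of the line.
TRIM_SPECIAL = r"^[\W_]+|[\W_]+$"

# A fixed label at the beginning of the line, followed by optional spaces.
DROP_LABEL = r"^Color:\s*"

cleaned = transform("** Blue T-shirt !!", TRIM_SPECIAL)
color = transform("Color: Red", DROP_LABEL)
```

Anchoring the pattern with `^` and `$` keeps special characters in the middle of the line, such as the hyphen in "T-shirt", untouched.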
Map
Use the mapper to name your line with either an existing name or a custom name (custom attribute). Using an existing name associates your line with the information of the same name from the GMC.
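As a rough illustration of the difference between an existing name and a custom name, the sketch below stores a line in a product record. Everything in it is assumed for the example: the `EXISTING_NAMES` set, the `custom_attributes` key, and the choice to overwrite an existing value. How the scrapper actually combines a line with the GMC data is not specified here.

```python
# Hypothetical set of existing names; the real list comes from the mapper.
EXISTING_NAMES = {"name", "link", "color", "size"}


def map_line(name, value, record):
    """Hypothetical Map step: return a copy of record with the line stored under name."""
    updated = dict(record)
    if name in EXISTING_NAMES:
        # Existing name: the line joins the field that already carries this name.
        updated[name] = value
    else:
        # Custom name: the line is kept apart as a custom attribute.
        custom = dict(updated.get("custom_attributes", {}))
        custom[name] = value
        updated["custom_attributes"] = custom
    return updated


record = map_line("color", "Red", {"name": "T-shirt"})
record = map_line("fabric", "Cotton", record)
```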
Serialize
Choose how your line is output. It can be simple text, a JSON array, a JSON object, or an array of JSON objects.
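The four output shapes can be compared side by side with Python's `json` module. The `serialize` function, the mode names, and the way the values are wrapped into objects are assumptions for this sketch; the exact structure produced by the scrapper may differ.

```python
import json


def serialize(values, name, mode="text"):
    """Hypothetical Serialize step for a list of extracted values."""
    if mode == "text":
        return values[0]                                  # plain text, first value only
    if mode == "json_array":
        return json.dumps(values)                         # ["S", "M"]
    if mode == "json_object":
        return json.dumps({name: values[0]})              # {"size": "S"}
    if mode == "json_object_array":
        return json.dumps([{name: v} for v in values])    # [{"size": "S"}, {"size": "M"}]
    raise ValueError(f"unknown mode: {mode}")


sizes = ["S", "M"]
as_text = serialize(sizes, "size")
as_array = serialize(sizes, "size", "json_array")
as_object = serialize(sizes, "size", "json_object")
as_object_array = serialize(sizes, "size", "json_object_array")
```

Simple text suits single values such as a name, while the JSON forms are more useful when an element yields several values, such as a list of sizes.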
Testing your configurations
You can test your configurations to display all the configured lines. Click Test, then either use a random product URL to preview your lines or enter a link to check a specific product.
Settings
To allow the scrapper to retrieve data from the website, you need to set up the tool.
Scrapper settings
These settings define a user agent for the scrapper and a static IP address that must be whitelisted by the owner of the website. An option called browser mode can also be enabled to change how the page is scraped.
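For readers unfamiliar with user agents, the sketch below shows how a client announces one when requesting a page, using Python's standard `urllib`. The user agent string and the URL are placeholders, not values used by the scrapper. The static IP address is a property of the network the request leaves from, so it cannot be shown in code.

```python
import urllib.request

# Placeholder value; the real user agent is the one defined in the scrapper settings.
USER_AGENT = "example-scrapper/1.0"


def build_request(url, user_agent=USER_AGENT):
    """Build an HTTP request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Download the page source; the website sees the user agent in the request headers."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


request = build_request("https://example.com/product/42")
```

The website owner can use this header, together with the whitelisted IP address, to recognize the scrapper's traffic.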
Scheduling settings
These settings enable the scrapper and define how often it scrapes the products of the website.
Be careful: high values of batch size and parallelism can cause performance issues on the website.
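The sketch below illustrates why these two values matter. It assumes that batch size is the number of products handled per run and that parallelism is the number of pages requested at the same time; the scrapper's exact definitions may differ, and `scrape_all`, `scrape_one`, and `pause_seconds` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def batches(items, batch_size):
    """Split items into consecutive groups of at most batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def scrape_all(urls, scrape_one, batch_size=10, parallelism=2, pause_seconds=0.0):
    """Scrape urls batch by batch, with at most `parallelism` requests in flight."""
    results = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for batch in batches(urls, batch_size):
            # pool.map returns results in the same order as the input batch.
            results.extend(pool.map(scrape_one, batch))
            time.sleep(pause_seconds)  # breathing room for the website between batches
    return results


# Stand-in for a real page download, so the sketch runs without network access.
demo = scrape_all(["/p/1", "/p/2", "/p/3"], lambda url: url.upper(), batch_size=2)
```

With this model, the website receives up to `parallelism` simultaneous requests, and larger batches mean longer uninterrupted bursts of traffic, which is why lower values are gentler on the site.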
Debug
Two buttons, Show log and Test batch, let you check that the scrapper works without issues.
Show log
Displays the scraping results for each product in real time.
Test batch
Runs the scrapper on a set of random products.