Scrapper
Written by Yohan Virassamy
Updated over a week ago

Context

There are several ways to retrieve product data. One of them is the scrapper, a tool that uses a user agent to retrieve information from the source code of a website. Once all settings are in place, it can scrape each product page to build a list of characteristics such as the name, link, color, and size.

How to find the scrapper?

The scrapper section can be found in the instance settings, which are accessible from the homepage.

How to use the scrapper?

Creating a configuration

Context

The scrapper is defined by its configurations. Each configuration holds information about one element of the website.

A configuration is composed of 4 steps: Extract, Transform, Map, Serialize.

Add a configuration

To add a configuration, click Add config. A new line appears at the end of the list.

Preconfigurations

Clicking Preconfigurations opens a window where you can import configuration presets.

Visual selector

Clicking Visual selector opens a product page from your instance. From there, you can click an element to navigate within the page, add a configuration directly from the selector, or copy the element's path.

Auto-configure

Clicking Auto-configure lets the scrapper suggest configurations that you can add directly.

Setup

When editing a configuration, you need to fill in each step to complete the line.

Extract

Using a selector and a getter, you can retrieve any existing line from the website's source code.
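To make the selector and getter idea concrete, here is a minimal sketch in Python using only the standard library. It is not the scrapper's own implementation, and its selector syntax is not documented here: the `extract` function, the tag-plus-class matching, the `"text"` getter name, and the sample HTML are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FirstMatchExtractor(HTMLParser):
    """Finds the first element with a given tag and class (a stand-in for a selector)."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.attrs = None      # attributes of the matched element, None until found
        self.text_parts = []   # text collected inside the matched element
        self._depth = 0        # greater than 0 while inside the matched element
        self._done = False

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if self._depth:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.attrs = dict(attrs)
            if tag in VOID_TAGS:
                self._done = True
            else:
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth:
            self.text_parts.append(data)


def extract(html, tag, css_class, getter="text"):
    """Hypothetical helper: 'selector' = tag + class, 'getter' = "text" or an attribute name."""
    parser = FirstMatchExtractor(tag, css_class)
    parser.feed(html)
    if parser.attrs is None:
        return None  # nothing matched the selector
    if getter == "text":
        return "".join(parser.text_parts).strip()
    return parser.attrs.get(getter)


# Made-up product page fragment.
page = (
    '<div><h1 class="product-title"> Blue <b>Cotton</b> Shirt </h1>'
    '<a class="product-link" href="/p/42">See product</a></div>'
)
name = extract(page, "h1", "product-title")          # getter: text content
link = extract(page, "a", "product-link", "href")    # getter: an attribute
```

In this sketch the selector decides which element is read, and the getter decides which part of that element becomes the line: its text or one of its attributes.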

Transform (optional)

Using a regex, you can alter the line you extracted. For example, you can remove certain characters or the beginning of the line.

Example: remove special characters at the beginning and the end of the line
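As an illustration of this kind of transformation, the sketch below uses Python's `re` module. The regex flavor used by the scrapper may differ, and the function name and sample values are assumptions.

```python
import re


def strip_special_edges(value):
    """Remove non-alphanumeric characters at the start and end of a line."""
    # [\W_] matches any character that is not a letter or a digit.
    return re.sub(r"^[\W_]+|[\W_]+$", "", value)


def strip_prefix(value, prefix_pattern):
    """Remove the beginning of a line when it matches the given pattern."""
    return re.sub(r"^" + prefix_pattern, "", value)


cleaned = strip_special_edges("** Blue shirt, size M !!")
reference = strip_prefix("Ref: 12345", r"Ref:\s*")
```

Characters inside the line, such as the comma in the first sample, are left untouched because the pattern is anchored to the start and end only.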

Map

Use the mapper to name your line with an existing name or a custom name (custom attribute). Using an existing name combines your line with the attribute of the same name from the GMC, so that the information is associated.
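The difference between an existing name and a custom name can be pictured with the following Python sketch. The list of known attribute names, the `custom_attributes` key, and the choice to let the scraped value replace the existing one are all assumptions; how the tool actually combines values is not specified here.

```python
# Hypothetical set of names that already exist in the product data.
KNOWN_ATTRIBUTES = {"title", "link", "color", "size"}


def map_value(record, name, value):
    """Return a copy of the record with the scraped value stored under `name`."""
    merged = dict(record)
    if name in KNOWN_ATTRIBUTES:
        # Existing name: the scraped value joins the attribute of the same name.
        merged[name] = value
    else:
        # Custom name: the value is kept as a custom attribute.
        custom = dict(merged.get("custom_attributes", {}))
        custom[name] = value
        merged["custom_attributes"] = custom
    return merged


product = {"title": "Cotton shirt", "link": "/p/42"}
product = map_value(product, "color", "Blue")          # existing name
product = map_value(product, "fabric_origin", "Peru")  # custom name
```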

Serialize

Choose how your line is output. It can be plain text, a JSON array, a JSON object, or an array of JSON objects.
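The four output shapes can be illustrated with Python's `json` module. The sample values are made up, and the exact formatting the scrapper produces may differ.

```python
import json

# Plain text: the value is kept as a string.
as_text = "Blue"

# JSON array: a list of simple values.
as_array = json.dumps(["S", "M", "L"])

# JSON object: named fields for a single item.
as_object = json.dumps({"width": "10 cm", "height": "20 cm"})

# Array of JSON objects: several items, each with named fields.
as_object_array = json.dumps([
    {"size": "S", "stock": 3},
    {"size": "M", "stock": 0},
])
```

A downstream consumer can read any of the JSON forms back with `json.loads`, which is why a structured form is useful when a line holds more than one value.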

Testing your configurations

You can test your configurations to display all the lines you have set up. Click Test, then either use a random product URL to preview your lines or enter a link to check a specific product.

Settings

To allow the scrapper to retrieve data from the website, you need to set up the tool.

Scrapper settings

These settings define a user agent for the scrapper and a static IP address that must be whitelisted by the owner of the website. An option called browser mode can also be enabled, which changes how the page is scraped.
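For readers unfamiliar with user agents, the sketch below shows how an HTTP client identifies itself with a `User-Agent` header, using Python's standard `urllib`. The user agent string and the URL are placeholders, not the values used by the scrapper, and no request is actually sent.

```python
from urllib.request import Request

# Placeholder value: use the user agent shown in your scrapper settings.
USER_AGENT = "ExampleScrapper/1.0"


def build_request(url, user_agent=USER_AGENT):
    """Prepare a request that announces itself with the given user agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("https://example.com/products/42")
# urllib stores header names capitalized as "User-agent".
announced = request.get_header("User-agent")
```

A website owner can use this header, together with the static IP address, to recognize the scrapper's traffic and allow it through.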

Scheduling settings

These settings enable the scrapper and define how often it scrapes the products of the website.

Be careful: high batch size and parallelism values can cause performance issues on the website.
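To show why these two values affect the load on the website, here is a rough Python sketch. It assumes that batch size is the number of products handled per run and that parallelism is the number of pages fetched at the same time; the scrapper's exact definitions may differ, and `scrape_one` is a stand-in that makes no network call.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def scrape_all(urls, scrape_one, batch_size=10, parallelism=2):
    """Process URLs batch by batch, with at most `parallelism` pages in flight."""
    results = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for batch in batches(urls, batch_size):
            # pool.map keeps results in the same order as the input batch.
            results.extend(pool.map(scrape_one, batch))
    return results


def scrape_one(url):
    """Stand-in for a real page fetch."""
    return {"url": url, "status": "ok"}


urls = [f"https://example.com/p/{n}" for n in range(5)]
results = scrape_all(urls, scrape_one, batch_size=2, parallelism=2)
```

With `parallelism=2`, the website never receives more than two requests at once from this loop. Raising that value, or scheduling large batches very frequently, increases the pressure on the site.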

Debug

Two buttons, Show log and Test batch, let you check that the scrapper works without issues.

Show log

Displays the result of the scrape for each product in real time.

Test batch

Runs the scrapper on a set of random products.
