
Nodes: Join

Written by Yohan
Updated today

In Dataflow, the Join node merges multiple data nodes into a single block.

Settings

To learn about common node settings, please visit this article:
Nodes: Main settings

Join type

Join nodes can take data from multiple sources and combine it into a single output. It is currently the only type of node that accepts more than one source.

Merge

Allows you to aggregate multiple data sources without performing joins between them. Product IDs must be unique across all sources.

Example: I have several Google Merchant Center data sources and I want to get a complete dataset with all of them.

When using a merge join, if the same product ID is present in multiple sources, the behavior is undefined.
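
Because duplicate IDs lead to undefined behavior, it can help to check your sources before merging them. The following is a minimal Python sketch of that idea, not Dataflow's actual implementation: it assumes each source is a list of dictionaries with an "id" key, and the names find_duplicate_ids and merge_sources are hypothetical.

```python
from collections import Counter


def find_duplicate_ids(sources):
    """Return the product IDs that appear more than once across all sources."""
    counts = Counter(product["id"] for source in sources for product in source)
    return sorted(pid for pid, n in counts.items() if n > 1)


def merge_sources(sources):
    """Concatenate all sources; refuse to merge if any product ID is duplicated.

    Raising an error is a choice made for this sketch only. The article says
    the real behavior with duplicates is undefined.
    """
    duplicates = find_duplicate_ids(sources)
    if duplicates:
        raise ValueError(f"Duplicate product IDs across sources: {duplicates}")
    return [product for source in sources for product in source]


# Hypothetical sample data: two feeds with distinct IDs.
feed_a = [{"id": "A1", "title": "Shoe"}, {"id": "A2", "title": "Sock"}]
feed_b = [{"id": "B1", "title": "Hat"}]
merged = merge_sources([feed_a, feed_b])
```

With the sample feeds above, merged contains all three products, one per unique ID.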

Left-join

Left-join lets you add complementary data to a primary feed.

Example: I want to enrich my main feed with attributes generated using GenAI.

The primary source determines the output products of the node: if there are 10 products in the primary branch, there will be 10 products in the output.
Data from secondary branches is merged into the products of the primary branch when the IDs match. Secondary data with no matching ID is ignored.
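
The rules above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not Dataflow's actual implementation: products are dictionaries with an "id" key, the function name left_join is hypothetical, and keeping the primary value when both sides define the same attribute is an assumption (the article does not say how such conflicts are resolved).

```python
def left_join(primary, secondaries):
    """Enrich primary products with attributes from matching secondary rows."""
    # Copy the primary products so the inputs are left untouched.
    output = [dict(product) for product in primary]
    by_id = {product["id"]: product for product in output}

    for secondary in secondaries:
        for row in secondary:
            target = by_id.get(row["id"])
            if target is None:
                continue  # no matching primary product: the row is ignored
            for key, value in row.items():
                # Assumption: existing primary attributes are not overwritten.
                target.setdefault(key, value)
    return output


# Hypothetical sample data.
main_feed = [{"id": "1", "title": "Shoe"}, {"id": "2", "title": "Sock"}]
genai_attributes = [
    {"id": "1", "color": "red"},
    {"id": "99", "color": "blue"},  # not in the main feed, so it is ignored
]
enriched = left_join(main_feed, [genai_attributes])
```

The output has exactly as many products as the primary feed: product 1 gains a color, product 2 is unchanged, and the row for ID 99 is dropped.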

Primary source (left-join)

Choose the source that will serve as the main feed during the left-join.

Join On (left-join)

On the primary feed side, the product is always identified by its product ID. The other sources can use different types of ID, which are resolved during the join operation.

  • id: this is the default ID type (lpoid) that will be used in most cases

  • offerID: join on the primary source's offerIDs when the secondary source uses this type of identifier. This is common for external CSV files.
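
As a rough illustration of how a secondary identifier could be resolved to a product ID, here is a small Python sketch. It is not Dataflow's actual implementation. It assumes that primary products are dictionaries carrying both an "id" and an "offerID" key, that secondary rows name their identifier column after the chosen Join On value, and that resolve_to_product_ids is a hypothetical helper name.

```python
def resolve_to_product_ids(primary, secondary_rows, join_on="id"):
    """Re-key secondary rows by the primary product ID they match."""
    # Map the chosen identifier (for example "offerID") to the product ID.
    lookup = {
        product[join_on]: product["id"]
        for product in primary
        if join_on in product
    }

    resolved = []
    for row in secondary_rows:
        product_id = lookup.get(row.get(join_on))
        if product_id is None:
            continue  # the identifier matches no primary product
        attributes = {k: v for k, v in row.items() if k != join_on}
        resolved.append({"id": product_id, **attributes})
    return resolved


# Hypothetical sample data: a CSV-like source keyed by offerID.
primary_feed = [{"id": "lp-1", "offerID": "SKU-9", "title": "Shoe"}]
csv_rows = [
    {"offerID": "SKU-9", "color": "red"},
    {"offerID": "SKU-0", "color": "blue"},
]
resolved_rows = resolve_to_product_ids(primary_feed, csv_rows, join_on="offerID")
```

Once the rows are keyed by product ID, they can be merged into the primary feed like any other secondary source.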
