How to Union data

Append data from one or more datasets to an existing dataset

Written by Thea
Updated over 3 years ago

In a Union operation, rows of data from multiple datasets are combined into a single dataset. For example, if you have multiple datasets containing transactional data, such as log files, you can use the union operation to combine daily or weekly slices of this data into a single dataset.

The Trifacta application provides an easy-to-use tool to help create unions, allowing you to match datasets by name or position. You can also manually adjust the matches and choose which columns to include in or exclude from the resulting dataset.


Steps

1. In the Search panel, enter union in the text box, or select the Union icon from the Transformer toolbar at the top of the page.

2. On the Union page that opens, click Add data to bring in the additional datasets that you want to combine with the current dataset.

3. Select one or more datasets to union from the list of imported datasets, and click Apply.

4. Choose how columns are aligned during the union.

With Auto Align, Trifacta SaaS automatically maps the columns of the new dataset(s) to the columns of the dataset already loaded in the Transformer page:

  • Add Datasets and Align by Name. Matches are made based on the name of each column. Partial name matches may also be identified as matches.

  • Add Datasets and Align by Position. Matches are made based on the position of each column, from left to right, in each dataset. Extra columns are dropped. This method can be useful when column names have changed between datasets.
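To make the difference between the two alignment modes concrete, here is a minimal Python sketch. It is not how Trifacta SaaS implements Auto Align. It assumes each dataset is a list of column names plus rows of values, uses an exact, case-insensitive name match (no partial matching), keeps only the original dataset's columns, and fills unmatched cells with None. The function and dataset names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A dataset here is (column names, rows); each row is a list of values.
Table = Tuple[List[str], List[list]]
Aligner = Callable[[List[str], List[str]], List[Optional[int]]]


def align_by_name(base_cols: List[str], other_cols: List[str]) -> List[Optional[int]]:
    """For each base column, return the index of the same-named column in the
    other dataset (case-insensitive exact match), or None if there is none."""
    lookup = {name.lower(): i for i, name in enumerate(other_cols)}
    return [lookup.get(name.lower()) for name in base_cols]


def align_by_position(base_cols: List[str], other_cols: List[str]) -> List[Optional[int]]:
    """Pair the i-th base column with the i-th other column; any extra
    columns in the other dataset are ignored (dropped)."""
    return [i if i < len(other_cols) else None for i in range(len(base_cols))]


def union(base: Table, others: List[Table], align: Aligner) -> Table:
    """Append the rows of each dataset in `others` below the rows of `base`."""
    base_cols, base_rows = base
    out_rows = [list(row) for row in base_rows]
    for cols, rows in others:
        mapping = align(base_cols, cols)
        for row in rows:
            # Output columns with no matching source column get None.
            out_rows.append([row[j] if j is not None else None for j in mapping])
    return list(base_cols), out_rows


original = (["id", "amount"], [[1, 10.0]])
renamed = (["ID", "total", "note"], [[2, 20.0, "late"]])

print(union(original, [renamed], align_by_name)[1])      # [[1, 10.0], [2, None]]
print(union(original, [renamed], align_by_position)[1])  # [[1, 10.0], [2, 20.0]]
```

Renaming amount to total breaks the name match, while position alignment still lines the columns up; the extra note column is dropped when aligning by position.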

5. Continue to add data, edit or drop columns, and align data as needed.

  • To remove a dataset from the union, click the X next to its name in the right panel. You cannot remove the original dataset from which the Union page was opened.

6. Output Panel

In the left panel, you can review and modify the columns to be included in and excluded from the output.

All columns from the original dataset are included in the output by default.

In the right panel, the source columns for each union output column are shown on the same line.

  • By default, all matching columns are included in the output.

  • To the right of the column name, you can see the number of datasets in the union where the column occurs.
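The count next to each column name can be read as "how many of the datasets in the union contain this column." A minimal Python sketch under that interpretation follows; it is an illustration, not Trifacta's logic. It assumes column names are compared case-insensitively, uses hypothetical dataset names, and also models the default of keeping every column of the original dataset unless you exclude it.

```python
from collections import Counter
from typing import Dict, List


def column_occurrences(datasets: Dict[str, List[str]]) -> Counter:
    """Count how many datasets contain each column name (case-insensitive).
    A name is counted at most once per dataset."""
    counts: Counter = Counter()
    for columns in datasets.values():
        counts.update({name.lower() for name in columns})
    return counts


def output_columns(base_cols: List[str], excluded: List[str]) -> List[str]:
    """Start from every column of the original dataset (the default)
    and drop any that the user has excluded."""
    skip = {name.lower() for name in excluded}
    return [name for name in base_cols if name.lower() not in skip]


datasets = {
    "orders_jan": ["id", "amount", "region"],
    "orders_feb": ["id", "amount"],
    "orders_mar": ["ID", "Amount", "channel"],
}
counts = column_occurrences(datasets)
print(counts["id"], counts["region"])  # 3 1
print(output_columns(datasets["orders_jan"], excluded=["region"]))  # ['id', 'amount']
```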

7. Column Actions & Alignment

  • To add a column to the union output, click the + icon to the left of the column entry in the lower panel.

  • To review the top five values for any column, click the Expand icon.
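If "top five values" is read as the five most frequent values in a column (an assumption; the preview may also be based on a sample rather than the full dataset), the idea can be sketched with Python's collections.Counter. The helper name top_values is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def top_values(column: List[Hashable], n: int = 5) -> List[Tuple[Hashable, int]]:
    """Return the n most frequent values with their counts.
    Ties keep the order in which the values first appear."""
    return Counter(column).most_common(n)


region = ["US", "UK", "US", "DE", "FR", "US", "UK", "JP", "IN"]
print(top_values(region))
# [('US', 3), ('UK', 2), ('DE', 1), ('FR', 1), ('JP', 1)]
```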

When you are satisfied with the result, click Add to Recipe.

8. The Union step is added to the recipe.

To modify a union after it has been created, click the Edit icon for the entry in the Recipe panel. See Recipe Panel.

More Info

To learn more about the Union operation, read this detailed documentation guide.
