Dataset Schema Refresh

Learn how to refresh an imported dataset's schema to capture changes to columns and gather fresh data on the Transformer page.

Written by Thea
Updated over 2 years ago

A schema is the sequence of columns in a dataset, together with each column's data type. Schemas apply to relational tables and to some file formats.

Schema refresh lets you update the schemas of your imported datasets on demand to capture changes to columns. For example, when you are working with datasets in Flow View, you can refresh an imported dataset's schema to check the source schema for changes. Schema refresh automatically generates a new initial sample, so you can gather fresh data on the Transformer page.
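To make the definition of a schema concrete, the sketch below models a schema as an ordered list of columns with data types and compares a stored schema against the current source schema. This is only an illustration of the kind of change a refresh captures (added, removed, retyped, or reordered columns); the `Column` and `SchemaDiff` types, the `diff_schemas` function, and the type names such as "Integer" are hypothetical and are not part of the Trifacta application or its API.

```python
from dataclasses import dataclass

# Hypothetical model: a schema is the ordered sequence of columns and their data types.
@dataclass(frozen=True)
class Column:
    name: str
    dtype: str

@dataclass
class SchemaDiff:
    added: list[str]
    removed: list[str]
    retyped: list[str]
    reordered: bool

    @property
    def changed(self) -> bool:
        return bool(self.added or self.removed or self.retyped or self.reordered)

def diff_schemas(old: list[Column], new: list[Column]) -> SchemaDiff:
    """Compare a stored schema with the current source schema (illustrative only)."""
    old_types = {c.name: c.dtype for c in old}
    new_types = {c.name: c.dtype for c in new}
    added = [c.name for c in new if c.name not in old_types]
    removed = [c.name for c in old if c.name not in new_types]
    retyped = [c.name for c in new
               if c.name in old_types and old_types[c.name] != c.dtype]
    # Column order is part of the schema, so compare the order of the shared columns.
    shared_old = [c.name for c in old if c.name in new_types]
    shared_new = [c.name for c in new if c.name in old_types]
    return SchemaDiff(added, removed, retyped, shared_old != shared_new)

# Example: the source table gained "email", lost "name", and "signup" changed type.
stored = [Column("id", "Integer"), Column("name", "String"), Column("signup", "Datetime")]
source = [Column("id", "Integer"), Column("email", "String"), Column("signup", "String")]
diff = diff_schemas(stored, source)
print(diff.added, diff.removed, diff.retyped, diff.reordered)
# prints: ['email'] ['name'] ['signup'] False
```

In this model, any non-empty difference means the imported dataset's stored schema is out of date, which is the situation a schema refresh is meant to resolve.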

Schema refresh applies to:

  • Relational schemas

  • Schematized files

  • Flat delimited files

  • PDF, Excel, and JSON files

  • Google Sheets

Key Benefits:

  • Reduces the number of duplicate or invalid datasets created from the same source.

  • Reduces the effort of replacing datasets and retaking samples.

How to Refresh Your Dataset Schema

You can refresh your dataset from any of the following places:

1. From the Dataset Listing Page

a) From the left navigation bar, go to the Library.

b) Click the More (...) menu for the dataset of your choice, and select 'Refresh Dataset'.

2. From the Dataset Node Context Menu in Flow View

Open the flow of your choice, right-click the dataset node in Flow View, and select 'Refresh Dataset' from the context menu.

3. From the Dataset Details Page

a) Select the dataset node in Flow View.

b) In the Details panel on the right, click 'View dataset details'.

c) On the Dataset Details page that opens, click the More (...) menu and select 'Refresh Dataset'.

What It Does

If available, this option refreshes the dataset's metadata with the latest source schema.

When you refresh the schema in the Trifacta application:

  • The source schema is applied to the imported dataset in all cases.

    • All the existing samples are invalidated.

    • A new initial sample is generated, which updates the previewed data. This may take some time.

  • Added or removed columns may break recipe steps, which can cause transformation jobs to fail. You must fix any broken steps in the Recipe panel.
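The effects listed above can be sketched as a simplified model. It assumes each recipe step can be represented by the set of column names it references, and it flags a step as broken when it references a column that is missing from the refreshed schema. This covers only one failure mode; the application may flag other breakages, including ones caused by added columns. The `Sample`, `ImportedDataset`, and `RecipeStep` types and the `refresh_schema` function are hypothetical and do not describe the Trifacta application's actual internals.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    name: str
    valid: bool = True

@dataclass
class RecipeStep:
    description: str
    columns: set[str]  # hypothetical: the column names this step references

@dataclass
class ImportedDataset:
    schema: list[str]  # ordered column names (data types omitted for brevity)
    samples: list[Sample] = field(default_factory=list)

def refresh_schema(dataset: ImportedDataset, source_schema: list[str],
                   recipe: list[RecipeStep]) -> list[RecipeStep]:
    """Apply the source schema and return the recipe steps that would break."""
    # 1. The source schema is applied to the imported dataset in all cases.
    dataset.schema = list(source_schema)
    # 2. All existing samples are invalidated.
    for sample in dataset.samples:
        sample.valid = False
    # 3. A new initial sample is generated (in the application, this may take some time).
    dataset.samples.append(Sample("initial (refreshed)"))
    # 4. Flag steps that reference columns no longer present in the schema.
    available = set(dataset.schema)
    return [step for step in recipe if not step.columns <= available]

ds = ImportedDataset(["id", "name", "signup"], [Sample("initial"), Sample("random")])
recipe = [RecipeStep("uppercase name", {"name"}), RecipeStep("format signup", {"signup"})]
broken = refresh_schema(ds, ["id", "email", "signup"], recipe)
for step in broken:
    print("Fix in Recipe panel:", step.description)
```

Here the "name" column was removed at the source, so only the step that references it is reported; the step on "signup" still resolves against the new schema.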

