Skip to main content
All CollectionsConnecting and Exploring Data
Create a Dataset from a SQL statement
Create a Dataset from a SQL statement

Use custom SQL as a source

Thea avatar
Written by Thea
Updated over 2 years ago

When you are working with relational sources in Trifacta, you can write a custom SQL statement to use as a dataset. By default, when you import a table from a relational database, Trifacta generates a SELECT * statement. Custom SQL allows you to customize the query used to pull data from the source system. Custom SQL datasets are most frequently used to filter the records imported into Trifacta.

How to create a custom SQL dataset

1. Navigate to the Import Data page.

2. On the Import Data page, select the link in the left panel that represents the relational database that contains the table(s) that you want to import.

3. Click the Create Dataset with SQL button located on the right side of the Choose a table panel:

This will open an empty SQL query window.

4. Inside the Create Dataset with SQL window, you can type any SELECT statement supported by your source database. This query will be executed on the database itself, so ensure that you are writing valid syntax for your source system.

5. Once you have written your SQL statement, click Create Dataset to save your query:

6. After saving your SQL query, a preview of your data will appear in a card on the right side of the Import Data screen. By default, Trifacta names this dataset "SQL Dataset #". We recommend selecting the default name and giving the dataset a more descriptive name. This will allow you to easily understand what data your query returns.

7. Click Import in the bottom-right side of the screen to import your custom SQL dataset into Trifacta:

Editing a SQL Dataset

Once created and added to the flow, the owner can Edit the SQL Dataset from the context menu option 'Edit Custom SQL'

Shared Flows

If a flow is shared with Editor permissions, the collaborator can edit the SQL statements.

Flow editors can edit any custom SQL used to import datasets into the flow.

More Information

Did this answer your question?