The first step in wrangling is to identify the data that you want to transform. In Trifacta, a dataset can be created from one or more files or database tables. You can create datasets from within a flow, or independently of a flow from within the Library.
Trifacta can connect to and import data from a variety of sources, including your local desktop, HDFS, cloud-native storage, relational databases, data warehouses, and more.
Click Add Datasets in the Flow Canvas.
OR
Right-click anywhere in the Flow View and select Add dataset from the context menu.
A dialog opens in which you can select datasets to add to the flow. If you have previously worked in Trifacta, recently used datasets are shown here. If you are a new user, or if you want to create a brand-new dataset, click Import Datasets at the bottom left.
You will be taken to the Import Dataset browser.
On the left, you will see a list of all available file systems and connections. By default, you can upload from your computer, and you should be connected to the native scalable storage of your Trifacta environment, such as HDFS, S3, ADLS, or GCS. Additional connections can be configured and used as data sources.
Example - Using local files
Click on Upload to import a dataset from your local machine. Trifacta supports CSV, TXT, JSON, Avro, Parquet, XLS, and PDF files for direct upload.
Click Choose a File to select the file, or drag and drop the file onto the screen, then wait for the upload to complete. You can also upload multiple files at once.
Once the upload is complete, the file will be staged.
You can add a Description for the staged dataset. Then click Import & Add to Flow.
The dataset node is added to the Flow View.
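Before uploading a batch of local files, it can help to check that each one has an extension from the supported list above. The following is a minimal Python sketch of such a check. It is not part of Trifacta: the names `SUPPORTED_EXTENSIONS`, `is_uploadable`, and `split_by_support` are hypothetical, and the sketch assumes that the file extension is a reasonable proxy for the file format.

```python
from pathlib import Path

# Extensions derived from the supported-format list in this guide.
# Assumption: this set is illustrative and is not an official Trifacta list.
SUPPORTED_EXTENSIONS = {".csv", ".txt", ".json", ".avro", ".parquet", ".xls", ".pdf"}


def is_uploadable(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Split filenames into (supported, unsupported) lists, preserving order."""
    supported = [name for name in filenames if is_uploadable(name)]
    unsupported = [name for name in filenames if not is_uploadable(name)]
    return supported, unsupported
```

For example, `split_by_support(["sales.csv", "notes.docx"])` separates the file that matches the list from the one that does not, so you can convert or skip the latter before uploading.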
Example - Using native storage
Click on the native storage option.
Depending on your platform, this could be HDFS, S3, ADLS, or GCS.
In the file system browser, folders are shown in bold blue text, while files are shown in regular black text. Click the name of a folder to navigate into it. The search bar lets you filter within the current folder. For large folders, you can paginate using the links to the right of the search bar. You can also navigate to a specific folder by clicking the pencil icon in the breadcrumbs and specifying a destination.
For files, you can preview the contents by clicking on the eye icon next to the file name.
Once you’ve located the file that you want to import, click the + button to the left of the file name. This stages the file in the right-hand panel.
Once a dataset is staged, you are shown a preview of the data, and you can edit the name, description, or additional import settings for the dataset.
Note
You can also stage multiple files from multiple locations in one go.
You can also click the + button next to a folder. This stages the entire folder, including all of its files and subfolders, as a single dataset.
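Because staging a folder pulls in every file beneath it, it can be useful to list those files beforehand. The sketch below does this for a folder on a local file system using only the Python standard library. It is an illustration and not Trifacta functionality: the function name `files_under` is hypothetical, and for HDFS, S3, ADLS, or GCS you would need that storage system's own client instead.

```python
from pathlib import Path


def files_under(folder):
    """Return the relative paths of all regular files under `folder`,
    including files in subfolders, sorted alphabetically."""
    root = Path(folder)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")  # rglob walks subfolders recursively
        if path.is_file()            # skip the directories themselves
    )
```

Reviewing the returned list before you stage the folder helps catch stray files, such as logs or temporary files, that would otherwise become part of the dataset.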
Click Import & Add to Flow.
The dataset node is added to the Flow View.
You can now proceed to create Recipes.
More Info
Learn more about creating datasets and related options here: