A brief overview of sampling
When a dataset is first created, a background job generates an initial sample from the first rows of the dataset. This initial sample is usually quick to generate, so you can begin working on your transformations right away. By default, each sample is 10MB in size, or the entire dataset if it is smaller than that.
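As an illustration only, the following Python sketch shows the "first rows" strategy: collect rows from the start of the file until a byte budget is reached. The function name and constant are hypothetical; the actual job runs server-side.

    SAMPLE_BYTE_BUDGET = 10 * 1024 * 1024  # default 10MB cap, per the text above

    def initial_sample(path, budget=SAMPLE_BYTE_BUDGET):
        """Collect the first rows of a dataset until the byte budget is reached."""
        rows, used = [], 0
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                size = len(line.encode("utf-8"))
                if rows and used + size > budget:
                    break  # stop before the sample would exceed the cap
                rows.append(line.rstrip("\n"))
                used += size
        return rows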
As you develop your recipe, you might need to take new samples of the data. Additional samples can be generated from the context panel on the right side of the Transformer page, where you can specify the type of sample you wish to create and initiate the job to create it. Sample jobs are independent executions that run in the background.
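The sketch below illustrates the "independent background execution" idea with a worker thread; generate_sample and its arguments are hypothetical stand-ins for the product's sample job.

    from concurrent.futures import ThreadPoolExecutor

    def generate_sample(dataset_path, sample_type):
        # Stand-in for the server-side sample job described above.
        return f"{sample_type} sample of {dataset_path}"

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(generate_sample, "orders.csv", "random")
        # ... keep editing the recipe while the job runs in the background ...
        print(future.result())  # collect the sample when it is ready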
Recipe logic and sampling
When a sample is generated from the Samples panel, it is based on the recipe steps leading up to your current location in the recipe. For example, if your recipe joins in other datasets, those steps are executed, and the resulting sample depends on those datasets. As a result, changing a recipe step that occurs before the step where the sample was generated can invalidate your sample.
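One way to picture this invalidation rule is to fingerprint the recipe steps that precede the sampling point: if an earlier step changes, the fingerprint recorded at sampling time no longer matches. This is an illustrative sketch, not the product's internal mechanism.

    import hashlib

    def recipe_fingerprint(steps, upto):
        # Hash the recipe steps that precede the sampling point.
        prefix = "\n".join(steps[:upto])
        return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

    steps = ["rename col1 -> id", "join orders on id", "filter amount > 0"]
    fp_at_sampling = recipe_fingerprint(steps, upto=2)  # sample taken after the join

    steps[0] = "rename col1 -> order_id"  # edit a step *before* the sampling point
    assert recipe_fingerprint(steps, upto=2) != fp_at_sampling  # sample is now stale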
Sample Methodologies
There are six types of samples (two of them are sketched after this list):
First rows/Initial Sample
Random
Filter-based
Anomaly-based
Stratified
Cluster-based
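As a rough illustration (not the product's implementation), here is how two of these methodologies might look in Python: a random sample drawn uniformly, and a stratified sample that draws from each group of a chosen column so that rare groups are still represented.

    import random
    from collections import defaultdict

    def random_sample(rows, k, seed=0):
        # Random: draw k rows uniformly from the scanned data.
        rng = random.Random(seed)
        return rng.sample(rows, min(k, len(rows)))

    def stratified_sample(rows, key, per_group, seed=0):
        # Stratified: sample within each group of a chosen column.
        rng = random.Random(seed)
        groups = defaultdict(list)
        for row in rows:
            groups[row[key]].append(row)
        picked = []
        for members in groups.values():
            picked.extend(rng.sample(members, min(per_group, len(members))))
        return picked

    rows = [{"country": c, "amount": i}
            for i, c in enumerate("US US US FR FR JP".split())]
    print(random_sample(rows, k=3))
    print(stratified_sample(rows, key="country", per_group=1))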
The behavior of certain sample methodologies depends on the sample type (quick scan or full scan), described in the next section.
Sample Type
There are two types of sampling: quick scan and full scan. A quick scan reads only the first 2GB of the dataset and generates samples from that limited set. A full scan samples from the entire dataset.
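A small sketch of the distinction, assuming a local file stands in for the dataset: a quick scan stops reading at the 2GB limit, while a full scan reads everything.

    QUICK_SCAN_LIMIT = 2 * 1024**3  # quick scan reads at most the first 2GB

    def scan_bytes(path, full_scan=False):
        # Quick scan stops after the limit; full scan reads the whole dataset.
        limit = None if full_scan else QUICK_SCAN_LIMIT
        read = 0
        with open(path, "rb") as f:
            while chunk := f.read(1 << 20):  # read in 1 MiB chunks
                read += len(chunk)
                if limit is not None and read >= limit:
                    break
        return read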
More Information
More details about sampling can be found in our product documentation.