You can create data quality rules that use custom metrics to assess the quality of your data.
You can use the calculated metric type (derived metrics) as the input type of a data quality rule to create a metric-based data quality rule. For example, you can create a constraint that the sales quantity must fall within a specific range.
To learn the basics of Data Quality Rules in Trifacta, read this article.
Metric inputs are supported only for some rule types. For more information on the rule types that support metrics, see Data Quality Rules Reference.
Metric input types are supported for the following rules:
In Range
Greater Than
Less Than
Equals
Not Equals
In Set
Not In Set
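To make the comparison each rule type performs more concrete, here is a minimal Python sketch that checks an already computed metric value against a rule's parameters. The function and parameter names are hypothetical and are not part of any Trifacta API; whether In Range bounds are inclusive and whether Greater Than and Less Than are strict are assumptions made for illustration.

```python
# Hypothetical predicates: each compares a computed metric value with the
# rule's parameters. Names are illustrative only, not a Trifacta API.

def in_range(value: float, minimum: float, maximum: float) -> bool:
    # Assumption: both bounds are inclusive.
    return minimum <= value <= maximum

def greater_than(value: float, minimum: float) -> bool:
    # Assumption: strict comparison.
    return value > minimum

def less_than(value: float, maximum: float) -> bool:
    # Assumption: strict comparison.
    return value < maximum

def equals(value: float, expected: float) -> bool:
    return value == expected

def not_equals(value: float, expected: float) -> bool:
    return value != expected

def in_set(value: float, allowed: set) -> bool:
    return value in allowed

def not_in_set(value: float, disallowed: set) -> bool:
    return value not in disallowed

# Usage: check that the total (Sum) sales quantity falls within a range.
total_quantity = sum([120, 80, 45])
print(in_range(total_quantity, 100, 500))  # True
```

In this sketch the metric is computed first and the rule only compares a single number, which mirrors the idea that a metric-based rule tests an aggregate rather than individual cell values.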
The following metric input types are available:
| Metric name | Description |
| --- | --- |
| Average | The average column value. |
| Count Distinct | The number of unique column values. |
| Maximum | The maximum column value. |
| Minimum | The minimum column value. |
| Sum | The sum of column values. |
| Standard Deviation | The sample standard deviation of column values. |
| Variance | The sample variance of column values. |
| Count | The number of rows. |
| Correlation | The Pearson correlation coefficient between two columns. |
| Z-Score | The distance from the mean, in units of standard deviations. |
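The metric definitions above can be reproduced with Python's standard `statistics` module, which is useful for sanity-checking a rule threshold before adding it. This is a sketch on made-up sample columns, not Trifacta's implementation. `statistics.stdev` and `statistics.variance` compute the sample versions (dividing by n - 1), matching the descriptions in the table. The Pearson correlation is written out by hand here; on Python 3.10 or later, `statistics.correlation` gives the same result.

```python
import math
import statistics

# Hypothetical sample data: each list stands in for one numeric column.
prices = [4.0, 6.0, 8.0, 10.0]
quantities = [1.0, 2.0, 2.0, 5.0]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

average = statistics.mean(prices)          # Average
count_distinct = len(set(quantities))      # Count Distinct
maximum = max(prices)                      # Maximum
minimum = min(prices)                      # Minimum
total = sum(prices)                        # Sum
std_dev = statistics.stdev(prices)         # Standard Deviation (sample, n - 1)
variance = statistics.variance(prices)     # Variance (sample, n - 1)
count = len(prices)                        # Count (number of rows)
correlation = pearson(prices, quantities)  # Correlation
# Z-Score: each value's distance from the mean, in sample standard deviations.
z_scores = [(p - average) / std_dev for p in prices]

print(average, count_distinct, maximum, minimum, total, count)
```

Note that Z-Score is the only entry here that yields one value per row rather than a single number for the whole column.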
Steps to Create a Metric-Based Data Quality Rule
1. Click the Data Quality Rules icon in the top right corner of the Transformer.
2. Click Add rule.
3. In the list of available data quality rule options, look for Column Values.
4. Select a rule type.
For example, to build a data quality rule requiring that the average Price be greater than 5, select Greater Than.
In the Rule Builder, set the Input Type to Average, the column to Price, and the Minimum Value to 5.
5. Use Group By to evaluate the metric separately for each group of rows, such as each category.
6. Click Add to add the data quality rule.
The metric-based data quality rule is added.
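The worked example in the steps (average Price greater than 5, grouped by category) can be approximated in plain Python to preview which groups would pass. This is a minimal sketch under stated assumptions: `Category` is a hypothetical grouping column, the sample rows are invented, Greater Than is treated as a strict comparison, and `average_greater_than` is an illustrative helper, not a Trifacta function.

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows; "Category" is a hypothetical Group By column.
rows = [
    {"Category": "Books", "Price": 4.0},
    {"Category": "Books", "Price": 8.0},
    {"Category": "Toys", "Price": 3.0},
    {"Category": "Toys", "Price": 4.5},
]

def average_greater_than(rows, column, group_by, minimum):
    """Return {group: passed} where passed means mean(column) > minimum."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return {key: mean(values) > minimum for key, values in groups.items()}

result = average_greater_than(rows, "Price", "Category", 5)
print(result)  # {'Books': True, 'Toys': False}
```

Books averages 6.0 and passes, while Toys averages 3.75 and fails. Without Group By, the same check would compare a single average over all rows (4.875 here) against the threshold.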
The new rule is displayed in the Data Quality Rules panel. In the data quality bar for the rule, the green portion indicates the rows that passed the rule, and the red portion indicates the rows that failed.
Hover over either color to see the row count and percentage.
Select either color to highlight the corresponding rows in the data grid.
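The counts and percentages shown when hovering over the bar follow simple arithmetic. The sketch below computes them from per-row pass/fail results. It assumes, hypothetically, that for a grouped metric rule every row inherits its group's result; the source does not state how Trifacta assigns row-level results for metric-based rules, and `bar_summary` is an illustrative name only.

```python
# Hypothetical group outcomes, e.g. from an average-Price rule grouped by Category.
group_passed = {"Books": True, "Toys": False}
row_categories = ["Books", "Books", "Toys", "Toys", "Toys"]

# Assumption: each row takes the pass/fail result of its group.
row_results = [group_passed[c] for c in row_categories]

def bar_summary(results):
    """Count passed/failed rows and express each as a percentage of all rows."""
    total = len(results)
    passed = sum(1 for r in results if r)
    failed = total - passed
    return {
        "passed": passed,
        "failed": failed,
        "passed_pct": 100.0 * passed / total if total else 0.0,
        "failed_pct": 100.0 * failed / total if total else 0.0,
    }

summary = bar_summary(row_results)
print(summary)
```

With these sample rows, 2 of 5 rows (40%) pass and 3 of 5 (60%) fail.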
Additional options are available in the context menu for the rule. For more information, see Data Quality Rules Panel.