Custom Metric Syntax

In Aporia, custom metrics are defined using a syntax similar to Python's.

A custom metric expression is built from three building blocks:

  • Constants - a numeric value (e.g. 2, 0.5, ...)

  • Functions - any function from the built-in collection listed below (e.g. sum, count, ...). All of these functions return a numeric value.

  • Binary operations - +, -, *, /, **. Operands can be either constants or function calls.
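Because the syntax is Python-like, the three building blocks can be illustrated with Python's own `ast` module. The following is a minimal sketch, not Aporia's implementation: the `evaluate` helper, the `ROWS` sample data and the two-entry `FUNCTIONS` table are all hypothetical and exist only to show how constants, function calls and binary operations compose.

```python
import ast
import operator

# Hypothetical sample rows, used only for this illustration.
ROWS = [
    {"annual_premium": 100.0},
    {"annual_premium": 300.0},
    {"annual_premium": 200.0},
]

# The five binary operators listed above.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

# Two illustrative functions only; the real collection is much larger.
FUNCTIONS = {
    "sum": lambda rows, column: sum(r[column] for r in rows if r[column] is not None),
    "count": lambda rows: len(rows),
}


def evaluate(expression, rows):
    """Evaluate an expression made of constants, function calls and binary ops."""

    def visit(node):
        # Building block 1: numeric constants.
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        # Building block 3: binary operations on constants or function calls.
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        # Building block 2: function calls with keyword arguments.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            return FUNCTIONS[node.func.id](rows, **kwargs)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return visit(ast.parse(expression, mode="eval").body)


average_premium = evaluate('sum(column="annual_premium") / count()', ROWS)
print(average_premium)
```

Anything outside the three building blocks (for example a bare column name) is rejected by this sketch with a `ValueError`.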

Builtin Functions

Before we dive into each of the supported functions, let's take a look at a few examples of custom metric definitions.

// Average annual premium (apply a segment to restrict it, e.g. to those with a driving license)
sum(column="annual_premium") / count()

// Mean predicted probability
mean(column="proba")

// Model revenue
5 * tp_count(column="will_buy_insurance") - 2 * fp_count(column="will_buy_insurance")

// nDCG@4 per step
ndcg_at_k(column="p_views", k=4)
ndcg_at_k(column="p_add_to_cart", k=4)
ndcg_at_k(column="p_purchases", k=4)

// accuracy using custom threshold
accuracy(column="proba", type="numeric", threshold=0.2)

// R-squared - Expanding brackets to use available aggregations
rss = squared_error_sum(column="prediction")
tss = squared_sum(column="actual") - 2*mean(column="actual")*sum(column="actual") + column_count(column="actual")*(mean(column="actual")**2)
1 - rss/tss
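The R-squared example relies on the identity Σ(a - m)² = Σa² - 2·m·Σa + n·m², where m is the mean of the actuals and n is their count. The sketch below checks that identity numerically in plain Python. The sample values are made up, and the comments only indicate which metric function each quantity is meant to mirror.

```python
# Hypothetical sample data for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
prediction = [2.5, 5.5, 7.0, 8.0]

n = len(actual)                            # mirrors column_count(column="actual")
total = sum(actual)                        # mirrors sum(column="actual")
mean = total / n                           # mirrors mean(column="actual")
squared_sum = sum(a * a for a in actual)   # mirrors squared_sum(column="actual")

# Expanded form, as written in the metric definition.
tss_expanded = squared_sum - 2 * mean * total + n * mean ** 2

# Direct form: sum of squared deviations from the mean.
tss_direct = sum((a - mean) ** 2 for a in actual)

# Residual sum of squares, mirrors squared_error_sum(column="prediction").
rss = sum((p - a) ** 2 for p, a in zip(prediction, actual))

r_squared = 1 - rss / tss_expanded
print(tss_expanded, tss_direct, r_squared)
```

Both forms of the total sum of squares agree, so the expanded expression can be built entirely from simple aggregations.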

Filters within functions

In Aporia, a segment can always be applied to a metric as a whole, but sometimes that is not enough: often a single function within the metric needs to operate on its own segment of the data.

Aporia supports these cases through an additional function argument called "filter".

The "filter" argument restricts the data passed in the "column" argument, and is written in the custom segment syntax.

For example:

// Ratio of the annual premium of people above 70 out of the total premium
sum(column="annual_premium", filter="age > 70") / sum(column="annual_premium")

You can still apply any of your segments to the metric as a whole. Behind the scenes, a filter set within a metric is intersected with all of your existing filters, and the resulting segments are treated like any regular segment.
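The intersection behaviour can be pictured with a small Python sketch. This is an illustration under assumptions, not Aporia's implementation: the rows, the predicate functions and the `premium_sum` / `premium_ratio` helpers are hypothetical, and filters are modelled as plain Python predicates rather than the custom segment syntax.

```python
# Hypothetical rows for illustration.
rows = [
    {"age": 75, "gender": "F", "annual_premium": 400.0},
    {"age": 82, "gender": "M", "annual_premium": 600.0},
    {"age": 35, "gender": "F", "annual_premium": 250.0},
    {"age": 41, "gender": "M", "annual_premium": 750.0},
]


def always(row):
    """Predicate that keeps every row (no segment / no filter)."""
    return True


def over_70(row):
    """Stands in for filter="age > 70"."""
    return row["age"] > 70


def is_female(row):
    """Stands in for a segment applied to the metric as a whole."""
    return row["gender"] == "F"


def premium_sum(rows, segment=always, row_filter=always):
    """Sum annual_premium over rows matching BOTH the segment and the filter."""
    return sum(r["annual_premium"] for r in rows if segment(r) and row_filter(r))


def premium_ratio(rows, segment=always):
    """sum(..., filter="age > 70") / sum(...), evaluated inside a segment."""
    return premium_sum(rows, segment, over_70) / premium_sum(rows, segment)


ratio_all = premium_ratio(rows)                # no segment on the metric
ratio_female = premium_ratio(rows, is_female)  # segment intersected with the filter
print(ratio_all, ratio_female)
```

In the segmented case the numerator only sees rows that are both in the segment and over 70, while the denominator sees every row in the segment.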

Supported functions

Numerical Measures

absolute_sum

Returns the sum of absolutes for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

count

Returns the total number of rows.

Parameters

No parameters needed.

column_count

Returns the number of rows with non-null values for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be any field.

max

Returns the maximum value for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

max_length

Returns the maximum length for the given column (items for arrays/embeddings, characters for text).

Parameters

  • column: the name of the field on which we want to apply the function. Can be a text/array/numeric array/embedding field of any group (feature / raw_input / prediction / actual).

median

Returns the median value for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

mean

Returns the average value for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

mean_length

Returns the average length for the given column (items for arrays/embeddings, characters for text).

Parameters

  • column: the name of the field on which we want to apply the function. Can be a text/array/numeric array/embedding field of any group (feature / raw_input / prediction / actual).

min

Returns the minimum value for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

min_length

Returns the minimum length for the given column (items for arrays/embeddings, characters for text).

Parameters

  • column: the name of the field on which we want to apply the function. Can be a text/array/numeric array/embedding field of any group (feature / raw_input / prediction / actual).

missing_count

Returns the number of rows with null values for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be any field.

missing_ratio

Returns the percentage of rows with null values for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be any field.
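The row-counting functions are related: every row is either null or non-null for a given column, so column_count and missing_count add up to count. The sketch below shows this on a made-up column. It assumes that missing_ratio is missing_count divided by count and expresses it as a fraction; whether Aporia reports a fraction or a percentage scaled to 100 is not stated here.

```python
# Hypothetical column values; None stands for a null entry.
values = [12.5, None, 7.0, None, 3.2]

count = len(values)                                # mirrors count()
column_count = sum(v is not None for v in values)  # mirrors column_count(column=...)
missing_count = sum(v is None for v in values)     # mirrors missing_count(column=...)

# Assumption: the ratio is missing rows over all rows, shown here as a fraction.
missing_fraction = missing_count / count

print(count, column_count, missing_count, missing_fraction)
```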

sum

Returns the sum for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

squared_sum

Returns the sum of squared values for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

squared_deviation_sum

Returns the sum of squared deviations from the mean for the given column.

For a column x whose mean over all samples is m, this equals the sum of (x-m)².

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

value_count

Returns the number of entries where the given column is equal to the given value.

For example, value_count(column="bool", value=True) returns the number of entries where bool is True.

Parameters

  • column: the name of the field on which we want to apply the function. Can be any boolean/categorical field.

  • value: The value of the field to look for.

variance

Returns the variance for the given column.

Parameters

  • column: the name of the field on which we want to apply the function. Can be a numeric field of any group (feature / raw_input / prediction / actual).

Regression Metrics

absolute_error_sum

Returns the sum of absolute errors for the given prediction.

For a prediction P and actual A, returns the sum of |P-A|.

Parameters

  • column: the name of the numeric prediction field on which we want to apply the function. Must have a numeric actual mapped to it.

mae

Calculates MAE for the given prediction.

Parameters

  • column: the name of the numeric prediction field on which we want to apply the function. Must have a numeric actual mapped to it.

mse

Calculates MSE for the given prediction.

Parameters

  • column: the name of the numeric prediction field on which we want to apply the function. Must have a numeric actual mapped to it.

rmse

Calculates RMSE for the given prediction.

Parameters

  • column: the name of the numeric prediction field on which we want to apply the function. Must have a numeric actual mapped to it.

squared_error_sum

Returns the sum of squared errors for the given prediction.

For a prediction P and actual A, returns the sum of (P-A)².

Parameters

  • column: the name of the numeric prediction field on which we want to apply the function. Must have a numeric actual mapped to it.
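The regression metrics above follow from the two error sums. As a rough illustration, the plain-Python sketch below uses the standard textbook definitions (MAE and MSE as the error sums divided by the number of rows, RMSE as the square root of MSE). The sample values are made up, and the handling of rows with missing actuals is not modelled.

```python
import math

# Hypothetical prediction/actual pairs for illustration.
prediction = [2.5, 5.5, 7.0, 8.0]
actual = [3.0, 5.0, 7.0, 9.0]
n = len(actual)

# Sum of |P - A| and sum of (P - A)^2 over all rows.
absolute_error_sum = sum(abs(p - a) for p, a in zip(prediction, actual))
squared_error_sum = sum((p - a) ** 2 for p, a in zip(prediction, actual))

mae = absolute_error_sum / n   # mean absolute error
mse = squared_error_sum / n    # mean squared error
rmse = math.sqrt(mse)          # root mean squared error

print(absolute_error_sum, squared_error_sum, mae, mse, rmse)
```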

Binary Classification Metrics

accuracy

Calculates accuracy for the given prediction.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.

  • method: the averaging strategy to use. Can be "macro", "micro" or "weighted". Required for categorical predictions.

auc_roc

Calculates AUC ROC for the given prediction.

Parameters

  • column: the name of the numeric prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

fn_count

Returns the number of False-Negative results.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.

fp_count

Returns the number of False-Positive results.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.

f1

Calculates f1-score for the given prediction.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.

  • method: the averaging strategy to use. Can be "macro", "micro" or "weighted". Required for categorical predictions.

precision

Calculates precision for the given prediction.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.

  • method: the averaging strategy to use. Can be "macro", "micro" or "weighted". Required for categorical predictions.

recall

Calculates recall for the given prediction.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.

  • method: the averaging strategy to use. Can be "macro", "micro" or "weighted". Required for categorical predictions.

tn_count

Returns the number of True-Negative results.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.

tp_count

Returns the number of True-Positive results.

Parameters

  • column: the name of the numeric/boolean prediction field on which we want to apply the function. Must have a boolean actual mapped to it.

  • threshold: probability threshold according to which we decide if a class is positive. Required for numeric predictions.
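To make the role of the threshold concrete, here is a minimal Python sketch of how the four confusion counts and the derived metrics relate for a numeric prediction. It uses the standard definitions and one explicit assumption: a row is treated as positive when its probability is greater than or equal to the threshold. Whether Aporia uses >= or > is not stated here. The data and the `confusion_counts` helper are hypothetical.

```python
# Hypothetical predicted probabilities and boolean actuals.
proba = [0.9, 0.3, 0.15, 0.6, 0.05, 0.25]
actual = [True, True, False, False, False, True]
threshold = 0.2


def confusion_counts(proba, actual, threshold):
    """Return (tp, fp, fn, tn) for probabilities cut at the given threshold."""
    # Assumption: probability >= threshold counts as a positive prediction.
    predicted = [p >= threshold for p in proba]
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(proba, actual, threshold)

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn, accuracy, precision, recall, f1)
```

Raising the threshold moves rows from the positive to the negative side, which is why the same column can yield different counts under different thresholds.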

Multiclass Classification Metrics

accuracy_per_class

Calculates accuracy for the given prediction per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.

fn_count_per_class

Returns the number of False-Negative results per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.

fp_count_per_class

Returns the number of False-Positive results per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.

f1_per_class

Calculates f1-score for the given prediction per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.

precision_per_class

Calculates precision for the given prediction per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.

recall_per_class

Calculates recall for the given prediction per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.

tn_count_per_class

Returns the number of True-Negative results per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.

tp_count_per_class

Returns the number of True-Positive results per the specified category class.

Parameters

  • column: the name of the categorical prediction field on which we want to apply the function. Must have a categorical actual mapped to it.

  • class_name: the class on which we want to calculate the function.
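The per-class functions can be read as a one-vs-rest view of a multiclass prediction: the chosen class_name is treated as positive and every other class as negative. The sketch below illustrates that reading with made-up labels. The `counts_per_class` helper is hypothetical, and the one-vs-rest interpretation is an assumption based on the usual definitions.

```python
# Hypothetical categorical predictions and actuals.
predicted = ["cat", "dog", "cat", "bird", "dog", "cat"]
actual = ["cat", "cat", "cat", "bird", "dog", "dog"]


def counts_per_class(predicted, actual, class_name):
    """Return (tp, fp, fn, tn), treating class_name as positive and the rest as negative."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == class_name and a == class_name for p, a in pairs)
    fp = sum(p == class_name and a != class_name for p, a in pairs)
    fn = sum(p != class_name and a == class_name for p, a in pairs)
    tn = sum(p != class_name and a != class_name for p, a in pairs)
    return tp, fp, fn, tn


tp, fp, fn, tn = counts_per_class(predicted, actual, "cat")

precision_cat = tp / (tp + fp)
recall_cat = tp / (tp + fn)
f1_cat = 2 * precision_cat * recall_cat / (precision_cat + recall_cat)

print(tp, fp, fn, tn, precision_cat, recall_cat, f1_cat)
```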

Ranking Metrics

accuracy_at_k

Calculates Accuracy for the given prediction on the top K items.

Parameters

  • column: the name of the array prediction field on which we want to apply the function. Must have an array actual mapped to it. If using candidate-level ranking, can be a boolean prediction with a mapped boolean actual.

  • k: an integer between 1 and 12. Only the top-k items will be considered.

map_at_k

Calculates MAP (Mean-Average-Precision) for the given prediction on the top K items.

Parameters

  • column: the name of the array prediction field on which we want to apply the function. Must have an array actual mapped to it. If using candidate-level ranking, can be a boolean prediction with a mapped boolean actual.

  • k: an integer between 1 and 12. Only the top-k items will be considered.

mrr_at_k

Calculates MRR (Mean-Reciprocal-Rank) for the given prediction on the top K items.

Parameters

  • column: the name of the array prediction field on which we want to apply the function. Must have an array actual mapped to it. If using candidate-level ranking, can be a boolean prediction with a mapped boolean actual.

  • k: an integer between 1 and 12. Only the top-k items will be considered.

ndcg_at_k

Calculates NDCG (Normalized Discounted Cumulative Gain) for the given prediction on the top K items.

Parameters

  • column: the name of the array prediction field on which we want to apply the function. Must have an array actual mapped to it. If using candidate-level ranking, can be a boolean prediction with a mapped boolean actual.

  • k: an integer between 1 and 12. Only the top-k items will be considered.

precision_at_k

Calculates Precision for the given prediction on the top K items.

Parameters

  • column: the name of the array prediction field on which we want to apply the function. Must have an array actual mapped to it. If using candidate-level ranking, can be a boolean prediction with a mapped boolean actual.

  • k: an integer between 1 and 12. Only the top-k items will be considered.

recall_at_k

Calculates Recall for the given prediction on the top K items.

Parameters

  • column: the name of the array prediction field on which we want to apply the function. Must have an array actual mapped to it. If using candidate-level ranking, can be a boolean prediction with a mapped boolean actual.

  • k: an integer between 1 and 12. Only the top-k items will be considered.
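For intuition about what "top K items" means, the sketch below computes several ranking measures for a single row using common textbook definitions with binary relevance. It is an illustration under assumptions and may differ from Aporia's exact formulas: dividing precision by k, using a log2 discount for NDCG, and the helper names themselves are all choices made for this example. A dataset-level metric would typically average such per-row values.

```python
import math


def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predicted items that are relevant."""
    hits = sum(item in relevant for item in predicted[:k])
    return hits / k


def recall_at_k(predicted, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    hits = sum(item in relevant for item in predicted[:k])
    return hits / len(relevant)


def reciprocal_rank_at_k(predicted, relevant, k):
    """1 / rank of the first relevant item in the top k, or 0 if there is none."""
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            return 1 / rank
    return 0.0


def ndcg_at_k(predicted, relevant, k):
    """Binary-relevance NDCG with a log2 rank discount (assumed definition)."""
    dcg = sum(
        1 / math.log2(rank + 1)
        for rank, item in enumerate(predicted[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical single row: a ranked prediction array and the relevant items.
predicted = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "f"}
k = 4

p_at_k = precision_at_k(predicted, relevant, k)
r_at_k = recall_at_k(predicted, relevant, k)
rr_at_k = reciprocal_rank_at_k(predicted, relevant, k)
ndcg = ndcg_at_k(predicted, relevant, k)
print(p_at_k, r_at_k, rr_at_k, ndcg)
```

Items ranked below position k are ignored entirely, so "e" has no effect on any of the values above.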
