Skip to content

Query Language

MPC provides a general way of computing a function by several parties potentially allowing any computation. However, not all the computations are feasible to use in practice due to performance considerations. The query language exposes carefully chosen functions for statistical analysis. More functions can be added to the language depending on use cases and after the detailed performance cost estimation.

A single analysis may include one or more calculations (queries). Each query consists of a filter expression and one or more functions that are applied to the input data after filtering.

Example

age salary
20 10000
18 12000
78 9900
67 15000

Compute the average salary for all employees older than 25

Filter:   age > 25            Result: 2/4 (2 of 4 rows)
Function: mean(salary)        Result: 12450

Info

This document uses variable to correspond to the name of a column in a dataset.

Types

Both filter expressions and functions are typed. The types of variables and constants must match, according to the filter operations, and expected function parameters.

The existing types include

  • boolean: Represents something that is true or false
    • Accepted true literals: true, positive, yes.
    • Accepted false literals: false, negative, no.
  • int: A whole number
    • E.g.: 1, 42, 198011
  • decimal: A decimal number.
    • Uses . as decimal separator.
    • E.g.: 6.41, 95.156, 1101.5
  • date: An ISO 8601 formatted date.
    • E.g.: 2025-04-22, 1966-12-03
  • list: An aggregate of one of the previous types.
    • Lists are written as comma-separated values in square brackets, e.g.: [1, 2, 5] is a list of integers.

Filter expressions

A filter expression consists of one or more comparison expressions connected with logical AND and OR operators. A filter expression can also be negated using logical NOT.

Each comparison expression consists of a variable (a column from the data set), a comparison operation and a value to compare the variable to.

Operations Symbols Accepted Operand Types Resulting type
Comparison <, >, <=, >=, ==, != int, decimal, date boolean
Logical & (AND), \|(OR),! (NOT) boolean boolean

Example

Find all employees with a salary between 10000 and 20000 or younger than 30.

(salary >= 10000 & salary <= 20000) | age < 30

Find all employees with a salary outside the interval between 10000 and 20000 (note the negation operation !)

!(salary >= 10000 & salary <= 20000)

An illegal filter, as the types do not match.

salary >= 2024-05-15

Functions

min(variable)

Compute minimum of all values of variable.

Arguments:

  • a variable corresponding to a column name
    • Must be of type int, decimal or date

Example: min(age) = 18, for the example dataset

Warning

The min function reveals the exact value of a single data point in the data set. Whether to use this function should be determined on a case-by-case basis.

max(variable)

Compute maximum of all values of variable.

Arguments:

  • a variable corresponding to a column name
    • Must be of type int, decimal or date

Example: max(age) = 78, for the example dataset

Warning

The max function reveals the exact value of a single data point in the data set. Whether to use this function should be determined on a case-by-case basis.

mean(variable)

Compute the sample mean (the average value).

Arguments:

  • a variable corresponding to a column name
    • Must be of type int or decimal

Example: mean(salary) = 11725, for the example dataset

variance(variable)

Compute the sample variance. The variance characterizes the average degree to which each point differs from the mean.

Arguments:

  • a variable corresponding to a column name
    • Must be of type int or decimal

Example: variance(age) ≈ 974.9167, for the example dataset

standardDeviation(variable)

Compute the sample standard deviation as the square root of the sample variance. The standard deviation characterizes how far values are spread out from the mean (average value).

Arguments:

  • a variable corresponding to a column name
    • Must be of type int or decimal

Example: standardDeviation(age) ≈ 31.2237, for the example dataset

sum(variable)

Compute the sum of all values of the given variable.

Arguments:

  • a variable corresponding to a column name
    • Must be of type int or decimal

Example: sum(salary) = 46900, for the example dataset

prevalence(filter)

Compute prevalence. Prevalence is the percentage of the dataset matching the filter criteria.

Arguments:

  • A filter expression, e.g. age > 70

Example: prevalence(age > 70) = 25%, for the example dataset

count()

Compute number of rows after applying a filter expression.

Arguments: none

Example: filtering the example dataset with age > 18 and then computing count() gives 3.

chiTest(variable, [NUM], [INT])

Compute Chi-square test.

Chi-square (χ²) test is used to test whether the observed data corresponds to some expected data. E.g. observations correspond to some theoretical probability distribution, or two observed variables are independent of each other (for example, the fact that people buy snacks or not is independent of the type of the movie they attend).

Arguments:

  • a variable corresponding to a column name
    • Must be of type int or decimal
  • a list of integers or a list of decimals, corresponding to histogram buckets
  • a list of integers corresponding to expected values for each bucket

histogram(variable, [INT])

Compute a one-dimensional histogram.

Arguments:

  • a variable corresponding to a column name
    • Must be of type int or decimal
  • a list of integers - soft (less-or-equal) - upper bounds of the histogram buckets.

Output: count of elements in each bucket; the last count corresponds to the elements larger than the last bound in the input list of bounds. Note that the count for any of the buckets is below k (the anonymity threshold), the whole result is <k, and it is not possible to see for which bucket the count was below the threshold.

Example: histogram(salary, [10000, 14000]) = [2, 1, 1], for the example dataset. The buckets: salary ≤ 10000, 10000 < salary ≤ 14000, salary > 14000. The histogram can be visualized in the following way:

Histogram Example

tTest(variable, NUM)

Compute the test statistics for a Student's t-test for the hypothesis that the mean (average) of the sample is equal to the given value. For example, one can test a hypothesis that, on average, protein bars have 20 grams of protein.

Arguments:

  • a variable corresponding to a column name
  • a number (possibly decimal) corresponding to the

Info

Lists are written as comma-separated numbers in square brackets, e.g: [1,2,3.6]