Adding a New Univariate Distribution Type
Contents
Adding a New Univariate Distribution Type#
UQTestFuns is delivered with several univariate distributions that are used by the currently available test functions. In case a new univariate distribution type is required for a new test function, the distribution should be added first to the code base.
This guide will help you with adding a new distribution type to the code base. As a motivating example, let’s assume we want to add the uniform distribution to the code base.
where \(a\) and \(b\) are the parameters of the distribution. They correspond to the lower and upper bounds of the distribution, respectively.
Note
Strictly speaking, this is not necessary, because the uniform distribution is already in the code base. It’s a rather contrived example, we admit.
By implementing the distribution correctly, you’ll be able to create a univariate random variable with that distribution. For example:
>>> import uqtestfuns as uqtf
>>> my_var = uqtf.UnivDist(distribution="uniform", parameters=[3, 5])
Step 0: Putting things in the right place#
Univariate random variables in UQTestFuns are represented as instances of the UnivDist
class.
The supported distributions for the random variables are implemented in separate modules
located in the src/uqtestfuns/core/prob_input/univariate_distributions
directory.
Here’s how the directory look:
src/uqtestfuns/core/prob_input/univariate_distributions
├── __init__.py
├── beta.py <- An implementation of the Beta distribution
├── gumbel <- An implementation of the Gumbel (max.) distribution
├── ...
└── utils.py <- Sub-package utility functions
Pick a name for your new distribution and add it to this directory.
For our example, the name uniform.py
would be apt.
Step 1: Implementing the distribution#
In the distribution module, you need to implement several module-level variables and functions; they all need to have exactly the name and, for functions, signature as prescribed below.
Name |
Description |
---|---|
|
The name of the distribution (no whitespace, please) |
|
The required number of parameters |
|
Verification function |
|
Get the lower bound of the distribution |
|
Get the upper bound of the distribution |
|
Compute the PDF values on a set of input values |
|
Compute the CDF values on a set of input values |
|
Compute the ICDF values on a set of input values |
Required packages#
First, import the commonly used packages and internal functions:
import numpy as np
from .utils import postprocess_icdf, verify_param_nums
from ....global_settings import ARRAY_FLOAT
Module-level variables#
Two module-level variables must be defined in all caps: DISTRIBUTION_NAME
and NUM_PARAMS
.
We pick the obvious name for the uniform distribution and assign the required number of parameters (\(2\)) to NUM_PARAMS
:
DISTRIBUTION_NAME = "uniform" # lower case
NUM_PARAMS = 2
Verify parameters#
The verify_parameters()
function verifies the consistency of the given parameters.
In our example, the parameters of the uniform distribution are the lower and upper bound of the distribution.
Therefore, the first parameter cannot be equal to or larger than the second parameter.
An implementation of the verification function is as follows:
def verify_parameters(parameters: ARRAY_FLOAT) -> None:
"""Verify the parameters of a uniform distribution.
Parameters
----------
parameters : ARRAY_FLOAT
The parameters of the uniform distribution
(i.e., lower and upper bounds).
Returns
------
None
The function exits without any return value when nothing is wrong.
Raises
------
ValueError
If any of the parameter values are invalid
or the shapes are inconsistent.
"""
# Verify overall shape
verify_param_nums(parameters.size, NUM_PARAMS, DISTRIBUTION_NAME)
if parameters[0] >= parameters[1]:
raise ValueError(
f"The lower bound {parameters[0]} "
f"cannot be greater than the upper bound {parameters[1]}!"
)
Lower bound#
The lower()
function returns the lower bound of the distribution.
Even for distributions that are technically unbounded, this function must be defined for numerical consideration.
In our example, the uniform distribution has a straightforward lower bound; it’s the first parameter.
def lower(parameters: ARRAY_FLOAT) -> float:
"""Get the lower bound of a uniform distribution.
Parameters
----------
parameters : ARRAY_FLOAT
The parameters of a uniform distribution.
Returns
-------
float
The lower bound of the uniform distribution.
"""
lower_bound = float(parameters[0])
return lower_bound
Upper bound#
Similarly to the lower()
function, the upper()
function returns the upper bound of the distribution.
In our example, the function may read:
def upper(parameters: ARRAY_FLOAT) -> float:
"""Get the upper bound of a uniform distribution.
Parameters
----------
parameters : ARRAY_FLOAT
The parameters of a uniform distribution.
Returns
-------
float
The upper bound of the uniform distribution.
"""
upper_bound = float(parameters[1])
return upper_bound
Cumulative distribution function (CDF)#
The CDF is the function \(F_X:\mathcal{D}_X \subseteq \mathbb{R} \mapsto [0, 1]\). For the uniform distribution, this function reads:
It can be implemented as follows:
def cdf(
xx: ARRAY_FLOAT,
parameters: ARRAY_FLOAT,
lower_bound: float,
upper_bound: float,
) -> ARRAY_FLOAT:
"""Get the CDF values of a uniform distribution.
Parameters
----------
xx : ARRAY_FLOAT
Sample values (realizations) of a uniform distribution.
parameters : ARRAY_FLOAT
Parameters of the uniform distribution.
lower_bound : float
Lower bound of the uniform distribution
upper_bound : float
Upper bound of the uniform distribution.
Returns
-------
ARRAY_FLOAT
The CDF values of the uniform distribution.
Notes
-----
- The CDF values for sample values below the lower bound are set to 0.0,
and for sample values above the upper bound are set to 1.0.
"""
yy = np.empty(xx.shape)
idx_lower = xx < lower_bound
idx_upper = xx > upper_bound
idx_rest = np.logical_and(
np.logical_not(idx_lower), np.logical_not(idx_upper)
)
yy[idx_lower] = 0.0
yy[idx_upper] = 1.0
yy[idx_rest] = (xx[idx_rest] - parameters[0]) / np.diff(parameters)
return yy
Note that for input values below (resp. above) the function lower bound (resp. upper bound), the function returns \(0.0\) (resp. \(1.0\)).
Probability density function (PDF)#
The PDF is the function \(F_X:\mathcal{D}_X \subseteq \mathbb{R} \mapsto \mathbb{R}_{\geq 0}\). For the uniform distribution, the function reads:
and can be implemented as follows:
def pdf(
xx: ARRAY_FLOAT,
parameters: ARRAY_FLOAT,
lower_bound: float,
upper_bound: float,
) -> ARRAY_FLOAT:
"""Get the PDF values of a uniform distribution.
Parameters
----------
xx : ARRAY_FLOAT
Sample values (realizations) of a uniform distribution.
parameters : ARRAY_FLOAT
Parameters of the uniform distribution.
lower_bound: float
Lower bound of the uniform distribution.
upper_bound: float
Upper bound of the distribution.
Returns
-------
ARRAY_FLOAT
The PDF values of the uniform distribution.
Notes
-----
- The sample values ``xx`` themselves are not used in the computation of
density value (it is, after all, a constant),
but required nevertheless as the function is vectorized.
Given a vector input, the function should return the PDF values of the
same length as the input.
Moreover, this signature must be consistent with the other distributions.
- The values outside the bounds are set to 0.0.
"""
yy = np.zeros(xx.shape)
idx = np.logical_and(xx >= lower_bound, xx <= upper_bound)
yy[idx] = 1 / (np.diff(parameters))
return yy
Note that outside the function domain, the function values are zero.
Inverse cumulative distribution function (ICDF)#
Finally, the ICDF is the function \(F_X^{-1}: [0, 1] \mapsto \mathcal{D}_X\). For the uniform distribution, it reads:
and can be implemented as follows:
def icdf(
xx: ARRAY_FLOAT,
parameters: ARRAY_FLOAT,
lower_bound: float,
upper_bound: float,
) -> ARRAY_FLOAT:
"""Get the inverse CDF values of a uniform distribution.
Parameters
----------
xx : ARRAY_FLOAT
Sample values (realizations) in the [0, 1] domain.
parameters : ARRAY_FLOAT
Parameters of a uniform distribution.
lower_bound : float
Lower bound of the uniform distribution.
upper_bound : float
Upper bound of the uniform distribution.
This parameter is not used but must appear for interface consistency.
Returns
-------
np.ndarray
Transformed values in the domain of the uniform distribution.
Notes
-----
- ICDF for sample values outside [0.0, 1.0] is set to NaN.
"""
xx[xx < 0.0] = np.nan
xx[xx > 1.0] = np.nan
# Compute the ICDF
yy = lower_bound + np.diff(parameters) * xx
# Check if values are within the set bounds
yy = postprocess_icdf(yy, lower_bound, upper_bound)
return yy
Note that by convention, input values outside the function domain, that is, \(x \notin [0, 1]\), give invalid values.
Step 2: Registering the distribution#
Once you’ve done implementing the distribution, be sure to register it in the module so UQTestFuns can discover it.
Modify the file src/uqtestfuns/prob_input/utils.py
as follows:
from .univariate_distributions import (
...
uniform, # Import the new module here
)
...
SUPPORTED_MARGINALS = {
...
uniform.DISTRIBUTION_NAME: uniform, # register the module with the name as the keyword
}
This way you can create a new instance of UnivDist
class by passing the chosen name as the distribution
and the corresponding parameters:
>>> import uqtestfuns as uqtf
>>> my_var = uqtf.UnivDist(distribution="uniform", parameters=[3, 5])
Remember that the name uniform
was chosen as the name of this distribution via the module-level variable DISTRIBUTION_NAME
.
Step 3: Running the test#
UQTestFuns includes a test suite that tests the correctness of all registered univariate distributions.
In most cases, you don’t need to have a specific test for the distribution you created.
Therefore, run pytest
and make sure there’s nothing broken.
Congratulations! You now know how to add a univariate distribution to the code base!
Consider making the new distribution available to all by making a pull request to the main repository of UQTestFuns.