numpy_onlinestats.NpOnlineStats

class numpy_onlinestats.NpOnlineStats(*args, **kwargs)

Streaming statistics for numpy arrays.

This class accumulates element-wise statistics for NumPy arrays in an online fashion: the arrays themselves are not stored in memory. This enables calculation of (approximate) statistics for very large collections of arrays. Quantiles are approximated with an implementation of the t-digest algorithm. Moments are computed using a numerically stable algorithm.

References:

Dunning and Ertl, 2019 (arXiv:1902.04023)

Args:

arr: First NumPy array. Optional; if missing, the accumulator will be initialized with the first call to add.

size: Size of the t-digest buffer. Also used to determine the compression factor.

Note:

All following arrays must have the same shape. The data type of the internal state is determined by arr.dtype, which may or may not be what you want. Pass arr.astype(np.float64) for best results.

Attributes table

dtype

The dtype of the accumulator.

nacc

The number of samples accumulated.

ndim

The number of dimensions of the accumulator.

shape

The shape of the accumulator.

size

The size of the accumulator.

strides

The strides of the accumulator.

Methods table

add(self, arr)

Add an array to the accumulator.

cdf(...)

Calculate the element-wise approximate cumulative distribution function (overloaded).

kurtosis(self)

Calculate the element-wise kurtosis of all seen arrays.

max(self)

Get the element-wise maximum of all seen arrays.

mean(self)

Calculate the element-wise mean of all seen arrays.

min(self)

Get the element-wise minimum of all seen arrays.

quantile(self, q)

Calculate an approximate quantile based on the current state.

reset(self)

Reset the accumulator.

skewness(self)

Calculate the element-wise skewness of all seen arrays.

std(self)

Calculate the element-wise standard deviation of all seen arrays.

var(self)

Calculate the element-wise variance of all seen arrays.

Attributes

NpOnlineStats.dtype

The dtype of the accumulator.

NpOnlineStats.nacc

The number of samples accumulated.

NpOnlineStats.ndim

The number of dimensions of the accumulator.

NpOnlineStats.shape

The shape of the accumulator.

NpOnlineStats.size

The size of the accumulator.

NpOnlineStats.strides

The strides of the accumulator.

Methods

NpOnlineStats.add(self, arr: ndarray[device='cpu']) -> None

Add an array to the accumulator.

Args:

arr: An array.

NpOnlineStats.cdf(self, x: float) -> numpy.ndarray[dtype=float64]
NpOnlineStats.cdf(self, arg: float, /) -> numpy.ndarray[dtype=float64]
NpOnlineStats.cdf(self, arg: int, /) -> numpy.ndarray[dtype=float64]

Overloaded function.

  1. cdf(self, x: float) -> numpy.ndarray[dtype=float64]

Calculate the element-wise approximate cumulative distribution function.

Args:

x: Value at which the CDF is to be calculated.

Returns: A Numpy array.

  2. cdf(self, arg: float, /) -> numpy.ndarray[dtype=float64]

  3. cdf(self, arg: int, /) -> numpy.ndarray[dtype=float64]
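For intuition about what cdf approximates, here is an exact reference in plain NumPy. It assumes all arrays fit in memory, which is the situation the streaming class is meant to avoid. `exact_cdf` is a hypothetical helper and not part of numpy_onlinestats. The t-digest result may differ slightly from it, since t-digest is an approximation.

```python
import numpy as np

def exact_cdf(arrays, x):
    """Element-wise fraction of samples <= x (hypothetical reference helper)."""
    stacked = np.stack(arrays).astype(np.float64)  # shape (n_samples, *shape)
    return np.mean(stacked <= x, axis=0)

# Three arrays of shape (2,): element 0 sees 1, 2, 3; element 1 sees 10, 20, 30.
samples = [np.array([1.0, 10.0]), np.array([2.0, 20.0]), np.array([3.0, 30.0])]

# At x = 2.5, two of three values are <= x at element 0 and none at element 1.
result = exact_cdf(samples, 2.5)
```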

NpOnlineStats.kurtosis(self) -> numpy.ndarray[dtype=float64]

Calculate the element-wise kurtosis of all seen arrays.

Returns: A Numpy array.
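The description does not say whether the result follows the Pearson convention (a normal distribution gives 3) or the excess convention (a normal distribution gives 0). One way to find out is to compare against an exact computation on a small stack of arrays. The sketch below computes both conventions with plain NumPy, using population (biased) central moments. `exact_kurtosis` is a hypothetical helper, not part of numpy_onlinestats.

```python
import numpy as np

def exact_kurtosis(arrays, excess=False):
    """Element-wise kurtosis from population central moments (hypothetical helper)."""
    stacked = np.stack(arrays).astype(np.float64)
    centered = stacked - stacked.mean(axis=0)
    m2 = np.mean(centered**2, axis=0)  # second central moment
    m4 = np.mean(centered**4, axis=0)  # fourth central moment
    kurt = m4 / m2**2
    return kurt - 3.0 if excess else kurt

# One-element arrays holding the values 1..5.
samples = [np.array([float(v)]) for v in (1, 2, 3, 4, 5)]
pearson = exact_kurtosis(samples)               # m4 / m2**2 = 6.8 / 4 = 1.7
excess = exact_kurtosis(samples, excess=True)   # 1.7 - 3 = -1.3
```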

NpOnlineStats.max(self) -> numpy.ndarray[dtype=float64]

Get the element-wise maximum of all seen arrays.

Returns:

A Numpy array.

NpOnlineStats.mean(self) -> numpy.ndarray[dtype=float64]

Calculate the element-wise mean of all seen arrays.

Returns: A Numpy array.

NpOnlineStats.min(self) -> numpy.ndarray[dtype=float64]

Get the element-wise minimum of all seen arrays.

Returns:

A Numpy array.

NpOnlineStats.quantile(self, q: float) -> numpy.ndarray[dtype=float64]

Calculate an approximate quantile based on the current state.

Args:

q: A quantile. Must be between 0 and 1.

Returns:

A Numpy array.
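As a sanity check for the approximation, an exact element-wise quantile can be computed with `np.quantile` when the data are small enough to stack. `exact_quantile` is a hypothetical helper, not part of numpy_onlinestats. It enforces the documented range of q and uses NumPy's default linear interpolation, which need not match how t-digest interpolates.

```python
import numpy as np

def exact_quantile(arrays, q):
    """Exact element-wise quantile over stacked arrays (hypothetical helper)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    stacked = np.stack(arrays).astype(np.float64)  # shape (n_samples, *shape)
    return np.quantile(stacked, q, axis=0)

# Element 0 sees 1, 2, 3; element 1 sees 10, 20, 30.
samples = [np.array([1.0, 10.0]), np.array([2.0, 20.0]), np.array([3.0, 30.0])]
median = exact_quantile(samples, 0.5)
```

On large streams, the difference between this reference and quantile(q) gives a direct measure of the approximation error for a given t-digest buffer size.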

NpOnlineStats.reset(self) -> None

Reset the accumulator. The dtype and shape are kept; only the statistics are reset.

NpOnlineStats.skewness(self) -> numpy.ndarray[dtype=float64]

Calculate the element-wise skewness of all seen arrays.

Returns: A Numpy array.

NpOnlineStats.std(self) -> numpy.ndarray[dtype=float64]

Calculate the element-wise standard deviation of all seen arrays.

Returns: A Numpy array.

NpOnlineStats.var(self) -> numpy.ndarray[dtype=float64]

Calculate the element-wise variance of all seen arrays.

Returns: A Numpy array.
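The class description says the moments use a numerically stable algorithm but does not name it. A common choice for streaming mean and variance is Welford's update, sketched below in plain NumPy as an illustration of the idea and not as the library's implementation. `WelfordSketch` and its `ddof` argument are hypothetical. Whether var() divides by N or N-1 is not stated in this reference, so the sketch exposes both.

```python
import numpy as np

class WelfordSketch:
    """Element-wise streaming mean/variance via Welford's update (illustrative only)."""

    def __init__(self):
        self.n = 0
        self.mean = None
        self.m2 = None  # running sum of squared deviations from the mean

    def add(self, arr):
        arr = np.asarray(arr, dtype=np.float64)
        if self.mean is None:
            self.mean = np.zeros_like(arr)
            self.m2 = np.zeros_like(arr)
        elif arr.shape != self.mean.shape:
            raise ValueError("all arrays must have the same shape")
        self.n += 1
        delta = arr - self.mean                        # deviation from the old mean
        self.mean = self.mean + delta / self.n
        self.m2 = self.m2 + delta * (arr - self.mean)  # uses the updated mean

    def var(self, ddof=0):
        return self.m2 / (self.n - ddof)

    def std(self, ddof=0):
        return np.sqrt(self.var(ddof))

# Values with a large offset, where the naive sum-of-squares formula loses precision.
rng = np.random.default_rng(1)
data = rng.normal(loc=1e6, scale=1.0, size=(500, 2, 2))
acc = WelfordSketch()
for arr in data:
    acc.add(arr)
```

Comparing `acc.var(ddof=0)` and `acc.var(ddof=1)` with the output of var() on the same stream is one way to determine which normalization the library uses.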