Weighted standard deviation in NumPy

numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?

How about the following short "manual calculation"?

import math
import numpy

def weighted_avg_and_std(values, weights):
    """
    Return the weighted average and standard deviation.

    values, weights -- NumPy ndarrays with the same shape.
    """
    average = numpy.average(values, weights=weights)
    # Fast and numerically precise:
    variance = numpy.average((values-average)**2, weights=weights)
    return (average, math.sqrt(variance))
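
As a quick sanity check of this approach, here is a minimal self-contained sketch. The sample data is made up for illustration. With integer weights, the result should match numpy.std() applied to data in which each value is repeated according to its weight:

```python
import math
import numpy

def weighted_avg_and_std(values, weights):
    """Return the weighted average and (population) standard deviation."""
    average = numpy.average(values, weights=weights)
    variance = numpy.average((values - average) ** 2, weights=weights)
    return (average, math.sqrt(variance))

# Illustrative data (not from the original answer).
values = numpy.array([1.0, 2.0, 4.0])
weights = numpy.array([3, 1, 2])

avg, std = weighted_avg_and_std(values, weights)

# Integer weights behave like repeat counts, so this is the unweighted equivalent.
repeated = numpy.repeat(values, weights)
assert math.isclose(avg, repeated.mean())
assert math.isclose(std, repeated.std())  # numpy.std() defaults to ddof=0
```

Note that this is the population-style (ddof=0) standard deviation, which is what numpy.std() computes by default.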

Here's one more option:

np.sqrt(np.cov(values, aweights=weights))
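
One caveat: np.cov() defaults to ddof=1, so the line above applies a bias correction and will generally not equal the result of the np.average-based calculation. Passing ddof=0 should make the two agree. A small sketch with made-up data:

```python
import numpy as np

# Illustrative data (not from the original answer).
values = np.array([1.0, 2.0, 4.0, 7.0])
weights = np.array([0.5, 1.0, 2.0, 0.5])

# Population-style weighted std via np.average (same formula as the manual calculation).
mean = np.average(values, weights=weights)
std_manual = np.sqrt(np.average((values - mean) ** 2, weights=weights))

# np.cov with ddof=0 divides by sum(weights), matching the formula above.
std_cov0 = np.sqrt(np.cov(values, aweights=weights, ddof=0))

# The default (ddof=1) rescales the variance, so the result comes out larger.
std_cov1 = np.sqrt(np.cov(values, aweights=weights))

assert np.isclose(std_manual, std_cov0)
assert std_cov1 > std_manual
```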

There doesn't appear to be such a function in numpy/scipy yet, but there is a ticket proposing this added functionality. Included there you will find Statistics.py which implements weighted standard deviations.

There is a very good example proposed by gaborous:

import pandas as pd
import numpy as np

# X is the dataset, as a pandas DataFrame; w holds one weight per row of X
# Compute the weighted sample mean (fast, efficient and precise)
mean = np.ma.average(X, axis=0, weights=w)

# Convert to a pandas Series (it's just aesthetic and more
# ergonomic; no difference in computed values)
mean = pd.Series(mean, index=list(X.keys()))
xm = X - mean  # xm = difference between X and the mean
# Fill NaN with 0 (a deviation of 0 contributes nothing, and this keeps
# the other covariance values computed correctly)
xm = xm.fillna(0)
# Compute the unbiased weighted sample covariance
sigma2 = 1. / (w.sum() - 1) * xm.mul(w, axis=0).T.dot(xm)

Correct equation for weighted unbiased sample covariance, URL (version: 2016-06-28)
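
To try the covariance approach end to end, here is a minimal runnable sketch. The DataFrame contents and weights are invented for illustration, and the sketch assumes the weights are integer frequency weights (repeat counts), which is the case where dividing by w.sum() - 1 gives the unbiased estimate. It cross-checks the result against np.cov() with fweights:

```python
import numpy as np
import pandas as pd

# Illustrative data: three observations of two variables, with integer weights.
X = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [2.0, 1.0, 5.0]})
w = pd.Series([3, 1, 2])

# Weighted mean of each column.
mean = pd.Series(
    np.average(X.to_numpy(), axis=0, weights=w.to_numpy()), index=X.columns
)

# Deviations from the weighted mean, then the weighted covariance matrix.
xm = X - mean
sigma2 = xm.mul(w, axis=0).T.dot(xm) / (w.sum() - 1)

# Cross-check: np.cov treats fweights as repeat counts and divides by sum(w) - 1.
expected = np.cov(X.to_numpy().T, fweights=w.to_numpy())
assert np.allclose(sigma2.to_numpy(), expected)

# Weighted standard deviation of each column: square root of the diagonal.
std = np.sqrt(np.diag(sigma2.to_numpy()))
```

The per-column weighted standard deviations are the square roots of the diagonal of the covariance matrix.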

Comments
  • Btw, calculation of weighted std dev is actually a rather complex subject -- there's more than one way to do it. See here for a great discussion: stata.com/support/faqs/statistics/…
  • Why not use numpy.average again for the variance?
  • Just wanted to point out that this will give the biased variance. For small sample sizes, you may want to re-scale the variance (before sqrt) to get the unbiased variance. See en.wikipedia.org/wiki/…
  • Yeah, the unbiased variance estimator would be slightly different. This answer gives the standard deviation, since the question asks for a weighted version of numpy.std().
  • thx for this solution... but why do you use math.sqrt instead of np.sqrt in the end?
  • np.sqrt() would work, but because variance is a simple (NumPy) float (and not a NumPy array), math.sqrt() is more explicit and appropriate (and in general faster, if this matters).
  • To use this approach to easily calculate the weighted coefficient of variation, see this answer.
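
The rescaling mentioned in the comments can be sketched as follows. This assumes the weights are reliability (non-frequency) weights, in which case the usual correction divides the biased variance by 1 - sum(w**2) / sum(w)**2. Other interpretations of the weights lead to different corrections, as the first comment points out. The function name and sample data are hypothetical:

```python
import numpy as np

def weighted_std_unbiased(values, weights):
    """Weighted std with a bias correction, assuming reliability weights."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    biased_var = np.average((values - mean) ** 2, weights=weights)
    # Correction factor for reliability weights: 1 - sum(w^2) / sum(w)^2
    correction = 1.0 - np.sum(weights ** 2) / np.sum(weights) ** 2
    return np.sqrt(biased_var / correction)

# Illustrative data (not from the original thread).
values = [1.0, 2.0, 4.0, 7.0]
weights = [0.5, 1.0, 2.0, 0.5]
result = weighted_std_unbiased(values, weights)

# np.cov applies the same correction for aweights with its default ddof=1.
assert np.isclose(result, np.sqrt(np.cov(values, aweights=weights)))

# With equal weights this reduces to the ordinary sample std (ddof=1).
assert np.isclose(weighted_std_unbiased(values, [1, 1, 1, 1]), np.std(values, ddof=1))
```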