4.13. Series Statistics

4.13.1. Setup

>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> s = pd.Series(
...     data=[1.0, 2.0, 3.0, np.nan, 5.0],
...     index=['a', 'b', 'c', 'd', 'e'])
>>>
>>> s
a    1.0
b    2.0
c    3.0
d    NaN
e    5.0
dtype: float64

4.13.2. Count

  • Series.count() - Number of non-null observations

  • Series.nunique() - Number of unique values

  • Series.value_counts() - Frequency of unique values

  • Series.size - Number of elements

  • len(Series) - Number of elements

>>> len(s)
5
>>> s.size
5
>>> s.count()
4
>>> s.nunique()
4
>>> s.value_counts()
1.0    1
2.0    1
3.0    1
5.0    1
Name: count, dtype: int64
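
The counting methods above skip NaN by default, which is why count() and nunique() return 4 for a five-element Series. The following is a minimal sketch, assuming the same s as in the setup, of how the dropna parameter changes that behaviour and how to count the missing values directly:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

# By default NaN is ignored when counting unique values
unique_without_nan = s.nunique()              # 4

# dropna=False treats NaN as a value of its own
unique_with_nan = s.nunique(dropna=False)     # 5
counts_with_nan = s.value_counts(dropna=False)

# Number of missing values, equal to s.size - s.count()
missing = s.isna().sum()                      # 1

print(unique_without_nan, unique_with_nan, missing)
print(counts_with_nan)
```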

4.13.3. Sum

  • Series.sum() - Sum of values

  • Series.cumsum() - Cumulative sum

>>> s.sum()
11.0
>>> s.cumsum()
a     1.0
b     3.0
c     6.0
d     NaN
e    11.0
dtype: float64
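
The results above depend on the default skipna=True: sum() ignores the NaN at 'd', and cumsum() leaves a NaN at that position but continues accumulating afterwards. The sketch below, assuming the same s as in the setup, shows what happens when missing values are not skipped, and how min_count can require a minimum number of valid observations:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

# Default: NaN is skipped
total_skipping_nan = s.sum()                  # 11.0

# skipna=False: a single NaN makes the whole sum NaN
total_with_nan = s.sum(skipna=False)          # nan

# min_count=5: only 4 valid values are present, so the result is NaN
total_min_count = s.sum(min_count=5)          # nan

# skipna=False in cumsum: every value from the first NaN onwards is NaN
running = s.cumsum(skipna=False)

print(total_skipping_nan, total_with_nan, total_min_count)
print(running)
```

The same skipna and min_count parameters are accepted by prod(), and skipna by cumprod().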

4.13.4. Product

  • Series.prod() - Product of values

  • Series.cumprod() - Cumulative product

>>> s.prod()
30.0
>>> s.cumprod()
a     1.0
b     2.0
c     6.0
d     NaN
e    30.0
dtype: float64

4.13.5. Extremes

  • Series.min() - Minimum value

  • Series.idxmin() - Index label of minimum value (the label can be Float, Int, Object, Datetime, etc.)

  • Series.argmin() - Integer position of minimum value

  • Series.cummin() - Cumulative minimum

  • Series.max() - Maximum value

  • Series.idxmax() - Index label of maximum value (the label can be Float, Int, Object, Datetime, etc.)

  • Series.argmax() - Integer position of maximum value

  • Series.cummax() - Cumulative maximum

Minimum, index of minimum and cumulative minimum:

>>> s.min()
1.0
>>> s.idxmin()
'a'
>>> s.argmin()
0
>>> s.cummin()
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64

Maximum, index of maximum and cumulative maximum:

>>> s.max()
5.0
>>> s.idxmax()
'e'
>>> s.argmax()
4
>>> s.cummax()
a    1.0
b    2.0
c    3.0
d    NaN
e    5.0
dtype: float64
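
The difference between idxmax() and argmax() is easiest to see by using each result for a lookup. The sketch below, assuming the same s as in the setup, uses the label with .loc and the integer position with .iloc; both reach the same element:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

label = s.idxmax()       # 'e', an index label
position = s.argmax()    # 4, an integer position

# A label is used with .loc, a position with .iloc
value_by_label = s.loc[label]
value_by_position = s.iloc[position]

# The position can be translated back to the label through the index
label_from_position = s.index[position]

print(label, position, value_by_label, value_by_position, label_from_position)
```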

4.13.6. Average

  • Series.mean() - Arithmetic mean of values

  • Series.median() - Median of values

  • Series.mode() - Mode of values

  • Series.rolling(window=2).mean() - Rolling average

Arithmetic mean of values:

>>> s.mean()
2.75

Median of values:

>>> s.median()
2.5

Mode:

>>> s.mode()
0    1.0
1    2.0
2    3.0
3    5.0
dtype: float64
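
mode() returns a Series, not a scalar, because several values can be tied for the highest frequency. In s every value occurs once, so all four are returned. The sketch below uses two made-up Series (votes and single are hypothetical names, not part of the setup) to show a two-way tie and a single mode:

```python
import numpy as np
import pandas as pd

# 2.0 and 3.0 both occur twice; NaN is ignored by default
votes = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, np.nan])
modes = votes.mode()

# With a single most frequent value the result still is a Series,
# so take the first element to get a scalar
single = pd.Series([1.0, 2.0, 2.0])
most_common = single.mode().iloc[0]

print(modes.tolist())
print(most_common)
```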

Rolling Average:

>>> s.rolling(window=2).mean()
a    NaN
b    1.5
c    2.5
d    NaN
e    NaN
dtype: float64

Figure 4.16. Rolling Average
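
In the rolling average above, the first window is NaN because it contains only one observation, and both windows touching 'd' are NaN because a window needs window valid values by default. The sketch below, assuming the same s as in the setup, relaxes that requirement with min_periods=1, so a window is averaged over whatever valid values it contains:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

# Default: both observations in the window must be valid
strict = s.rolling(window=2).mean()

# min_periods=1: one valid observation in the window is enough
relaxed = s.rolling(window=2, min_periods=1).mean()

print(strict)
print(relaxed)
```

With min_periods=1 the window ending at 'd' contains only 3.0 and the window ending at 'e' contains only 5.0, so those values are returned instead of NaN.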

4.13.7. Distribution

  • Series.abs() - Absolute value

  • Series.std() - Standard deviation

  • Series.sem() - Standard Error of the Mean (SEM)

  • Series.skew() - Skewness (3rd moment)

  • Series.kurt() - Kurtosis (4th moment); pandas returns excess kurtosis (Fisher's definition, normal distribution == 0.0)

  • Series.quantile() - Sample quantile (value at %)

  • Series.var() - Variance

  • Series.corr() - Correlation Coefficient

Absolute value:

>>> s.abs()
a    1.0
b    2.0
c    3.0
d    NaN
e    5.0
dtype: float64

Standard deviation:

>>> s.std()
1.707825127659933

Figure 4.17. Standard Deviation

Standard Error of the Mean (SEM):

>>> s.sem()
0.8539125638299665

Figure 4.18. Standard Error of the Mean (SEM)
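
Standard deviation, variance and SEM are closely related. The sketch below, assuming the same s as in the setup, recomputes std() and sem() by hand to show that pandas uses the sample formulas (ddof=1, i.e. division by n - 1) by default and that n is the number of non-null observations:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

n = s.count()                        # 4 valid observations, NaN excluded

# Sample standard deviation: squared deviations divided by n - 1
squared_deviations = (s - s.mean()) ** 2
manual_std = np.sqrt(squared_deviations.sum() / (n - 1))

# Standard error of the mean: sample std divided by sqrt(n)
manual_sem = manual_std / np.sqrt(n)

# Population standard deviation divides by n instead (ddof=0)
population_std = s.std(ddof=0)

print(manual_std, s.std())
print(manual_sem, s.sem())
print(population_std)
```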

Skewness (3rd moment):

>>> s.skew()
0.7528371991317256

Figure 4.19. Skewness

Kurtosis (4th moment):

>>> s.kurt()
0.3428571428571434

Figure 4.20. Kurtosis

Sample quantile (the value below which a given fraction of the observations falls). A quantile expressed as a percentage is also known as a percentile:

>>> s.quantile(.3)
1.9
>>> s.quantile([.25, .5, .75])
0.25    1.75
0.50    2.50
0.75    3.50
dtype: float64
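
The value 1.9 for s.quantile(.3) is not an element of s; it comes from linear interpolation between neighbouring sorted values, which is the default. The sketch below, assuming the same s as in the setup, reproduces that calculation by hand and shows the interpolation parameter for cases where the result should be an actual observation:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

q = 0.3
values = np.sort(s.dropna().to_numpy())       # [1. 2. 3. 5.]

# Fractional position of the quantile in the sorted values
position = q * (len(values) - 1)
lower = int(np.floor(position))
upper = int(np.ceil(position))
fraction = position - lower

# Linear interpolation between the two neighbouring values
manual = values[lower] + fraction * (values[upper] - values[lower])

# Pick an existing observation instead of interpolating
lower_q = s.quantile(q, interpolation='lower')     # 1.0
higher_q = s.quantile(q, interpolation='higher')   # 2.0

print(manual, s.quantile(q))
print(lower_q, higher_q)
```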

Variance:

>>> s.var()
2.9166666666666665

Correlation Coefficient:

>>> s.corr(s)
1.0

Figure 4.21. Correlation Coefficient
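
Correlating a Series with itself always gives 1.0, so a second Series makes the behaviour clearer. The sketch below uses a made-up Series named other (hypothetical data, not part of the setup). It assumes the default Pearson method and shows that pairs are matched by index label and that pairs containing NaN are dropped before the coefficient is computed:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

# Hypothetical second Series sharing the same index labels
other = pd.Series(
    data=[2.0, 1.0, 4.0, 3.0, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

# Pearson correlation; the pair at 'd' is dropped because s is NaN there
r = s.corr(other)

# The same coefficient computed by hand on the complete pairs only
valid = s.notna() & other.notna()
manual_r = np.corrcoef(s[valid], other[valid])[0, 1]

# A perfectly inverse relationship gives -1.0
inverse_r = s.corr(-s)

# min_periods: require at least this many complete pairs, else NaN
too_few = s.corr(other, min_periods=5)

print(r, manual_r, inverse_r, too_few)
```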

4.13.8. Describe

  • Series.describe() - Summary statistics

>>> s.describe()
count    4.000000
mean     2.750000
std      1.707825
min      1.000000
25%      1.750000
50%      2.500000
75%      3.500000
max      5.000000
dtype: float64
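
describe() collects results that are also available from the individual methods shown earlier, and the reported percentiles can be changed. The sketch below, assuming the same s as in the setup, requests the 10th and 90th percentiles and reads single entries from the result by label:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

# Request custom percentiles instead of the default 25%, 50%, 75%
summary = s.describe(percentiles=[0.1, 0.9])

# The result is a Series, so entries are read by label
count = summary['count']
mean = summary['mean']
p10 = summary['10%']
p90 = summary['90%']

print(summary)
print(count, mean, p10, p90)
```

Each entry matches the corresponding method: summary['mean'] equals s.mean(), and summary['10%'] equals s.quantile(0.1).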