How to Use Pandas GroupBy, Counts and Value Counts
Table of Contents
If you’re a data scientist, you likely spend a lot of time cleaning and manipulating data for use in your applications. One of the core libraries for preparing data is the Pandas library for Python.
In a previous post, we explored the background of Pandas and the basic usage of a Pandas DataFrame, the core data structure in Pandas. Check out that post if you want to get up to speed with the basics of Pandas.
In this post, we’ll explore a few of the core methods on Pandas DataFrames. These methods help you segment and review your DataFrames during your analysis.
We’ll cover
- Using Pandas
groupby
to segment your DataFrame into groups. - Exploring your Pandas DataFrame with
counts
andvalue_counts
.
Let’s get started.
Pandas groupby
Pandas is typically used for exploring and organizing large volumes of tabular data, like a super-powered Excel spreadsheet. Often, you’ll want to organize a pandas DataFrame into subgroups for further analysis.
For example, perhaps you have stock ticker data in a DataFrame, as we explored in the last post. Your Pandas DataFrame might look as follows:
>>> df
date symbol open high low close volume
0 2019-03-01 AMZN 1655.13 1674.26 1651.00 1671.73 4974877
1 2019-03-04 AMZN 1685.00 1709.43 1674.36 1696.17 6167358
2 2019-03-05 AMZN 1702.95 1707.80 1689.01 1692.43 3681522
3 2019-03-06 AMZN 1695.97 1697.75 1668.28 1668.95 3996001
4 2019-03-07 AMZN 1667.37 1669.75 1620.51 1625.95 4957017
5 2019-03-01 AAPL 174.28 175.15 172.89 174.97 25886167
6 2019-03-04 AAPL 175.69 177.75 173.97 175.85 27436203
7 2019-03-05 AAPL 175.94 176.00 174.54 175.53 19737419
8 2019-03-06 AAPL 174.67 175.49 173.94 174.52 20810384
9 2019-03-07 AAPL 173.87 174.44 172.02 172.50 24796374
10 2019-03-01 GOOG 1124.90 1142.97 1124.75 1140.99 1450316
11 2019-03-04 GOOG 1146.99 1158.28 1130.69 1147.80 1446047
12 2019-03-05 GOOG 1150.06 1169.61 1146.19 1162.03 1443174
13 2019-03-06 GOOG 1162.49 1167.57 1155.49 1157.86 1099289
14 2019-03-07 GOOG 1155.72 1156.76 1134.91 1143.30 1166559
Perhaps we want to analyze this stock information on a symbol-by-symbol basis rather than combining Amazon (“AMZN”) data with Google (“GOOG”) data or that of Apple (“AAPL”).
This is where the Pandas groupby
method is useful. You can use groupby
to chunk up your data into subsets for further analysis.
Basic Pandas groupby
usage
Let’s do some basic usage of groupby
to see how it’s helpful.
In your Python interpreter, enter the following commands:
>>> import pandas as pd
>>> import numpy as np
>>> url = 'https://gist.githubusercontent.com/alexdebrie/b3f40efc3dd7664df5a20f5eee85e854/raw/ee3e6feccba2464cbbc2e185fb17961c53d2a7f5/stocks.csv'
>>> df = pd.read_csv(url)
>>> df
date symbol open high low close volume
0 2019-03-01 AMZN 1655.13 1674.26 1651.00 1671.73 4974877
1 2019-03-04 AMZN 1685.00 1709.43 1674.36 1696.17 6167358
2 2019-03-05 AMZN 1702.95 1707.80 1689.01 1692.43 3681522
3 2019-03-06 AMZN 1695.97 1697.75 1668.28 1668.95 3996001
4 2019-03-07 AMZN 1667.37 1669.75 1620.51 1625.95 4957017
5 2019-03-01 AAPL 174.28 175.15 172.89 174.97 25886167
6 2019-03-04 AAPL 175.69 177.75 173.97 175.85 27436203
7 2019-03-05 AAPL 175.94 176.00 174.54 175.53 19737419
8 2019-03-06 AAPL 174.67 175.49 173.94 174.52 20810384
9 2019-03-07 AAPL 173.87 174.44 172.02 172.50 24796374
10 2019-03-01 GOOG 1124.90 1142.97 1124.75 1140.99 1450316
11 2019-03-04 GOOG 1146.99 1158.28 1130.69 1147.80 1446047
12 2019-03-05 GOOG 1150.06 1169.61 1146.19 1162.03 1443174
13 2019-03-06 GOOG 1162.49 1167.57 1155.49 1157.86 1099289
14 2019-03-07 GOOG 1155.72 1156.76 1134.91 1143.30 1166559
In the steps above, we’re importing the Pandas and NumPy libraries, then setting up a basic DataFrame by downloading CSV data from a URL. We print our DataFrame to the console to see what we have.
Back to Top