The Quickest Ways to Sort Pandas DataFrame Values

Pandas is a popular open source library used for data analysis. It’s written in Python, and it has a most ambitious mission. Per its own description in GitHub, Pandas “aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language.”

An important component in Pandas is the DataFrame—the most commonly used Pandas object.

Typically, data science practitioners often need to perform various data engineering operations, such as aggregation, sorting, and filtering data. This article aims to help the typical data science practitioner perform sorting values in the Pandas DataFrame.

I’ll discuss seven common use cases, and I’ll dive deeper into the parameters involved with the two major sorting functions. That way, if you need to perform sorting outside the seven common use cases, you’ll be better prepared to adapt.

The two major sort functions

Pandas has two key sort functions: sort_values and sort_index. You can check the API for sort_values and sort_index at the Pandas documentation for details on the parameters.

  • sort_values(): You use this to sort the Pandas DataFrame by one or more columns.
  • sort_index(): You use this to sort the Pandas DataFrame by the row index.

Sorting by the values of the selected columns

Allow me to explain the differences between the two sorting functions more clearly. sort_values is easier to understand. Imagine you have a DataFrame that looks like this:

Date/Time                | Weather
------------------|------------
2012-01-01 00:00:00  | Fog
2012-01-01 01:00:00  | Fog
2012-01-01 02:00:00  | Freezing Drizzle,Fog

When you want to sort the DataFrame by the column Weather, you’d use sort_values. Similarly, when you want to sort the DataFrame by the values of one or more columns, you’d also use sort_values.

Sorting by the labels of the DataFrame

By contrast, sort_index doesn’t indicate its meaning as obviously from its name alone. The key thing to know is that the Pandas DataFrame lets you indicate which column acts as the row index. And if you didn’t indicate a specific column to be the row index, Pandas will create a zero-based row index by default.

Therefore, when you execute sort_index, you’re sorting the DataFrame by its row index.

7 common use cases for sorting

Now that I’ve introduced the two main sorting functions, I’ll go into the seven common use cases for sorting your Pandas DataFrame. To simplify, all of the use cases given here will be demonstrated with an open dataset. I prepared the dataset using the following code:

import pandas as pd

# repo for data https://github.com/jvns/pandas-cookbook/tree/master/data
data_url = 'https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/weather_2012.csv'

# read data from url as pandas dataframe
weather = pd.read_csv(data_url)

# print the first three rows
print(weather.head(n=3))
You should expect the following result:
  Date/Time           ... Weather
0 2012-01-01 00:00:00 ... Fog
1 2012-01-01 01:00:00 ... Fog
2 2012-01-01 02:00:00 ... Freezing Drizzle,Fog

[3 rows x 8 columns]

You should expect the following result:

Notice that the 0, 1, and 2 columns are the row index I covered earlier. If you open the raw data (which is actually a CSV file) directly, you’ll realize that the first column is Date/Time. Recall when I explained how sort_index sorts your DataFrame by the row index? In the result set you see above, the row index is automatically generated and is shown as such.

Now, let’s get started with our first common use case.

Use case #1: Sort by one column’s values

The most common use case is to sort by a single column’s values in ascending order. So, let’s do just that with the prepared dataset. Assume we want to sort the test data by the Weather column in ascending order:

sort_by_weather = weather.sort_values('Weather')
print(sort_by_weather.head(n=3))

We can expect the following result:

     Date/Time           Temp (C)  ... Stn Press (kPa) Weather
7103 2012-10-22 23:00:00 6.4       ... 101.34          Clear
4203 2012-06-24 03:00:00 14.9      ... 101.02          Clear
4204 2012-06-24 04:00:00 14.4      ... 101.04          Clear

[3 rows x 8 columns]

Note that I didn’t explicitly state that the data is sorted in ascending order; the order is set as ascending by default. So, that makes the code easier to write for this use case. If you want to sort by a single column in descending order, all you need is to make the sort order explicit—which brings us to the next use case. 

Use case #2: Sort by one column’s values in descending order

The trick to sorting in descending order is to declare False for the ascending parameter in the sort_values function:

sort_by_weather_desc = weather.sort_values('Weather',ascending=False)
print(sort_by_weather_desc.head(n=3))

You can then expect the following result:

     Date/Time           ... Weather
4757 2012-07-17 05:00:00 ... Thunderstorms,Rain,Fog
4761 2012-07-17 09:00:00 ... Thunderstorms,Rain Showers,Fog
4323 2012-06-29 03:00:00 ... Thunderstorms,Rain Showers,Fog

[3 rows x 8 columns]

Now you’ve learned how to sort a DataFrame by a single column in ascending and descending order. The next thing to learn is how to sort a DataFrame by multiple columns.

Use case #3: Sort by multiple column values

If you recall, in the last two use cases, I simply stated the single column as a single string. If you want to sort by multiple columns, you need to state the columns as a list of strings:

sort_by_weather_temp = weather.sort_values(['Weather','Temp (C)'])
print(sort_by_weather_temp.head(n=3))

And you’ll get this result:

    Date/Time              Temp (C) ... Stn Press (kPa) Weather
344 2012-01-15 08:00:00 -23.3    ... 102.45          Clear
363 2012-01-16 03:00:00 -19.2    ... 103.07          Clear
365 2012-01-16 05:00:00 -19.1    ... 103.02          Clear

[3 rows x 8 columns]

Compare this result against the result from use case #1. The obvious difference is that the lowest temperature within the Clear weather is now at the top of the result set. One thing to appreciate about sorting by multiple columns is that there is precedence when it comes to sorting. In this case, I want to sort the DataFrame by weather first and temperature second. Hence, the list starts with Weather, followed by Temp. This is an intuitive way to write the list of columns you want to sort the DataFrame by.

Now that you know how to sort multiple columns and how to decide the precedence of the columns for sorting, you need to learn how to decide a different sorting order for the different columns.

Use case #4: Sort by multiple column values with a different sort order

Recall that the key point in the last use case was the use of a list to indicate the columns to sort our DataFrame by. Similarly, if we want to pick a different sort order for multiple columns, we would also use a list to indicate the different sort orders.

Back to Top