How to do Pivot in Polars

17 May, 2023

Contributors

Happiness Omale

@omalehappiness1327

Introduction

In data manipulation and analysis, pivoting is a crucial technique that allows us to transform, reshape, and gain new insights from our datasets. Whether you're a data scientist, analyst, or business professional, understanding how to perform effective pivots is an invaluable skill that can significantly enhance your data exploration capabilities. Among the many available tools, polars emerges as a powerful open-source library that simplifies and accelerates the pivot process.

In this article, we will explore the art of pivoting data using polars, a blazingly fast dataframe library built on rust, while providing practical code examples to illustrate the process. Whether you're new to polars or an experienced user looking to expand your skills, this guide will equip you with the knowledge to pivot your dataset confidently.

To begin, make sure you have polars installed. You can install it using pip:

pip install polars

Once polars is installed, let's start by importing the necessary modules:

import polars as pl

For this example, let's consider creating a dataframe containing information about sales transactions. The dataframe has three columns: "Product", "Region", and "Sales". Here's an example:

Product,Region,Sales
A,North,100
A,North,200
B,South,400
B,South,300
C,West,500
C,West,600

We want to pivot the data based on the "Region" column and convert it into a columnar format, where each unique region becomes a separate column. The sales values are placed under their respective regions.
Here's how you can do the pivot operation using polars:

# Create the DataFrame
df = pl.DataFrame({
    "Product": ["A", "A", "B", "B" ,"C", "C"],
    "Region": ["North", "North", "South", "East", "West", "West"],
    "Sales": [100, 200, 400, 300, 500, 600]
})

# Perform the pivot operation
pivot_df = df.pivot(index="Product", values="Sales", columns = "Region", aggregate_function="first")
pivot_df

In the code above, we first create a dataFrame df using the pl.DataFrame constructor, passing a dictionary with the column names as keys and lists of values as values.

Then, we use the pivot method on the DataFrame, specifying the "Product" column as the index and the "Sales" column as the values. The pivot method returns a new DataFrame pivot_df with the pivot operation applied.

The output will be:

As you can see, the resulting DataFrame pivot_df has four columns: "Product", "North", "South", "East" and "West". The values are populated under the respective columns based on the "Sales" values.

It's important to note that if there are duplicate values for a specific combination of index and column, the pivot operation will aggregate them using the default aggregation function.

Setting the aggregate function to "first" means selecting the first value within a group when performing a group aggregation operation. Setting the aggregate function to "last" means selecting the last value within a group when performing a group aggregation operation

Try setting the aggregate value to last and see the output:

# Perform the pivot operation
pivot_df = df.pivot(index="Product", values="Sales", columns = "Region", aggregate_function="last")
pivot_df

Output:

The aggregate function can be set to many things such as "mean", "max", "min", "median" etc. Depending on the specific need.

Let"s try to see the output, if the aggregate function is set to "mean"

# Perform the pivot operation
pivot_df = df.pivot(index="Product", values="Sales", columns = "Region", aggregate_function=pl.element().tanh().mean())
pivot_df

In the above code, we use the 'tanh()' function because it helps to produce a zero-centered output, i.e., it helps to center the data by bringing the mean close to zero.

Output:

We can also perform melt in polars, lets look an example.

Melt in Polars

The 'melt' function in polars is a function that changes a dataframe into a format in which one or more columns are identifier variables, while all other columns are considered measured variables, which are “unpivoted” to the row axis, leaving just two non-identifier columns.

Here's an example:

df = pl.DataFrame(
    {
        "a": ["x", "y", "z"],
        "b": [1, 2, 3],
        "c": [4, 5, 6],
    }
)
df.melt(id_vars="a", value_vars=["b", "c"])

Output:

Let's try changing the identifier variables:

df = pl.DataFrame(
    {
        "a": ["x", "y", "z"],
        "b": [1, 2, 3],
        "c": [4, 5, 6],
    }
)
df.melt(id_vars="b", value_vars=["a", "c"])

Output:

df = pl.DataFrame(
    {
        "a": ["x", "y", "z"],
        "b": [1, 2, 3],
        "c": [4, 5, 6],
    }
)
df.melt(id_vars="c", value_vars=["a", "b"])

Output:

FAQs

What is a pivot table in polars?

A pivot table in polars is a way of summarizing data by grouping and aggregating it. The table is arranged in a way that makes it easy to compare data across different categories.

How do I create a pivot table in polars?

You can create a pivot table in polars using the pivot() function. This function takes in the columns you want to group your data by, the columns you want to aggregate, and the aggregation function you want to use.

Can I pivot on multiple columns in polars?

Yes, you can pivot on multiple columns in polars. Simply provide a list of columns to group by when calling the pivot() function.

What aggregation functions can I use when pivoting in polars?

Polars supports a variety of aggregation functions when pivoting, including sum(), mean(), count(), first(), last(), min(), and max().

How do I deal with missing values when pivoting in polars?

By default, polars will fill in missing values with None when pivoting. However, you can specify a different value to use for missing values by passing the fill_value argument to the pivot() function.

Conclusion

Mastering the pivot operation in polars opens up a world of possibilities for efficient and flexible data manipulation. By understanding the fundamental principles and utilizing the powerful tools provided by polars, analysts and data scientists can effortlessly transform and reshape their datasets, gaining valuable insights and driving informed decision-making. Whether aggregating data, creating pivot tables, or performing advanced calculations, polars empowers users to navigate complex data structures easily. With its intuitive syntax, extensive functionality, and seamless integration with other data processing libraries, polars is undoubtedly a valuable asset in the modern data analysis toolkit. By leveraging the pivot operation in polars, users can unlock the full potential of their data and accelerate their analytical workflows, leading to more accurate and impactful results.

Thank you for reading!

How to do Pivot in Polars

Introduction

Melt in Polars

FAQs

Conclusion

More Articles