
How to do Pivot in Polars
17 May, 2023
0
0
0
Contributors
Introduction
In data manipulation and analysis, pivoting is a crucial technique that allows us to transform, reshape, and gain new insights from our datasets. Whether you're a data scientist, analyst, or business professional, understanding how to perform effective pivots is an invaluable skill that can significantly enhance your data exploration capabilities. Among the many available tools, polars emerges as a powerful open-source library that simplifies and accelerates the pivot process.
In this article, we will explore the art of pivoting data using polars, a blazingly fast dataframe library built on rust, while providing practical code examples to illustrate the process. Whether you're new to polars or an experienced user looking to expand your skills, this guide will equip you with the knowledge to pivot your dataset confidently.
To begin, make sure you have polars installed. You can install it using pip:
pip install polars
Once polars is installed, let's start by importing the necessary modules:
import polars as pl
For this example, let's consider creating a dataframe containing information about sales transactions. The dataframe has three columns: "Product", "Region", and "Sales". Here's an example:
Product,Region,Sales
A,North,100
A,North,200
B,South,400
B,South,300
C,West,500
C,West,600
We want to pivot the data based on the "Region" column and convert it into a columnar format, where each unique region becomes a separate column. The sales values are placed under their respective regions.
Here's how you can do the pivot operation using polars:
# Create the DataFrame
df = pl.DataFrame({
"Product": ["A", "A", "B", "B" ,"C", "C"],
"Region": ["North", "North", "South", "East", "West", "West"],
"Sales": [100, 200, 400, 300, 500, 600]
})
# Perform the pivot operation
pivot_df = df.pivot(index="Product", values="Sales", columns = "Region", aggregate_function="first")
pivot_df
In the code above, we first create a dataFrame df
using the pl.DataFrame
constructor, passing a dictionary with the column names as keys and lists of values as values.
Then, we use the pivot
method on the DataFrame, specifying the "Product" column as the index and the "Sales" column as the values. The pivot
method returns a new DataFrame pivot_df
with the pivot operation applied.
The output will be:
As you can see, the resulting DataFrame pivot_df
has four columns: "Product", "North", "South", "East" and "West". The values are populated under the respective columns based on the "Sales" values.
It's important to note that if there are duplicate values for a specific combination of index and column, the pivot operation will aggregate them using the default aggregation function.
Setting the aggregate function to "first" means selecting the first value within a group when performing a group aggregation operation. Setting the aggregate function to "last" means selecting the last value within a group when performing a group aggregation operation
Try setting the aggregate value to last and see the output:
# Perform the pivot operation
pivot_df = df.pivot(index="Product", values="Sales", columns = "Region", aggregate_function="last")
pivot_df
Output:
The aggregate function can be set to many things such as "mean", "max", "min", "median" etc. Depending on the specific need.
Let"s try to see the output, if the aggregate function is set to "mean"
# Perform the pivot operation
pivot_df = df.pivot(index="Product", values="Sales", columns = "Region", aggregate_function=pl.element().tanh().mean())
pivot_df
In the above code, we use the 'tanh()' function because it helps to produce a zero-centered output, i.e., it helps to center the data by bringing the mean close to zero.
Output:
We can also perform melt in polars, lets look an example.
Melt in Polars
The 'melt' function in polars is a function that changes a dataframe into a format in which one or more columns are identifier variables, while all other columns are considered measured variables, which are “unpivoted” to the row axis, leaving just two non-identifier columns.
Here's an example:
df = pl.DataFrame(
{
"a": ["x", "y", "z"],
"b": [1, 2, 3],
"c": [4, 5, 6],
}
)
df.melt(id_vars="a", value_vars=["b", "c"])
Output:
Let's try changing the identifier variables:
df = pl.DataFrame(
{
"a": ["x", "y", "z"],
"b": [1, 2, 3],
"c": [4, 5, 6],
}
)
df.melt(id_vars="b", value_vars=["a", "c"])
Output:
df = pl.DataFrame(
{
"a": ["x", "y", "z"],
"b": [1, 2, 3],
"c": [4, 5, 6],
}
)
df.melt(id_vars="c", value_vars=["a", "b"])
Output:
FAQs
- What is a pivot table in polars?
A pivot table in polars is a way of summarizing data by grouping and aggregating it. The table is arranged in a way that makes it easy to compare data across different categories.
- How do I create a pivot table in polars?
You can create a pivot table in polars using the pivot()
function. This function takes in the columns you want to group your data by, the columns you want to aggregate, and the aggregation function you want to use.
- Can I pivot on multiple columns in polars?
Yes, you can pivot on multiple columns in polars. Simply provide a list of columns to group by when calling the pivot()
function.
- What aggregation functions can I use when pivoting in polars?
Polars supports a variety of aggregation functions when pivoting, including sum()
, mean()
, count()
, first()
, last()
, min()
, and max()
.
- How do I deal with missing values when pivoting in polars?
By default, polars will fill in missing values with None
when pivoting. However, you can specify a different value to use for missing values by passing the fill_value
argument to the pivot()
function.
Conclusion
Mastering the pivot operation in polars opens up a world of possibilities for efficient and flexible data manipulation. By understanding the fundamental principles and utilizing the powerful tools provided by polars, analysts and data scientists can effortlessly transform and reshape their datasets, gaining valuable insights and driving informed decision-making. Whether aggregating data, creating pivot tables, or performing advanced calculations, polars empowers users to navigate complex data structures easily. With its intuitive syntax, extensive functionality, and seamless integration with other data processing libraries, polars is undoubtedly a valuable asset in the modern data analysis toolkit. By leveraging the pivot operation in polars, users can unlock the full potential of their data and accelerate their analytical workflows, leading to more accurate and impactful results.
Thank you for reading!