Spotify API analysis

10 September, 2020

Contributors

Viola

@violayeyun

Since Spotify was established in 2008, selling premium subscriptions to users and advertising placements to third parties have been its two main sources of revenue. Although revenue has grown every year, the company recorded an operating loss every year through 2019. The main driver of these losses is the high cost of licensing payments to artists and label companies. To cope with this dilemma, the company has been gradually reducing the share of licensing expenses in its total expenses. However, amid fierce competition from powerful players like Apple, Amazon, and Google, continuing to reduce that share may put the company’s sustainability in jeopardy in the long run. To help Spotify grow its revenue without undermining its sustainability, we as consultants aim to deliver business value that Spotify’s R&D and marketing departments can use to improve their licensing and marketing models. In this project, we use Spotify’s data to identify popular songs or podcasts and develop analytical models for evaluating the true value of each song, genre, or artist. Such reporting capability would help improve both licensing strategy and marketing strategy, thereby minimizing licensing expenses or maximizing sales revenue.

By the nature of its business, Spotify’s largest expense is licensing payments to artists and label companies. While Spotify has reduced the share of licensing expenses year over year (from 86.42% of revenue in 2016 to 74.54% in 2019), it cannot keep reducing that share indefinitely, because doing so may degrade service quality in the long run. As Spotify is expected to turn a profit for the first time in 2020, it may have more capital to pursue new business strategies that sustain growth without further reducing the share of licensing expenses. Spotify’s financials show that the company has been allocating capital to its Research & Development and Sales & Marketing departments at a constant level. Based on our assessment of these financials, we as consultants aim to help those two departments produce better business results that will ultimately generate more revenue while keeping licensing expenses at an optimal level.

Exploratory Data Analysis

Our cleaned dataset, “spotify_data”, contains 30K+ observations spanning 10 years, with 23 variables, including the tracks’ audio features. Detailed descriptions of the audio-feature variables are available at the following link:

https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/
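As a sketch of how one record from that endpoint maps onto the variables discussed here, the snippet below parses an audio-features JSON object and derives `duration_minutes`. It assumes the response field names documented at the link (such as `danceability` and `duration_ms`); the values are invented, and converting milliseconds to minutes is our assumption about how `duration_minutes` was produced, not a documented step of this project.

```python
import json
from typing import Dict

# Field names follow the audio-features object in the Spotify Web API reference;
# the values are invented for illustration.
SAMPLE_RESPONSE = """
{"id": "example-track-id", "danceability": 0.735, "energy": 0.578,
 "acousticness": 0.514, "valence": 0.636, "tempo": 98.002,
 "duration_ms": 255349}
"""

# Features reported on a zero-to-one scale.
UNIT_INTERVAL_FEATURES = ("danceability", "energy", "acousticness", "valence")

def to_row(payload: str) -> Dict[str, float]:
    """Turn one audio-features JSON object into a flat analysis row."""
    data = json.loads(payload)
    row = {name: float(data[name]) for name in UNIT_INTERVAL_FEATURES}
    row["tempo"] = float(data["tempo"])  # beats per minute, not zero-to-one
    # Assumed derivation of the stand-alone duration variable.
    row["duration_minutes"] = data["duration_ms"] / 60000
    return row

print(to_row(SAMPLE_RESPONSE))
```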

These audio features are the key resource for our exploratory data analysis. While some of the variables are stand-alone (e.g. duration_minutes), many of them are directly comparable because they describe the characteristics of songs on a zero-to-one scale (e.g. danceability vs. acousticness).
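To illustrate the distinction, here is a minimal sketch, assuming each track is a Python dict keyed by column names like those above and using invented values rather than the actual dataset. It flags the columns that stay within [0, 1] (a crude range heuristic) and min-max scales a stand-alone column so it can be set beside them.

```python
from typing import Dict, List

Row = Dict[str, float]

def unit_scale_columns(rows: List[Row]) -> List[str]:
    """Return the columns whose values all lie in [0, 1] and are directly comparable."""
    return sorted(c for c in rows[0] if all(0.0 <= r[c] <= 1.0 for r in rows))

def min_max_scale(values: List[float]) -> List[float]:
    """Rescale a stand-alone variable onto [0, 1] so it can sit beside the others."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

# Invented example rows, not taken from spotify_data.
sample = [
    {"danceability": 0.75, "acousticness": 0.10, "duration_minutes": 3.5},
    {"danceability": 0.40, "acousticness": 0.85, "duration_minutes": 5.0},
    {"danceability": 0.62, "acousticness": 0.30, "duration_minutes": 2.0},
]

print(unit_scale_columns(sample))  # ['acousticness', 'danceability']
print(min_max_scale([r["duration_minutes"] for r in sample]))  # [0.5, 1.0, 0.0]
```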

Our last exploratory data analysis examines the overall trend in genre popularity over time. This analysis is important because it lets us study the relationships between song characteristics at a more granular, genre-by-genre level.
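A minimal sketch of the genre-trend computation follows. It assumes each track row carries hypothetical `genre`, `year`, and `popularity` fields (the article does not list the actual column names) and uses invented values; it averages popularity per genre per year so each genre’s trajectory can be compared.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def genre_popularity_trend(rows: List[dict]) -> Dict[str, List[Tuple[int, float]]]:
    """Average popularity per genre per year; each genre's list is sorted by year."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for r in rows:
        key = (r["genre"], r["year"])
        sums[key] += r["popularity"]
        counts[key] += 1
    trend: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for genre, year in sorted(sums):  # sorting (genre, year) keys orders years within a genre
        trend[genre].append((year, sums[(genre, year)] / counts[(genre, year)]))
    return dict(trend)

# Invented rows for illustration only.
tracks = [
    {"genre": "pop", "year": 2018, "popularity": 60},
    {"genre": "pop", "year": 2018, "popularity": 70},
    {"genre": "pop", "year": 2019, "popularity": 75},
    {"genre": "rock", "year": 2018, "popularity": 50},
    {"genre": "rock", "year": 2019, "popularity": 40},
]
print(genre_popularity_trend(tracks))
# {'pop': [(2018, 65.0), (2019, 75.0)], 'rock': [(2018, 50.0), (2019, 40.0)]}
```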

Model Analysis

Following the Exploratory Data Analysis phase, we conducted several model analyses that leverage the relationships between song popularity and song variables to build predictive models and recommendation systems.

1) Regression Analysis

As our goal is to help Spotify reasonably rebalance its licensing expenses by genre, it is important to understand how the popularity of each genre changes over time. Our regression analysis aims to predict track popularity (the dependent variable) from song characteristics (the independent variables), and to do so in a timely manner, so the company can quickly pick up on listeners’ preferences and develop new pricing and marketing strategies.
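The regression idea can be sketched as a plain ordinary least squares fit of popularity on song characteristics, solved through the normal equations with only the standard library. This is a minimal sketch under stated assumptions: the article does not specify the exact model, so OLS is our choice here, the `genre` and `popularity` keys are hypothetical, and the data are invented and noiseless so the recovered coefficients are easy to check. `fit_by_genre` repeats the fit within each genre, which corresponds to the by-genre approach described next.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y on X; returns [intercept, coef_1, ..., coef_p]."""
    A = [[1.0] + list(row) for row in X]  # leading 1 gives the intercept
    p = len(A[0])
    xtx = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    xty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X^T X) b = X^T y

def fit_by_genre(rows: List[dict], features: List[str],
                 target: str = "popularity") -> Dict[str, List[float]]:
    """Fit a separate popularity model within each (hypothetical) genre group."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for r in rows:
        groups[r["genre"]].append(r)
    return {
        g: fit_ols([[r[f] for f in features] for r in rs], [r[target] for r in rs])
        for g, rs in groups.items()
    }

# Invented (danceability, energy) values; popularity = 10 + 30*d + 20*e exactly.
X = [[0.2, 0.5], [0.8, 0.3], [0.5, 0.9], [0.9, 0.7], [0.3, 0.1]]
y = [10 + 30 * d + 20 * e for d, e in X]
print([round(c, 6) for c in fit_ols(X, y)])  # [10.0, 30.0, 20.0]
```

On real data the residuals would not be zero, and a statistics package would also report standard errors, which this sketch omits.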

Our first approach is to measure the effect of the song characteristic variables on track popularity within each genre. This approach gives us a deeper understanding of the song characteristic variables and yields more practical suggestions for collecting songs with specific audio features. The results of this regression analysis are as follows:

Dynamic Modeling

2) K-means Clustering

K-means clustering partitions all observations into k groups so that observations within the same group are as similar as possible across the variables in the dataset. The resulting clusters can then be used as inputs when developing further predictive models, to help improve their accuracy.

Cluster 1: Highest danceability, Highest valence

Cluster 2: Highest energy, Lowest acousticness

Cluster 3: Lowest instrumentalness, Highest speechiness

Cluster 4: Lowest speechiness, Lowest liveness

Cluster 5: Lowest acousticness, Highest liveness

Cluster 6: Highest instrumentalness

Cluster 7: Highest acousticness, Lowest energy, Lowest valence
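The clustering step and cluster descriptions like those above can be sketched together: a plain k-means (Lloyd’s algorithm) that alternates between assigning each track to its nearest centroid and moving each centroid to the mean of its members, followed by a helper that tags each cluster with the features on which its centroid ranks highest or lowest. This is a minimal sketch, assuming the features are already on comparable scales and using invented tracks; the article does not say which k-means implementation, initialization, or scaling was used, and the tagging rule is our assumption about how labels such as “Highest danceability” could be derived.

```python
import random
from typing import Dict, List, Sequence, Tuple

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm with centroids initialized from k random points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def profile_clusters(centroids: List[List[float]],
                     feature_names: List[str]) -> Dict[int, List[str]]:
    """Tag each cluster with the features on which its centroid is highest or lowest."""
    tags: Dict[int, List[str]] = {j: [] for j in range(len(centroids))}
    for i, name in enumerate(feature_names):
        top = max(range(len(centroids)), key=lambda j: centroids[j][i])
        bottom = min(range(len(centroids)), key=lambda j: centroids[j][i])
        tags[top].append(f"Highest {name}")
        tags[bottom].append(f"Lowest {name}")
    return tags

FEATURES = ["danceability", "energy", "acousticness"]
# Two invented groups of tracks: upbeat/electric vs. quiet/acoustic.
tracks = [
    (0.80, 0.70, 0.10), (0.85, 0.75, 0.05), (0.78, 0.72, 0.12),
    (0.30, 0.20, 0.90), (0.25, 0.25, 0.85), (0.32, 0.18, 0.88),
]
centroids, labels = kmeans(tracks, k=2)
tags = profile_clusters(centroids, FEATURES)
for j in sorted(tags):
    print(f"Cluster {j + 1}: {', '.join(tags[j])}")
```

Under this rule every feature yields exactly one “Highest” and one “Lowest” tag, so with seven clusters one would keep only the most distinctive tags per cluster, as the list above does.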

Solutions to the Problem

To reiterate Spotify’s problem: the company needs to find new ways to grow its revenue while keeping its licensing expenses at an optimal level. Solutions can be developed by continually analyzing new data and identifying popular tracks worth investing in. Our deliverables, the Spotify Workbook and the Shiny App, provide analytical tools and reporting capabilities that our client (Spotify’s Research & Development and Sales & Marketing departments) can use regularly whenever new data arrives. While these deliverables still need further improvement, they already enable the client to use current data to develop new pricing or marketing strategies.
