
How gender might be impacting your music choices

by Joseph Davies

Introduction

Music matters to just about everyone. In popular culture, it's common for people to make judgments about your personality based on your music taste. As a (non-scientific) example, look at the comment sections of catchy songs that have become popular: guys will praise the song but imply that it would be too embarrassing to share with their guy friends, since that would ruin their manly image:

This is that one song you listen to alone without the boys

Don't tell the homies im listening to this

This tendency is really interesting to me, as I tend to prefer female vocalists. Is my taste really that exceptional among men, or is this more about how we present ourselves to the world? That is to say, maybe men do listen to "feminine" music more often than they let on.

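Some questions that naturally follow from this include:

  1. Do men tend to listen less to music by women/non-binary artists (and vice versa)?

  2. Do men tend to gravitate towards male artists when genres are female-dominated?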

With services like Spotify having publicly available APIs, a lot of data on listening patterns has been collected from users' listening histories. So, we can take a look at this data to at least partially answer these questions.

Data Collection

The information in this project is scraped from Every Noise at Once, which compiles data from Spotify and does a lot of cool analysis with it. For example, it can compare genres' similarity, acousticness, Christmassy-ness, modernness, youthfulness, engagement, etc. It's a very cool tool.

(Side note: I was originally going to make this a project comparing data from here and a site called rateyourmusic, but rateyourmusic IP banned me for attempting to web scrape!)

Specifically, we'll be scraping from their page on gender-based listening patterns by genre.

First, we'll use requests to get the webpage we want, then we will use BeautifulSoup to get the table we want from it, then feed that to the panda we stole from the zoo:
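As a rough sketch of those three steps, something like the following could work. The URL is a placeholder and the helper names `fetch_html` and `table_to_frame` are hypothetical, not taken from the original notebook; the parsing is demonstrated on a tiny made-up table so it runs without a network connection.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Placeholder address (an assumption): swap in the real gender-by-genre page.
URL = "https://example.com/genre-gender-table.html"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def table_to_frame(html: str) -> pd.DataFrame:
    """Turn the first <table> in the HTML into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    return pd.DataFrame(rows)


# Tiny stand-in for the real page (made-up values, not real data).
SAMPLE_HTML = """
<table>
  <tr><td></td><td>genre</td><td>ffshare</td></tr>
  <tr><td>1</td><td>genre a</td><td>40.0</td></tr>
  <tr><td>2</td><td>genre b</td><td>10.0</td></tr>
</table>
"""

raw = table_to_frame(SAMPLE_HTML)
print(raw)
```

For the real page, `table_to_frame(fetch_html(URL))` would take the place of the sample. Every cell comes back as a string at this stage.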

Tidying up

Looking good so far, but the column names are in the first row, the first column already has its job done by our indices, and the last row is not data, so let's clean all that stuff up:
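One way to do that cleanup is sketched below. The sample frame is made up to mimic the shape described above (header in row 0, a redundant first column, a non-data last row), and the `tidy` helper is a hypothetical name.

```python
import pandas as pd

# Made-up stand-in for the freshly scraped table (all strings).
raw = pd.DataFrame([
    ["", "genre", "ffshare", "mfshare"],
    ["1", "genre a", "40.0", "35.0"],
    ["2", "genre b", "10.0", "8.0"],
    ["", "footer text, not data", "", ""],
])


def tidy(table: pd.DataFrame) -> pd.DataFrame:
    """Promote row 0 to the header, drop the first column and the last row."""
    # Skip the header row, the trailing non-data row, and the first column.
    body = table.iloc[1:-1, 1:].copy()
    # The first row holds the column names.
    body.columns = table.iloc[0, 1:].tolist()
    return body.reset_index(drop=True)


df = tidy(raw)
print(df)
```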

What the columns mean

According to Every Noise, the columns are defined as follows (these descriptions are also from that page):

When references are made to female streams or female artists from here on out, this includes nonbinary artists, even though that is a different group of people altogether - this is just how the data is organized. However, the groups of listeners are only categorized into male and female.

With this information, we have a lot of different factors to analyze. Let's start!

We mainly want to focus on the most popular genres, so we're going to only look at the 300 most listened-to genres. Because our data is already sorted by popularity, we can just take the first 300 rows.

This data is all currently stored as strings since we got it from a web scrape, so let's convert it into numbers to analyze:
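A minimal sketch of both steps, taking the top rows and converting strings to numbers, might look like this. The rows are made up, and it assumes the cells hold plain numeric text (if the real cells carry a `%` sign, that would need stripping first).

```python
import pandas as pd

# Made-up rows; the real table has one row per genre, sorted by popularity.
df = pd.DataFrame({
    "genre": ["genre a", "genre b", "genre c"],
    "ffshare": ["40.0", "10.0", "25.5"],
    "mfshare": ["35.0", "8.0", "20.5"],
})

TOP_N = 300  # the number of genres kept in the text

# head() simply returns every row if the table is shorter than TOP_N.
top = df.head(TOP_N).copy()

# Convert every column except the genre name from strings to floats.
# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric_cols = top.columns.drop("genre")
top[numeric_cols] = top[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(top.dtypes)
```

Using `errors="coerce"` is a design choice: bad cells show up later as missing values rather than stopping the conversion.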

Before we go any further, we should check for missing data and see what we need to do to account for it from there:
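A quick check along these lines is sketched here. The sample frame is made up and includes one deliberately missing value, just to show what a hit would look like.

```python
import numpy as np
import pandas as pd

# Made-up frame with one deliberate gap (not real data).
top = pd.DataFrame({
    "genre": ["genre a", "genre b", "genre c"],
    "ffshare": [40.0, np.nan, 25.5],
    "mfshare": [35.0, 8.0, 20.5],
})

# Count of missing values in each column.
missing_per_column = top.isna().sum()

# The rows that contain at least one missing value.
rows_with_gaps = top[top.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_gaps)
```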

Extremely fortunately for us, we do not have any missing data.

Exploratory data analysis and Machine Learning

The first thing we might want to look at is how the proportion of female listeners compares with the proportion of female streams, to see if there is a correlation between gender and listening preferences.
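Alongside a scatter plot, a single number can summarize how strongly the two proportions move together. The sketch below computes Pearson's correlation coefficient on made-up percentages; the column names `female_listeners` and `female_streams` are hypothetical stand-ins for whatever the scraped table calls them.

```python
import pandas as pd

# Made-up percentages; both column names are hypothetical.
top = pd.DataFrame({
    "female_listeners": [25.0, 30.0, 45.0, 55.0, 70.0],
    "female_streams": [3.0, 10.0, 30.0, 50.0, 80.0],
})

# Pearson's r: +1 is a perfect increasing line, 0 is no linear relationship.
r = top["female_streams"].corr(top["female_listeners"])
print(f"Pearson r = {r:.2f}")
```

Pearson's r only measures how close the points are to a straight line, so a curved relationship can still score lower than it looks by eye.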

First thoughts

Here we have our first visualization of the data. As we can see, there does seem to be a correlation between the proportion of female listeners and the proportion of female artists. This would suggest that the higher the proportion of women or non-binary artists in a given genre, the higher the proportion of listeners who are also women.

If we want to make better predictions from this data, though, a linear regression doesn't seem like it would work: the data bunches up on the left, where there are genres that have near-zero female/nonbinary artists but a fair number of female listeners. One example in our dataset is the genre "rap," which has a 3% share of female/nonbinary artists but a 27.6% share of female listeners.

Indeed, the data seems to curve, so a nonlinear regression might give us better insight. For this part, I found this guide extremely helpful, and the code for this block is heavily borrowed (and adjusted) from it.

Also, we should evaluate our R-squared score to check how well we have fit the data:
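To illustrate the general idea, here is a sketch of a nonlinear fit with `scipy.optimize.curve_fit`, followed by an R-squared calculation. The logarithmic model shape is an assumption chosen because it rises quickly and then flattens; it is not necessarily the curve used in the original analysis, and the data is synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit


def log_curve(x, a, b):
    """Assumed model shape: rises quickly near zero, then flattens."""
    return a * np.log1p(x) + b


# Synthetic data generated from the model plus noise (not real genre data).
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 60)  # share of female/nonbinary streams, in %
y = log_curve(x, 10.0, 20.0) + rng.normal(0, 1.5, x.size)

# Fit the two parameters, starting from a rough initial guess.
params, _ = curve_fit(log_curve, x, y, p0=(1.0, 1.0))
a_hat, b_hat = params

# R-squared = 1 - (residual sum of squares / total sum of squares).
residuals = y - log_curve(x, a_hat, b_hat)
r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)

print(f"a={a_hat:.2f}, b={b_hat:.2f}, R^2={r_squared:.3f}")
```

R-squared is easiest to interpret for linear models, so for a curve like this it is best read as a rough indicator and checked against the plot.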

That looks very good, so we can assume that we have made an appropriate fit with our data.

Just for fun, let's look at where the top 5 most listened-to genres sit on this chart. We can annotate them on the graph to find out (while not plotting the rest of the points, so we can get a clearer look).
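Annotating a handful of points can be sketched with matplotlib's `Axes.annotate`. The rows and the column names `female_streams` and `female_listeners` below are made up for illustration, and the frame is assumed to be sorted by popularity already.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, already sorted by popularity; column names are hypothetical.
top = pd.DataFrame({
    "genre": ["genre a", "genre b", "genre c", "genre d", "genre e", "genre f"],
    "female_streams": [35.0, 3.0, 20.0, 28.0, 12.0, 60.0],
    "female_listeners": [50.0, 27.6, 40.0, 45.0, 33.0, 62.0],
})

fig, ax = plt.subplots()
top_five = top.head(5)

# Plot only the five most popular genres and label each point with its name.
ax.scatter(top_five["female_streams"], top_five["female_listeners"])
for row in top_five.itertuples(index=False):
    ax.annotate(
        row.genre,
        (row.female_streams, row.female_listeners),
        textcoords="offset points",
        xytext=(5, 5),  # nudge the label away from the marker
    )

ax.set_xlabel("Share of streams from female/nonbinary artists (%)")
ax.set_ylabel("Share of female listeners (%)")
```

In a notebook the figure renders on its own; in a script, `plt.show()` or `fig.savefig(...)` would be needed.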

Another set of comparisons we could look at is how the proportion of female streams compares with the 'ffshare' and 'mfshare' variables we have. As defined earlier, 'ffshare' is the share of a genre's streams by female listeners that go to women and nonbinary artists, and likewise, 'mfshare' is the share of a genre's streams by male listeners that go to women and nonbinary artists.

Like before, we'll also plot where the top 5 most listened-to genres fall.

This could also suggest whether men listen to men in a given genre more than women do, which ties back to what we wanted to figure out from the start.

For this data, we'll fit linear regressions.
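A least-squares line can be fit with `numpy.polyfit`, as in the sketch below. The percentages are made up, `female_streams` is a hypothetical column name for the overall share, and `fit_line` is a hypothetical helper; the original analysis may well have used a different library.

```python
import numpy as np
import pandas as pd

# Made-up percentages; `female_streams` is a hypothetical column name.
top = pd.DataFrame({
    "female_streams": [5.0, 15.0, 30.0, 50.0, 75.0],
    "ffshare": [7.0, 18.0, 33.0, 54.0, 79.0],
    "mfshare": [4.0, 13.0, 27.0, 46.0, 70.0],
})


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


for column in ("ffshare", "mfshare"):
    slope, intercept = fit_line(top["female_streams"], top[column])
    print(f"{column}: slope={slope:.3f}, intercept={intercept:.2f}")
```

A slope near 1 with an intercept near 0 would mean that group tracks the average listener closely.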

These graphs look fairly similar, and seem to say that, generally, neither men nor women deviate that much from the average listener. However, this is not the case for every genre. For example, one of the most listened-to genres, pop rap, has a share difference of more than 20% between the two groups.

Overall, though, to get a better idea of these patterns, what if we looked at the difference between men and women's listening patterns? Thankfully, we already have the "shareskew" variable to do this.

As a kind of control, we should also look at how men and women each differ from the overall average, as it will likely tell us a good amount more than the previous two graphs we made. Positive values here would mean that the group listens to women/non-binary artists more than average (and negative values the opposite).
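These differences are simple column subtractions, sketched here on made-up numbers. The name `female_streams` for the overall share and the three new column names are hypothetical, and the last line assumes 'shareskew' is the female share minus the male share.

```python
import pandas as pd

# Made-up percentages; `female_streams` is a hypothetical name for the overall share.
top = pd.DataFrame({
    "genre": ["genre a", "genre b"],
    "female_streams": [30.0, 60.0],
    "ffshare": [32.0, 63.0],
    "mfshare": [28.0, 55.0],
})

# Positive = that group streams women/nonbinary artists more than the average listener.
top["female_vs_avg"] = top["ffshare"] - top["female_streams"]
top["male_vs_avg"] = top["mfshare"] - top["female_streams"]

# Assumed to match 'shareskew': female share minus male share.
top["skew_check"] = top["ffshare"] - top["mfshare"]

print(top[["genre", "female_vs_avg", "male_vs_avg", "skew_check"]])
```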

Sadly, as mentioned before, it seems that nonbinary listeners don't have their listening habits documented in this dataset, so we can't look at their data. :(

Now, let's plot our information to visualize how men and women measure up to the average listener, and to each other! We'll also see how this looks for the 5 most popular genres.

Looking at these graphs, we can observe an interesting relationship: the higher the share of streams by female artists in a genre, the more male listeners skew toward male artists relative to the average. For female listeners, the opposite pattern seems to hold, but to a much weaker degree, as the linear regression we generated has a much flatter slope.

The last graph, which shows the difference between genders, subtracts the male share from the female share, so a positive value indicates that women's streams of a genre are more female-saturated than men's streams of the same genre. The general pattern follows from the earlier observations: men's and women's listening patterns grow more disparate the more female-saturated a genre becomes. We could also claim that the difference is driven more by male listeners than by female listeners, since men's listening becomes male-skewed at a faster rate than women's listening becomes female-skewed.

Conclusion

From all of this, we can see that there are differences between men's and women's listening patterns, and that those differences grow the more female-saturated a genre is. Similarly, when a genre has more women listening to it, it also tends to have a higher share of its streams come from women and nonbinary artists. However, these tendencies are a bit weaker than I initially suspected they would be.

Going back to the questions we asked at the start:

  1. Do men tend to listen less to music by women/non-binary artists (and vice versa)?

Generally, yes. As we saw, the more female-saturated a genre was, the more female-saturated its listener base was.

  2. Do men tend to gravitate towards male artists when genres are female-dominated?

Also yes, but not to a huge degree. As genres became more female-saturated, male and female listeners each tended to listen more to artists of their own gender, with the difference reaching at most about 5% of streams in both cases (though female listeners listen to about the same amount more from female artists across the board, only very slightly climbing, while male listeners' tendencies vary more with how female-saturated the genre is).

If you would like to explore the Every Noise at Once database and maybe make some analyses of your own, you can navigate through it here.

In the process of working on this, I listened to the following albums: