February 26, 2019 tedplayer4

How procrastination taught me some interesting stuff about TED Talks

Lately, I did something that I am not very proud of. To be honest, I have done it quite a bit. I try not to, but eventually it happens. I bet you’ve done it too: it’s called procrastination. Maybe you are doing it right now while reading this blog entry? Well, don’t feel bad, this is a good way of procrastinating, I promise. You will gain some knowledge and maybe freshen up your stats.
So, I was procrastinating and happened to watch TED Talks. Of course, at the end of every talk I listened to the evil inner voice whispering one of the greatest lies ever: ‘Only one more!’.

After an hour or two I started to wonder what the very first TED talks were like. I was also wondering how TED conferences had evolved over time and how the topics covered might have changed. How come some talks are so popular and others aren’t? Does it depend on the topic, the occupation of the speaker or the weekday the talk was published on?

I did my research and found a very useful dataset that is shared under a Creative Commons license (just like the TED Talks) and can be downloaded here: https://www.kaggle.com/rounakbanik/ted-talks.
This dataset contains information about 2550 TED Talks, which were published on the TED.com website until September 22nd, 2017. Amazing, let’s embrace procrastination and do some fun statistics!

But before we dig deeper into this newly discovered precious dataset, we will have a look at the number of TED conferences/events held so far to get a feeling for how they have evolved and grown.
Luckily you can easily get this information from TED.com.

The first TED conference ever was held in 1984. It was founded by Richard Saul Wurman and his co-founder Harry Marks. Despite an outstanding speaker-lineup, it didn’t go very well, and it took them 6 years to dare to try again. In 1990 the second TED conference was held and boom, from then on, the world got infected with the TED fever with no signs of recovery.

2009 was the year when TEDx, an independently organized event, was born. In this very first year there were already 278 TEDx events and the number rose massively over the following years. On average there were more than 2717 (mean = 2717.3) every year from 2009 to 2018. Wow, that is a lot!

Now, I guess, it is time to get our hands on the dataset. It contains the date the talks were published and the date they were filmed. 2006 was the year when the first talks were made public, but for now we are more interested in the date they were filmed, since this is the date they were held.

If we look at the plot of the number of talks filmed each year (and later made public), we can clearly see a huge jump in 2009. That’s probably because 2009 is the birth year of TEDx, which raised the number of events and, consequently, the number of talks. I realized that TED doesn’t only publish TED or TEDx talks on its website but also talks from other conferences and events that TED considers worth sharing. That’s why there are talks filmed in 1972 and 1983 even though TED hadn’t been invented by then. (I decided to keep them in the analysis since they make up less than 4% of all talks and obviously TED likes to be associated with those talks and their topics.)
The huge drop from 2016 to 2017 might partly be because 2017 is not completely covered in the dataset, but only until the end of September. Keep in mind that not every talk at every conference or event gets filmed and published. Thus, this dataset is just a snippet of all TED talks happening around the world.
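For anyone who wants to redo the count of talks per filming year, here is a minimal sketch in plain Python. It assumes the filming date is stored as a Unix timestamp, which is how I remember the Kaggle file, so check that before relying on it. The timestamps and the helper name `talks_per_year` are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# Made-up Unix timestamps standing in for the dataset's filming dates.
film_dates = [
    1140825600,  # 2006-02-25
    1233446400,  # 2009-02-01
    1246406400,  # 2009-07-01
    1456790400,  # 2016-03-01
]

def talks_per_year(timestamps):
    """Count how many talks were filmed in each calendar year (UTC)."""
    years = (datetime.fromtimestamp(ts, tz=timezone.utc).year for ts in timestamps)
    return dict(sorted(Counter(years).items()))

counts = talks_per_year(film_dates)
print(counts)
```

Plotting `counts` as a bar chart gives the kind of year-by-year picture described above.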

So far so good. We now have a good impression of how the TED fever infected the world and my hopes that I might actually be studying today are gone for good.
The next thing we are going to explore is how the topics of the talks have changed over the years. Each published talk is tagged with topic-related keywords such as ‘music’, ‘entertainment’, ‘education’, ‘dance’, etc. Let’s see what the 10 most frequently assigned themes are.

Looks like Technology is taking the lead there with more than 700 talks out of all 2550 talks being tagged with it. Design and Entertainment made it to the top 10 as well, which is, in fact, not a big surprise since TED stands for Technology, Entertainment, and Design.

Taking those 10 themes and analyzing their trend from 2009 to 2017, we can see some interesting developments:

  1. The proportion of talks tagged with ‘Technology’ and ‘Science’ is more or less stable through the years – it was important then and now.
  2. ‘Entertainment’ and especially ‘Culture’ got less and less popular.
  3. Topics related to society, health and innovation clearly gained popularity. More than 40% of all talks filmed in 2016 (and later published) were tagged with the theme ‘Society’.

The third point of our findings supports my perception of developments in the world and society. I have the impression that health and topics like social skills, human interactions, emotional intelligence, psychology, relationships etc. have gained awareness and importance during the past couple of years. (Side note: I only took the years from 2009 onwards, since the number of talks in the years before is considerably lower and thus not quite representative. One talk can be tagged with more than one theme. These are only the top 10 theme tags out of 400 different tags. The y-axis of the trend plot is the percentage of all talks filmed in the specific year.)
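The percentages behind such a trend plot can be sketched like this. The records are made up and `tag_share_by_year` is just an illustrative helper name. The point is the denominator: each year's tagged talks are divided by all talks filmed that year, and because one talk can carry several tags, the shares of different themes do not add up to 100%.

```python
from collections import Counter

# Made-up (filming year, tags) records; one talk can carry several tags.
talks = [
    (2009, ["technology", "culture"]),
    (2009, ["culture"]),
    (2016, ["society", "technology"]),
    (2016, ["society", "health"]),
    (2016, ["society"]),
    (2016, ["science"]),
]

def tag_share_by_year(records, tag):
    """Percentage of each year's talks that carry the given tag."""
    total = Counter()
    tagged = Counter()
    for year, tags in records:
        total[year] += 1
        if tag in tags:
            tagged[year] += 1
    return {year: 100 * tagged[year] / total[year] for year in sorted(total)}

society_share = tag_share_by_year(talks, "society")
culture_share = tag_share_by_year(talks, "culture")
print(society_share, culture_share)
```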

As we restricted this trend analysis to the top 10 themes of all talks and years, we may have missed some tags that have become popular but did not make it into the overall top 10. To check if this is the case, we can do the trend analysis from another perspective: we look at the top 3 of every single year and see how those have changed over time. The trend is similar. Topics like communication, society and humanity have taken the place of culture, global issues and even technology in the top 3, which supports our findings from above. Yeah!

The trend of my chances of passing the upcoming exam is like the one of culture, by the way: rather negative. My bad conscience tells me to stop this TED Talk analysis and start what I actually had planned for today. But my procrastination spirit is strong, and still, an interesting question remains unanswered: What makes a TED Talk popular? Is there a special recipe or special ingredient?

I will measure the popularity of a talk by its number of views: the more views, the more popular it is. The most popular TED talk at the time this dataset was created (Sept. 2017) was ‘Do schools kill creativity?’ by Ken Robinson with more than 47 million views. (It still is the most popular talk and has reached 55 million views by now.)
It is followed by ‘Your body language may shape who you are’ by Amy Cuddy with more than 43 million views. However, only 34 out of 2550 talks have more than 10 million views, and on average a talk ‘only’ has about 1.7 million views (mean = 1698297, median = 1124524). Apparently, there are some talks that have gone viral, probably due to a fortuitous combination of circumstances. Well, everyone wishes for lucky circumstances, but as you may know, they usually don’t show up. Consequently, the characteristics of talks gone viral are not representative of the characteristics of ‘normal’ popular TED Talks. In statistics, we call those outliers. For the following analysis, I have removed the outliers to keep them from skewing the results. (Side note: a data point greater or less than IQR*3 was considered an outlier.)
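Here is one way such an outlier filter could look. The side note does not spell out the exact rule, so this sketch assumes the common reading: a value counts as an outlier if it lies more than 3 × IQR below the first quartile or above the third quartile. The view counts and the helper name `remove_outliers` are made up.

```python
from statistics import quantiles

def remove_outliers(values, factor=3.0):
    """Keep values within factor * IQR of the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return [v for v in values if lower <= v <= upper]

# Made-up view counts with one viral talk at the end.
views = [500_000, 800_000, 1_000_000, 1_100_000,
         1_200_000, 1_500_000, 2_000_000, 47_000_000]

kept = remove_outliers(views)
print(kept)
```

With a factor of 3 only the really extreme talks get dropped; the more familiar factor of 1.5 would throw out far more of the merely popular ones.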

First, let’s see whether the occupation of the speaker has an influence on the popularity of his or her published talk.

I took the 15 most common occupations of TED speakers in this dataset (to have a representative number of talks for each occupation) and tested whether there was a significant difference (a difference that is not only caused by chance) in the popularity of the videos whose speakers had one of those occupations. And guess what, there is. (Side note: significance level α = 0.05.)
Talks where the speaker had an occupation like psychologist, philosopher or author tend to have more views on average than talks by speakers with occupations like physicist, architect or inventor (I honestly did not know this is a real job). If we think about it, this finding actually makes sense. The topics covered by talks of psychologists, philosophers etc. mainly were: relationship, identity, faith, humor, choice, love, humanity, communication and so on. These are themes which tend to address almost every human being regardless of specific interests like math, chemistry or music, for instance. Consequently, those talks naturally attract a very broad audience.
But does this mean that if you get to do a TED talk and address a topic similar to the ones I listed above, great popularity is guaranteed? Sorry to disappoint you, but the answer is ‘No’! It simply means that within this specific dataset, out of the 15 occupations I used for testing, psychologists, philosophers, authors or writers tend to get more views on average. (Side note: I did take the time a video has been online into account while testing, as it has a positive influence on the views and could skew the results.)
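To make ‘a difference that is not only caused by chance’ a bit more concrete, here is a small permutation test. This is not the test used for the results above: it compares only two groups instead of 15 and ignores how long a video has been online. It just illustrates the idea of a p-value on made-up numbers, and `permutation_p_value` is an illustrative name.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in mean views."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if occupation did not matter, any split is as likely.
        rng.shuffle(pooled)
        a = pooled[:len(group_a)]
        b = pooled[len(group_a):]
        if abs(mean(a) - mean(b)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up views in millions for two occupations.
psychologists = [2.1, 1.9, 2.4, 2.8, 2.2, 2.6]
architects = [0.9, 1.1, 0.8, 1.3, 1.0, 1.2]

p = permutation_p_value(psychologists, architects)
significant = p < 0.05  # same significance level as in the side note
print(p, significant)
```

If shuffling the occupation labels almost never produces a gap as large as the observed one, the p-value is small and we call the difference significant.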

What about the weekday a video is published on? Does it contribute to its number of views? Intuitively I would say yes, with Friday being the best day for publishing, as most people are heading into the weekend and have more time for watching TED talks than from Monday to Thursday. However, the data gives us no reason to assume that there is a favorable day in terms of gaining popularity. Interesting, I was quite confident about my assumption. Keep in mind, though, that the results might be different for TED talks on YouTube or any other social media platform, or for different datasets of TED talks. (Side note: I excluded Saturday and Sunday since only very few talks were published on the weekend and they would not have been representative.)
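A quick way to eyeball the weekday question is to group the views by publishing weekday and compare the medians. The sketch below again assumes Unix timestamps for the publishing date and uses made-up numbers. `median_views_by_weekday` is an illustrative helper, and comparing medians is only a first look, not a significance test.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Made-up (published Unix timestamp, views) pairs.
talks = [
    (1230768000, 900_000),    # Thursday 2009-01-01
    (1230854400, 1_200_000),  # Friday 2009-01-02
    (1231459200, 1_000_000),  # Friday 2009-01-09
    (1231113600, 1_100_000),  # Monday 2009-01-05
    (1230940800, 5_000_000),  # Saturday 2009-01-03, dropped below
]

def median_views_by_weekday(records):
    """Median views per publishing weekday (UTC), weekends excluded."""
    groups = defaultdict(list)
    for timestamp, views in records:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).weekday()
        if day < 5:  # 0 = Monday ... 4 = Friday
            groups[WEEKDAYS[day]].append(views)
    return {name: median(v) for name, v in groups.items()}

by_day = median_views_by_weekday(talks)
print(by_day)
```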

There are many other variables whose effect on the popularity of talks one could explore: the duration, the ratings (attributes like inspiring, fascinating, jaw-dropping, funny etc.), the number of languages a talk is translated into and the number of comments. Be careful with the number of languages and comments though. Most likely there will be a correlation, but it is not clear whether a talk has many comments because it is popular, or is popular because it has many comments. Probably both, since this results in a feedback effect. The same goes for the number of translated languages.

It is pretty late by now and I am looking forward to my comfortable bed after a long and hard day of procrastination. But first, there is one more thing I am curious about: apart from popularity, what talk/topic raises the most discussion?
For that, we simply divide the number of comments by the number of views and see which talks have the highest scores. Interestingly, the talk with the highest score is ‘The case for same-sex marriage’ by Diane J. Savino. It was filmed in 2009 and published in 2010. Same-sex marriage still is a huge point of discussion, and it was even more so at the time the talk was published. Most of the topics of the top 10 talks with the highest scores are related to science/faith, religion, consciousness and politics. Well, I guess discussing those topics will never go out of style.
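The score is simple enough to sketch in a few lines. The talks and numbers below are made up, and `rank_by_discussion` is an illustrative name.

```python
# Made-up talks; only the comments-to-views ratio matters here.
talks = [
    {"title": "Talk A", "comments": 200, "views": 1_000_000},
    {"title": "Talk B", "comments": 600, "views": 400_000},
    {"title": "Talk C", "comments": 50, "views": 2_000_000},
]

def rank_by_discussion(records):
    """Sort talks by comments per view, most discussed first."""
    return sorted(records, key=lambda t: t["comments"] / t["views"], reverse=True)

ranking = [t["title"] for t in rank_by_discussion(talks)]
print(ranking)
```

Dividing by views keeps a hugely popular talk from winning just because more people saw it: Talk B has far fewer views than Talk A but many more comments per view.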

To finally end this day, I ask myself the question: What has my lovely friend, procrastination, taught me today?

  1. The world got infected with the TED fever, that’s for sure
  2. The topics of the talks they are changin’, with some reliable constants like technology
  3. There is no recipe or special ingredient that guarantees popularity of talk, but there are some topics that tend to reach a broader audience
  4. It’s not worth worrying about the weekday your TED talk gets published, at least that’s what this dataset suggests
  5. Procrastination is an evil and very nasty bitch but from time to time it is fun to give in and play along

I wonder what procrastination will teach me tomorrow…

Fun fact: I found a very high correlation (Side note: Pearson correlation coefficient ρ = 0.81) between the number of TEDx talks held and deaths due to unintentional suffocation in medical facilities on Sundays in the USA from 2009 to 2017 (Centers for Disease Control & Prevention, Detailed Mortality Data). (Don’t even ask... I was looking for funny and spurious correlations.) Don’t worry, it doesn’t mean we have to stop TEDx events in order to save those people. Lesson number one in stats: correlation does not imply causation! If you want, you can find a connection in almost everything; it is just a matter of the right representation and extraction of data.
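If you want to hunt for your own spurious correlations, the Pearson coefficient is easy to compute by hand. The two series below are made up (they are not the TEDx or CDC numbers) and simply both drift upward, which is already enough for a high coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Two made-up yearly series that have nothing to do with each other
# except that both tend to grow over time.
events = [1, 2, 3, 4, 5]
unrelated = [2, 1, 4, 3, 6]

r = pearson(events, unrelated)
print(round(r, 2))  # roughly 0.82
```

Any two quantities that grow over the same years will look related this way, which is exactly why the coefficient alone says nothing about cause and effect.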

By Lisa Weijler