How to extract data from TikTok


This data was originally featured in the November 10, 2021 newsletter.

In this week’s Data Diaries, let’s take a peek at some TikTok data. TikTok doesn’t have a public, sanctioned API, so any datasets around it have to be collected by software that crawls and scrapes TikTok. A number of enterprising data science enthusiasts have done so; for this look, we’ll be using a dataset published on Kaggle.

As with any exploratory dataset, we should first understand what’s available to us. In this particular Kaggle kernel (a dataset plus code) of the top 1,000 trending videos at the time of capture, we find basic metrics like views, comments, shares, and diggs (likes). We also get descriptive data like the music being played, the author name, and any accompanying text.

TikTok features
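
To make that first look concrete, here is a minimal sketch of loading a CSV export of the dataset and summarizing its numeric columns with Python's standard library. The column names commentCount and diggCount are assumptions modeled on the playCount and shareCount fields discussed in this post; check them against the actual file before relying on them.

```python
import csv
import io

# Assumed column names: playCount and shareCount appear in this post;
# commentCount and diggCount are guesses that follow the same pattern.
NUMERIC_COLUMNS = ("playCount", "shareCount", "commentCount", "diggCount")


def load_videos(file_obj, numeric_columns=NUMERIC_COLUMNS):
    """Read rows from an open CSV file and convert the metric columns to ints."""
    rows = []
    for raw in csv.DictReader(file_obj):
        row = dict(raw)
        for col in numeric_columns:
            value = raw.get(col)
            # Missing or blank metrics are treated as 0 in this sketch;
            # a real analysis may prefer to drop such rows instead.
            row[col] = int(float(value)) if value else 0
        rows.append(row)
    return rows


def summarize(rows, numeric_columns=NUMERIC_COLUMNS):
    """Return min, max, and mean for each metric column."""
    if not rows:
        return {}
    summary = {}
    for col in numeric_columns:
        values = [row[col] for row in rows]
        summary[col] = {
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
    return summary
```

With a local copy of the data, usage would look something like `with open("tiktok_trending.csv", newline="", encoding="utf-8") as f: rows = load_videos(f)`, where the file name is hypothetical.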

The key question most marketers will inevitably have when looking at beginning analytics like this is: what outcome should we be aiming for? Generally speaking, with social media channels like TikTok, our initial efforts should be awareness-based, which means getting people to even see our content. For that, there are two metrics worth considering. The first is playCount, the number of times a piece of content is seen. That’s a useful metric, literally describing what we’re after. The second is shareCount, the number of times a piece of content has been shared. If we want social media efforts to be effective without having to spend extraordinary amounts of budget and time, we need the help of other people to distribute our content.

For today’s purposes, let’s use shares as our objective. Using data science tools like IBM Watson Studio or Dataiku, we can take all this data and ask the software to build a model that tells us what variables most correlate with the outcome we care about:

Machine learning model of outcomes

What we see from this initial dataset is that comments plus views, followed by comments alone, have the highest correlation with shares, the outcome we care about. Correlation alone doesn’t establish causation, though. If we’re producing content on TikTok, we might want to focus our efforts on encouraging comments and then test whether that yields an increase in the number of shares, which would be evidence of a causal link. After all, it’s entirely possible that reverse causation exists: someone shares a video, and that causes people to comment.
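
For readers without access to those tools, a rough, simplified stand-in is to rank each metric by its Spearman rank correlation with shares. This is not the model the tools above build; it is a sketch of the underlying idea, and it assumes rows are dictionaries of numeric metrics where commentCount and diggCount are guessed column names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def rank_features(rows, target="shareCount",
                  features=("playCount", "commentCount", "diggCount")):
    """Sort features by the absolute strength of their correlation with target."""
    target_values = [row[target] for row in rows]
    scores = {
        name: spearman([row[name] for row in rows], target_values)
        for name in features
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Rank correlation is used here because view and share counts on trending videos tend to be heavily skewed, and a handful of viral outliers can dominate an ordinary Pearson correlation. As with the model above, a high score says nothing about which way the cause runs.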

What’s missing from this data is any of the more sophisticated feature engineering that might guide our content efforts better, such as the topic or subject of each video. Because TikTok is still a relatively new platform with no real, official data source, we must gather the data and do this work ourselves rather than having it provided.
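
One inexpensive first step toward that kind of feature engineering is to use the hashtags in each video's accompanying text as a rough proxy for topic. The sketch below assumes the caption lives in a column named text, which is a guess; hashtags are only a crude stand-in for what a video is actually about.

```python
import re
from collections import Counter

# Matches a "#" followed by word characters (letters, digits, underscore).
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercase hashtags in a caption, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text or "")]


def top_hashtags(rows, text_column="text", n=10):
    """Count how many videos use each hashtag and return the n most common."""
    counts = Counter()
    for row in rows:
        # set() so a hashtag repeated within one caption counts once per video
        counts.update(set(extract_hashtags(row.get(text_column))))
    return counts.most_common(n)
```

The per-video hashtag sets could then be turned into indicator columns and compared against shareCount alongside the other metrics.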

If you’re producing content for TikTok, let us know how you determine your analytics and content strategy in our free Slack group, Analytics for Marketers!

Methodology: Trust Insights used the TikTok top 1,000 trending videos dataset published on Kaggle. The timeframe of the data is December 2020. The date of study is November 10, 2021. Trust Insights is the sole sponsor of the study and neither gave nor received compensation for data used, beyond applicable service fees to software vendors, and declares no competing interests.

