This semester, for a class on HCI, I have been looking at the dual-screen experience in China, in which people interact with mobile devices while watching TV. I collected over 1 million Sina Weibo messages during ‘Chunwan’ (春晚), a state-sponsored TV program celebrating Chinese New Year and a popular cultural event that almost everyone in China takes part in. While the social engagement part of this study is still ongoing, one side finding from the data is worth noting: over 60% of the Sina Weibo users in this data are female, but male users are almost twice as likely as their female counterparts to have a large number of followers (over 5,000).
The Sina Weibo public timeline API returns recently published posts along with each user’s gender, number of followers and many other fields, so I quickly checked a few descriptive statistics:
Out of 1,090,799 users in my data, 61.6% are female and 38.4% are male. Looking at how they are distributed across follower-count categories, we can see from the graph above that, for both women and men, most users have fewer than 500 followers and very few have more than 5,000. This is probably because the influence of online users often reflects the power structure in real life. Weibo’s real-name verification system has also made it easier for people with better professional positions, celebrities, government officials, scholars and journalists to gain larger numbers of followers.
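A breakdown like this can be computed in a few lines. The following is only a minimal sketch, not the code used for this post: the bin edges are assumptions (the post names only some of the categories), the function names are hypothetical, and the records are made-up toy data rather than the Chunwan sample.

```python
from bisect import bisect_right
from collections import Counter

# Assumed bin edges; only some of these categories are named in the post.
BIN_EDGES = [500, 1_000, 5_000, 10_000, 50_000, 100_000]
BIN_LABELS = ["<500", "500-999", "1,000-4,999", "5,000-9,999",
              "10,000-49,999", "50,000-99,999", "100,000+"]

def follower_category(followers: int) -> str:
    """Return the label of the follower-count bin that contains `followers`."""
    return BIN_LABELS[bisect_right(BIN_EDGES, followers)]

def gender_share(users):
    """Fraction of users per gender, from (gender, followers) records."""
    counts = Counter(gender for gender, _ in users)
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}

def category_counts(users):
    """Number of users per (gender, follower category) pair."""
    return Counter((gender, follower_category(n)) for gender, n in users)

# Toy records (gender, followers); illustrative only.
users = [("f", 120), ("f", 480), ("f", 7_200), ("m", 35), ("m", 150_000)]

print(gender_share(users))
print(category_counts(users))
```

With real data, `users` would hold one record per unique user rather than one per message, so that frequent posters are not counted more than once.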
We can see that there are more male than female users in the 50,000 to 99,999 category, even though the baseline tells us there are more female users overall. This made me wonder whether men are more likely to end up in the categories with very large numbers of followers. So I decided to check the conditional probability of falling into each follower-count category, given that a user is a man or a woman.
The result appears to support my guess: a man has about a 0.5% chance of having over 100,000 followers, while for a woman the chance is only 0.3%. I wanted a more visually persuasive way to present this inequality, so here is the graph below:
The y-axis shows how far the percentage of men or women in each follower-count category deviates from their respective baseline percentage. It is now clearer that men are more likely than women to occupy the most influential categories, while they are also still likely to have the smallest numbers of followers. For example, among users with over 100,000 followers, around 10% are men who, judging by the baseline, should have been women. The shifts in the 10,000 and 5,000 categories are about 20% and 10% respectively.
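To make the two measures concrete, here is a minimal sketch of how they could be computed. It rests on one reading of the graph: the deviation is the share of a gender within a follower category minus that gender's share of all users. The function names and the counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the Weibo data.

```python
from collections import Counter

def conditional_probabilities(counts):
    """P(category | gender) from a {(gender, category): count} mapping."""
    gender_totals = Counter()
    for (gender, _), n in counts.items():
        gender_totals[gender] += n
    return {(g, c): n / gender_totals[g] for (g, c), n in counts.items()}

def deviation_from_baseline(counts):
    """Share of each gender within a category minus its overall share."""
    gender_totals, category_totals = Counter(), Counter()
    for (gender, category), n in counts.items():
        gender_totals[gender] += n
        category_totals[category] += n
    total = sum(gender_totals.values())
    return {
        (g, c): n / category_totals[c] - gender_totals[g] / total
        for (g, c), n in counts.items()
    }

# Made-up counts: 600 women and 400 men, split into two follower categories.
counts = {
    ("f", "<5,000"): 594, ("f", "5,000+"): 6,
    ("m", "<5,000"): 388, ("m", "5,000+"): 12,
}

cond = conditional_probabilities(counts)
dev = deviation_from_baseline(counts)
print(cond[("m", "5,000+")], cond[("f", "5,000+")])
print(dev[("m", "5,000+")], dev[("f", "5,000+")])
```

In this toy example men make up 40% of users but 12 of the 18 users in the "5,000+" category, so their deviation there is positive, and the women's deviation is the same size with the opposite sign. With only two genders, the two deviations in any category always cancel out.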
Is this gender inequality particular to Sina Weibo, or does it also hold on other social media platforms? I was fortunate to get help from Soshio, a company that assisted me in acquiring data about Chunwan from both Tencent Weibo and Sina Weibo. Their data scraped from Sina Weibo confirms the 6:4 female/male user distribution, and the conditional probabilities of female and male users in these categories are listed here:
What does the gender presence on Tencent Weibo look like? Out of 76,730 Tencent Weibo users, 56.3% are male and 43.7% are female. This is the opposite of Sina Weibo’s female-dominated user base, as the graph below shows:
Still, on Tencent Weibo around 7% or 8% of users in the 200–1000 followers category should have been women instead of men. I am not surprised that men on Tencent Weibo occupy more influential positions, as they do on Sina Weibo. There are also significantly fewer users with over 5,000 followers on Tencent Weibo. One reason might be that most celebrities who would attract large numbers of followers have already chosen Sina Weibo as their primary social media site. The legacy of Tencent’s QQ instant messenger has drawn many QQ users to its Weibo platform, but not necessarily those celebrities.
As I mentioned at the beginning, this result is a side finding from the data we need for another study. This blog post is exploratory, and I hope this preliminary finding will be useful to researchers interested in gender representation in future studies. To gather more representative data, one could randomize the dates on which data is scraped from these Weibo platforms. Also, when we get data through the Weibo public timeline API, we as researchers cannot be sure how the messages returned to us are sampled. One way to deal with this issue is to use an alternative source of data (as I did with help from Soshio) to validate the findings.