
Patriarchal Erasure and Manufactured Passivity: Asymmetric Government and News Media Attention to Protests in China

Abstract: Is media attention to protest events gendered, and what is the relationship between gender, media, and protest? Using novel big data from the Chinese social media platform Weibo, spanning 2010-2017, this study offers the first systematic analysis of gender bias in media selection and description of protests in China and establishes the “gender-protest-media triad.” In accounting for this gender bias, we distinguish between two types of media accounts on Weibo: government and news media outlets. The results indicate that women-majority protests, despite being more violent and risky, are less likely than men-majority protests to receive coverage in both government and news media outlets (media selection bias). Furthermore, when reporting on women-majority protests, government media sources tend to describe them as more passive than men-majority protests (media description bias). Our research establishes the “gender-protest-media triad”: (1) Women participate violently in protests as a reactive response to exploitation and marginalization; (2) Women’s protests are disproportionately underreported and misrepresented in the media; (3) Such patriarchal media bias deprives women protesters of the public attention and resources necessary to pressure institutions for redress of their grievances. This triadic cycle is symptomatic of what we term the “paternalist stability model”: A mode of governance converging patriarchal logics with neo-Confucian stability maintenance, central to the maintenance of patriarchal hegemony in China and throughout the Sinosphere.

Sep 1, 2024

Underrepresentation and Misrepresentation: Selection and Description Bias in Protest Reporting by Government and News Media on Weibo

Mar 1, 2024

Analyzing Image Data with Machine Learning

Dec 25, 2023

Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research

Abstract: Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has focused on the application of supervised machine learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which automatically finds categories from image data. First, we review the steps to perform image clustering, and then we focus on the key challenge of performing unsupervised image clustering: finding low-dimensional representations of images. We present several methods of extracting low-dimensional representations of images, including the traditional bag-of-visual-words model, self-supervised learning, and transfer learning. We compare these methods using two datasets containing images related to protests in China (from Sina Weibo, Chinese Twitter) and to climate change (from Instagram). Results show that transfer learning significantly outperforms other methods. The dataset used in the pretrained model critically determines what categories algorithms can discover.
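
The clustering step that follows feature extraction can be sketched roughly as below. This is only an illustration under stated assumptions: the vectors from `toy_embeddings` are synthetic stand-ins for the low-dimensional representations a pretrained network would produce, k-means is an assumed choice of clustering algorithm (the abstract does not name one), and all function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def toy_embeddings(n_per_group=20, dim=3, seed=0):
    """Synthetic stand-ins for image features: two groups, listed one after the other."""
    rng = random.Random(seed)
    group_a = [tuple(rng.gauss(0.0, 0.3) for _ in range(dim)) for _ in range(n_per_group)]
    group_b = [tuple(rng.gauss(5.0, 0.3) for _ in range(dim)) for _ in range(n_per_group)]
    return group_a + group_b

def kmeans(points, k, iters=20):
    """Plain k-means; returns one cluster label per point."""
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

labels = kmeans(toy_embeddings(), k=2)
print(labels)
```

With real images, the quality of the clusters would depend on the representation fed into this step, which is the point the abstract makes about the choice of pretrained model.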

Jan 1, 2022

How Using Machine Learning Classification as a Variable in Regression Leads to Attenuation Bias and What to Do About It

Abstract: Social scientists have increasingly been applying machine learning algorithms to “big data” to measure theoretical concepts they could not easily measure before, and then using these machine-predicted variables in a regression. This article first demonstrates that directly inserting binary predictions (i.e., classification) without regard for prediction error will generally lead to attenuation biases of either slope coefficients or marginal effect estimates. We then propose several estimators to obtain consistent estimates of coefficients. The estimators require the existence of validation data, for which researchers have both machine predictions and true values. This validation data is either automatically available when training algorithms or can be easily obtained. Monte Carlo simulations demonstrate the effectiveness of the proposed estimators. Finally, we summarize the usage pattern of machine learning predictions in 18 recent publications in top social science journals, apply our proposed estimators to two of them, and offer some practical recommendations.
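
The attenuation described above can be illustrated with a small simulation. This is a minimal sketch, not the article's Monte Carlo design or its proposed estimators: it assumes a binary regressor, a hypothetical classifier that flips the true label with a fixed probability independent of the outcome, and illustrative parameter values. With a binary regressor, the OLS slope equals the difference in group means, so no regression library is needed.

```python
import random
from statistics import mean

def slope_on_binary(x, y):
    """OLS slope of y on a binary regressor x (difference in group means)."""
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    return mean(y1) - mean(y0)

def simulate(n=20000, beta=2.0, error_rate=0.2, seed=0):
    """Return (slope using true labels, slope using noisy predicted labels)."""
    rng = random.Random(seed)
    x_true = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x_true]
    # Hypothetical classifier: flips the true label with probability error_rate,
    # independently of y (an assumption made for this sketch).
    x_pred = [1 - xi if rng.random() < error_rate else xi for xi in x_true]
    return slope_on_binary(x_true, y), slope_on_binary(x_pred, y)

b_true, b_pred = simulate()
print(f"slope with true labels:      {b_true:.2f}")
print(f"slope with predicted labels: {b_pred:.2f}")
```

Under these assumptions (balanced classes, symmetric 20% error), the slope on the predicted labels shrinks toward zero by a factor of about 1 - 2 × 0.2 = 0.6, which is the kind of bias that validation data can help correct.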

Jan 1, 2021

Authoritarian Responsiveness and Political Attitudes during COVID-19: Evidence from Weibo and a Survey Experiment

Abstract: How do citizens react to authoritarian responsiveness? To investigate this question, we study how Chinese citizens reacted to a novel government initiative which enabled social media users to publicly post requests for COVID-related medical assistance. To understand the effect of this initiative on public perceptions of government effectiveness, we employ a two-part empirical strategy. First, we conduct a survey experiment in which we directly expose subjects to real help-seeking posts, in which we find that viewing posts did not improve subjects’ ratings of government effectiveness, and in some cases worsened them. Second, we analyze over 10,000 real-world Weibo posts to understand the political orientation of the discourse around help-seekers. We find that negative and politically critical posts far outweighed positive and laudatory posts, complementing our survey experiment results. To contextualize our results, we develop a theoretic framework to understand the effects of different types of responsiveness on citizens’ political attitudes. We suggest that citizens’ negative reactions in this case were primarily influenced by public demands for help, which illuminated existing problems and failures of governance.

Jan 1, 2021

CASM: A Deep Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media

There are three great invited commentaries to our article by Zachary C. Steinert-Threlkeld, Swen Hutter, and Pamela Oliver. Read them and our response here. Abstract: Protest event analysis is an important method for the study of collective action and social movements and typically draws on traditional media reports as the data source. We introduce collective action from social media (CASM)—a system that uses convolutional neural networks on image data and recurrent neural networks with long short-term memory on text data in a two-stage classifier to identify social media posts about offline collective action. We implement CASM on Chinese social media data and identify more than 100,000 collective action events from 2010 to 2017 (CASM-China). We evaluate the performance of CASM through cross-validation, out-of-sample validation, and comparisons with other protest data sets. We assess the effect of online censorship and find it does not substantially limit our identification of events. Compared to other protest data sets, CASM-China identifies relatively more rural, land-related protests and relatively few collective action events related to ethnic and religious conflict.

Jan 1, 2019

Addressing Selection Bias in Event Studies with General-Purpose Social Media Panels

Abstract: Data from Twitter have been employed in prior research to study the impacts of events. Conventionally, researchers use keyword-based samples of tweets to create a panel of Twitter users who mention event-related keywords during and after an event. However, keyword-based sampling is limited in the objectivity dimension of data and information quality. First, the technique suffers from selection bias since users who discuss an event are already more likely to discuss event-related topics beforehand. Second, there are no viable control groups for comparison to a keyword-based sample of Twitter users. We propose an alternative sampling approach to construct panels of users defined by their geolocation. Geolocated panels are exogenous to the keywords in users’ tweets, resulting in less selection bias than the keyword panel method. Geolocated panels allow us to follow within-person changes over time and enable the creation of comparison groups. We compare different panels in two real-world settings: response to mass shootings and TV advertising. We show the strength of the selection biases of keyword panels. Then, we empirically illustrate how geolocated panels reduce selection biases and allow meaningful comparison groups regarding the impact of the studied events. We are the first to provide a clear, empirical example of how a better panel-selection design, based on an exogenous variable such as geography, both reduces selection bias compared to the current state of the art and increases the value of Twitter research for studying events. While we advocate for the use of geolocated panels, we also discuss their weaknesses and the scenarios in which they apply. This paper also calls attention to the importance of selection bias in impacting the objectivity of social media data.
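
The selection problem with keyword panels can be sketched with a toy simulation. Everything here is assumed for illustration and none of it comes from the paper's data: each simulated user has a latent propensity to tweet about the topic, the event doubles that propensity, and location is drawn independently of it. The function name `simulate_panels` and all parameter values are hypothetical.

```python
import random
from statistics import mean

def simulate_panels(n_users=20000, seed=1):
    """Compare pre-event topic mention rates across panel designs."""
    rng = random.Random(seed)
    # Latent per-user propensity to tweet about the topic (assumed Beta(1, 9), mean 0.1).
    propensity = [rng.betavariate(1, 9) for _ in range(n_users)]
    # Location is drawn independently of propensity: the exogeneity assumption.
    in_region = [rng.random() < 0.3 for _ in range(n_users)]
    pre_event = [rng.random() < p for p in propensity]
    # Assumed event effect: doubles each user's propensity, capped at 1.
    during_event = [rng.random() < min(1.0, 2 * p) for p in propensity]

    def pre_rate(panel):
        # Share of panel members who mentioned the topic before the event.
        return mean(1.0 if pre_event[i] else 0.0 for i in panel)

    keyword_panel = [i for i in range(n_users) if during_event[i]]
    geo_panel = [i for i in range(n_users) if in_region[i]]
    return {
        "population": pre_rate(range(n_users)),
        "keyword_panel": pre_rate(keyword_panel),
        "geo_panel": pre_rate(geo_panel),
    }

for name, rate in simulate_panels().items():
    print(f"{name}: pre-event mention rate = {rate:.3f}")
```

In this setup the keyword panel over-represents users who already discussed the topic before the event, while the geolocated panel tracks the population rate, because membership in it does not depend on what users tweet.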

May 1, 2018