Shen Yan Shun, Lucas

Empirical Economist


I am currently a PhD candidate in Economics at Nanyang Technological University. My research brings machine learning and natural language processing into standard applied econometric analyses. I use Stata for standard applied econometrics and Python for everything else.

Working papers

Measuring political media slant using textual data: evidence from Singapore

[Draft, Data appendix]

In this paper I explore the use of text processing, machine learning (ML), and natural language processing (NLP) to assemble a panel of direct quotations from Singapore parliamentary speeches that appear in The Straits Times. In particular, I use NLP methods to generate measures of coverage accuracy, and an unsupervised ML method (Latent Dirichlet Allocation) to generate controls for the topical content of the political speeches and the news articles that report them. Conditional on the observables, I find the coverage of opposition speeches to be less accurate than that of ruling party speeches. In addition to the usual robustness tests, I also provide evidence against the possibility that the observed differences in coverage accuracy occur because of differences in language competency. While the finding that the opposition receives less accurate coverage cannot be unambiguously interpreted as causal, I provide two arguments suggesting that the estimates in this paper are lower bounds on the true magnitude of the effect of opposition status on political media coverage. To the best of my knowledge, this paper is the first that attempts to detect media slant by focusing on coverage accuracy instead of intensity.

The promise of board gender diversity?

I exploit the plausibly exogenous increase in women's representation in the parliament of Singapore in the period 2000–17, and find that this increase predicts a similar increase in women's representation on the boards of directors of top SGX-listed firms, but only for those firms that have close government ties, namely the government-linked companies (GLCs). I interpret this finding as one where the corporate sector takes cues from the government on representation issues, and it is the GLCs that respond more to these cues. I then use the above finding as the first stage in a 2SLS analysis to identify the causal effect of higher women's board representation on tangible firm outcomes such as firm value and leverage.
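As a rough illustration of the 2SLS logic described above, here is a minimal sketch with one instrument and one endogenous regressor. It is not the paper's specification or data: the numbers are synthetic, and the names `ols`, `two_stage_least_squares`, `z`, `x`, `u`, and `y` are hypothetical, chosen only for this example. The unobserved term `u` is constructed to be uncorrelated with the instrument `z` so that the second stage recovers the true slope of 3.

```python
def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """2SLS with a single instrument z for a single endogenous regressor x."""
    # First stage: project the endogenous regressor on the instrument.
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Second stage: regress the outcome on the fitted values.
    return ols(x_hat, y)


# Synthetic data (an assumption for illustration only).
z = [0, 1, 2, 3, 4, 5]               # instrument
u = [1, -2, 1, 1, -2, 1]             # unobserved confounder, uncorrelated with z
x = [1 + 0.5 * zi + ui for zi, ui in zip(z, u)]       # endogenous regressor
y = [2 + 3 * xi + 2 * ui for xi, ui in zip(x, u)]     # outcome, true slope = 3

naive_intercept, naive_slope = ols(x, y)              # biased by u
iv_intercept, iv_slope = two_stage_least_squares(z, x, y)
```

In this toy setup the naive regression of `y` on `x` overstates the slope because `u` moves both variables, while the two-stage estimate does not. Standard errors from the second stage would need the usual 2SLS correction, which this sketch omits.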

Did the MeToo movement predict the performance of women in the 2018 US elections?

In this paper I download all tweets containing the metoo hashtag in 2018 in the lead-up to the US midterm elections in November. Parsing the geolocation tags of the Twitter users, I construct the density of metoo tweets at the US county level, and test whether this density can predict the historical performance of women candidates in the 2018 US House elections.


I use Python for my research work, and have benefitted extensively from open-source libraries in the Python ecosystem. As a tiny contribution, I wrote Lexical richness, which I use to generate proxies for language sophistication in my research.
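To give a sense of what a lexical richness proxy can look like, here is a minimal sketch of the type-token ratio, one common measure of this kind. This is not the library's API: the helper `type_token_ratio`, its simple regex tokenizer, and the sample sentence are assumptions made for illustration.

```python
import re


def type_token_ratio(text: str) -> float:
    """Share of distinct words (types) among all words (tokens)."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the minister said the bill would pass and the house agreed"
ttr = type_token_ratio(sample)  # 9 distinct words out of 11
```

The raw ratio falls mechanically as texts get longer, so comparisons across speeches of very different lengths would likely need a length-adjusted variant.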