Introduction to Text as Data – January 2024

Event Phone: 1-610-715-0115

Cancellation Policy: If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50).
In the unlikely event that Statistical Horizons LLC must cancel a seminar, we will do our best to inform you as soon as possible of the cancellation. You would then have the option of receiving a full refund of the seminar fee or a credit towards another seminar. In no event shall Statistical Horizons LLC be liable for any incidental or consequential damages that you may incur because of the cancellation.
A 3-Day Livestream Seminar Taught by Amber Boydstun, Ph.D., and Cory Struthers, Ph.D.

Text is all around us: from archived court documents to this morning’s social media posts, from transcripts of political ads to terrorist manifestos. Text-as-data methods allow us to use this text to measure and discover phenomena that would otherwise be hard or impossible to represent quantitatively, such as the ideological positions expressed in court documents or the emotional sentiment of manifestos.

There has never been a more exciting time to learn text-as-data methods. Digital advances have made available text content that only a few years ago would have been difficult to collect, and computational text-as-data methods have advanced just as quickly. However, because there is now so much text data to explore and a dizzying array of accessible text-as-data tools to apply, understanding which methods are appropriate in which contexts is critically important.

This course will provide an introduction to text-as-data methods, including how they work, how they can be applied, and common pitfalls to avoid. We will focus on linking concepts to measurement through textual data. Topics covered include: manual content analysis; text collection and pre-processing; advanced keyword queries and frequencies; dictionary analysis (including sentiment analysis); text similarity and reuse; topic modeling; and supervised machine learning.

This seminar provides an intensive introduction to text-as-data methods, drawing on social science research and perspectives.

We will begin with an overview of text-as-data methods, highlighting the range of applications they make possible. We will ground this discussion in classic “manual content analysis” methods, which remain the gold standard for validating computational approaches.

Next, we will move on to an overview of how to pre-process a text dataset, known as a corpus (plural: corpora). Then we will examine core text-as-data techniques for which “off-the-shelf” code exists: advanced keyword queries and frequencies, dictionary methods (including sentiment analysis), text similarity and reuse, and topic modeling.

Along the way, we will discuss (but not cover in detail) more advanced text-as-data methods that require additional data and/or expertise but that also open up additional avenues of research.

Here are some of the things you will be able to do by the end of this course:

  • Develop a content analysis codebook.
  • Pre-process text for analysis.
  • Calculate frequencies of keywords or phrases in a corpus.
  • Evaluate the sentiment of a corpus.
  • Apply dictionary methods to a corpus.
  • Identify topics in a corpus.
  • Build the foundational knowledge needed to learn more advanced text analysis methods.