Over at JOTWELL, Pat Gudridge (University of Miami School of Law) has shined a spotlight on a student Note by Daniel Taylor Young titled "How Do You Measure a Constitutional Moment? Using Algorithmic Topic Modeling To Evaluate Bruce Ackerman's Theory of Constitutional Change," which appeared in Volume 122 of the Yale Law Journal (2013). Here's the abstract:
Bruce Ackerman argues that major shifts in constitutional law can occur
outside the Article V amendment process when there are unusually high
levels of sustained popular attention to questions of constitutional
significance. This Note develops a new empirical strategy to evaluate
this claim using the debate over ratification of the Fourteenth
Amendment as its test case. The Note applies a statistical process
known as unsupervised topic modeling to a dataset containing over 19,000
pages of text from U.S. newspapers published between 1866 and 1884.
This innovative methodological technique illuminates the structure of
constitutional discourse during this period. The Note finds empirical
support for the notion that the salience of constitutional issues was
high throughout the ratification debate and then gradually declined as
the country returned to a period of normal politics. These findings
buttress Ackerman’s cyclic theory of constitutional change at one of its
more vulnerable points.
And here's a snippet of Gudridge's essay:
“[F]or all the millions of words and thousands of newspaper articles
this Note analyzes,” Mr. Young concedes, “this is a rather modest
conclusion.” “[T]here is nothing surprising about the fact that the
media was paying attention to the passage of major constitutional
amendments in the aftermath of a devastating civil war.” (P. 2053.) It’s
not Young’s bottom line, however, that marks his effort as important. “[M]illions of words and thousands of newspaper articles”—no law student reads this much! How did he do that?
“Algorithmic topic modeling,” his Note’s title tersely declares.
Forty pages plus (out of 54 total) admirably explain what this involves.
There is also an elegant technical appendix.
Each newspaper front page from the period (all accessible on line) is
treated as a separate document and run through optical character
recognition software to identify words as words. The documents are
computer-converted to brute lists stripped of all original interior
organization, so-called common words deleted; the remaining identified
words are counted in cases of repetition within each of the documents.
The quantified word lists are statistically analyzed (more software) as
word distributions, compared with each other, and the most common
clusters of words across the full set of documents extracted. These
clusters provide the ultimate working material for purposes of Young’s
discussion. Texts become data.
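For readers who want a concrete sense of what that pipeline looks like in practice, here is a minimal sketch in Python. To be clear, this is not Young's actual code or data; it simply illustrates the general bag-of-words-to-topics workflow Gudridge describes, using the scikit-learn library, latent Dirichlet allocation as one common unsupervised topic model, and a hypothetical directory ("pages/") of OCR'd front-page text files.

    # Illustrative sketch only -- not Young's code. Assumes a folder "pages/"
    # of plain-text files, one per OCR'd newspaper front page.
    from pathlib import Path

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Treat each front page as a separate document.
    docs = [p.read_text(errors="ignore") for p in sorted(Path("pages").glob("*.txt"))]

    # Reduce each document to word counts, dropping common ("stop") words and
    # all interior structure; min_df filters out very rare OCR artifacts.
    vectorizer = CountVectorizer(stop_words="english", min_df=5)
    counts = vectorizer.fit_transform(docs)

    # Fit an unsupervised topic model: each document becomes a mixture of
    # word clusters ("topics") learned across the whole corpus.
    lda = LatentDirichletAllocation(n_components=20, random_state=0)
    doc_topics = lda.fit_transform(counts)

    # Show the most heavily weighted words in each recovered cluster.
    vocab = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = weights.argsort()[::-1][:10]
        print(f"topic {k}: " + ", ".join(vocab[i] for i in top))

The document-by-topic matrix a model like this produces is the kind of raw material that, once aggregated by publication date, would let a researcher chart how prominent a given cluster of constitutional vocabulary was from year to year, which is the sort of salience measure the Note's argument turns on.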
I'm curious what historians think about this kind of analysis. Thoughts from readers?