Topics: Part-of-Speech Tagging, Stylistics, Authorship Attribution
Readings Due:
- Please review the readings from previous weeks and come to class prepared to discuss them, particularly where they intersect with the idea of “conceptualization” or “measuring what matters”:
- Underwood, Ted. Distant Horizons, “Chapter 4” and “Appendix: Data.” [PDF]
- Juola, Patrick. “Authorship Attribution.” Foundations and Trends in Information Retrieval, Vol. 1, No. 3 (2006), 233-271. [PDF in Commons]
- Koolen, Corina, and Andreas van Cranenburgh. “These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution.” Proceedings of the First Workshop on Ethics in Natural Language Processing, Association for Computational Linguistics. Valencia, Spain, April 4, 2017, 19-29. [PDF in Commons]
- Spend time exploring the following projects. What can you learn about their research question, data, and conceptualization? Are the data and the questions appropriately matched? What features of the dataset were selected to create the analysis? Is the method of measurement meaningful?
- Gender in Rate Your Professor, including the associated blog post
- Dialogue in Disney Movies, by The Pudding
Notebook Activity:
Additional Resources:
- The Week 9 notebook follows the CUNY Digital Humanities Research Institute workshop on Text Analysis. I will be posting *improved* notebooks over the next couple of days, but they will cover exactly the same material. If you need more information to complete this assignment, go to the “Introduction to Text Analysis with Python and the Natural Language Toolkit” lesson on GitHub. Scroll down the page to find the text of the workshop, then follow the “next” links; they will guide you through the exercises in the Week 9 notebook up to the “Catching Fire” activity. You may stop there if you wish. The next activity covers the first steps of named entity extraction. I’ll be adding more information to the notebook over the next couple of days, so if you’re particularly interested in named entity extraction, check back then for an updated copy.