Text Analysis & Data Mining

The list of tools below is organized alphabetically, and it represents a selection of the resources available to Digital Humanists. Many of these tools are actively updated, so please contact the DH@Bucknell Web Team if you find any outdated information or if you would like to suggest additional tools or software.

Bucknell University holds site licenses for a number of these tools and provides faculty, staff, and students with access to and support for them; these tools have “BU access” listed under pricing.


AntConc

A free corpus analysis toolkit for concordancing and text analysis.
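
For readers new to the term, concordancing means listing every occurrence of a search word together with the text around it (often called keyword in context, or KWIC). The following is a minimal Python sketch of that idea only; it is not AntConc's implementation, and the function name `kwic`, the `width` parameter, and the sample sentence are all made up for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    # Hypothetical helper: case-insensitive, whole-word matching.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on either side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Illustrative sample text, not drawn from any real corpus.
sample = "The whale surfaced. Ahab watched the whale dive, and the Whale was gone."
for line in kwic(sample, "whale", width=20):
    print(line)
```

Aligning the keyword in a fixed column is what makes recurring patterns to its left and right easy to scan by eye.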

Details

Website: http://www.laurenceanthony.net/software/antconc/
Open Source Software (OSS) or Proprietary? OSS
Pricing: Free


Bibliopedia

Bibliopedia provides a platform for organizing, visualizing, sharing, and searching archives without the need for scholars to become experts in metadata or data visualization. It transforms materials into visualized networks to provide new insights into their structure and context. Bibliopedia allows scholars to collaborate on the elaboration and improvement of these materials, is useful for active research, and serves as a gateway to long-term preservation and dissemination.

Details

Website: http://sul-cidr.github.io/Bibliopedia/
Open Source Software (OSS) or Proprietary? OSS
Pricing: Free


CLAWS Grammatical Parts-of-Speech Tagger

CLAWS is a web tagging service: you submit text, and each word is tagged with its part of speech according to its grammatical function within the sentence.

Details

Website: http://ucrel-api.lancaster.ac.uk/claws/free.html
Open Source Software (OSS) or Proprietary? Proprietary
Pricing: Free up to 100,000 words; pricing tiers for additional words and services


Computer Assisted Text Markup and Analysis (CATMA)

CATMA (Computer Assisted Text Markup and Analysis) is a practical and intuitive tool for text researchers. CATMA supports text research from quantitative to qualitative analysis, from text interpretation to annotation and back to further analysis. In particular, CATMA can assist with the analysis, annotation, modeling, and visualization of texts.

Details

Website: https://catma.de/
Open Source Software (OSS) or Proprietary? OSS
Pricing: Free


DataBasic

DataBasic is a suite of easy-to-use web tools for beginners that introduce concepts of working with data. Tools include WordCounter, which analyzes your text and tells you the most common words and phrases; SameDiff, which compares two or more text files and tells you how similar or different they are; and ConnectTheDots, which shows you how your data is connected by analyzing it as a network.
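
As a rough illustration of what word counting and text comparison involve, here is a short Python sketch. It is not DataBasic's code: the stopword list, the function names, and the use of Jaccard overlap as the similarity measure are assumptions chosen for simplicity.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=5):
    """Most common words, ignoring stopwords."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    return Counter(words).most_common(n)


def top_bigrams(text, n=5):
    """Most common two-word phrases (adjacent token pairs)."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:])).most_common(n)


def jaccard_similarity(text_a, text_b):
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    a, b = set(tokenize(text_a)), set(tokenize(text_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(top_words("the cat sat and the cat ran"))
print(top_bigrams("to be or not to be"))
print(jaccard_similarity("the cat sat", "the cat ran"))
```

Jaccard overlap ignores how often each word occurs, so it is only a first approximation of how similar two texts are.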

Details

Website: https://databasic.io/en/
Open Source Software (OSS) or Proprietary? OSS
Pricing: Free


Google Ngram Viewer

Google Ngram Viewer uses data from Google Books to visualize the frequency of keywords over time.

Details

Website: https://books.google.com/ngrams
Open Source Software (OSS) or Proprietary? Proprietary
Pricing: Free


HathiTrust Research Center Analytics

The HathiTrust Research Center (HTRC) enables computational analysis of works in the HathiTrust Digital Library (HTDL) to facilitate non-profit research and educational uses of the collection. HTRC conducts research and development on computational text analysis of massive digital libraries, and it creates and maintains a suite of tools and services for text-based, data-driven research, such as HTRC Algorithms and Data Capsule.

Details

Website: https://analytics.hathitrust.org/
Open Source Software (OSS) or Proprietary? Proprietary
Pricing: Free


Linguistic Inquiry and Word Count (LIWC)

LIWC reads a given text and counts the percentage of words that reflect different emotions, thinking styles, social concerns, and even parts of speech. Because LIWC was developed by researchers with interests in social, clinical, health, and cognitive psychology, the language categories were created to capture people’s social and psychological states.
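
The counting approach described above can be sketched in a few lines of Python. This is only an illustration of dictionary-based category counting: the category names and word lists below are invented for the example and are not LIWC's dictionaries, which are proprietary and far larger.

```python
import re

# Hypothetical toy categories; NOT the actual LIWC dictionaries.
CATEGORIES = {
    "positive_emotion": {"happy", "love", "good", "joy"},
    "negative_emotion": {"sad", "hate", "bad", "fear"},
    "first_person": {"i", "me", "my", "mine"},
}


def category_percentages(text, categories=CATEGORIES):
    """Return, for each category, the percentage of words that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        # Avoid dividing by zero on empty input.
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(token in words for token in tokens) / len(tokens)
        for name, words in categories.items()
    }


print(category_percentages("I love my dog"))
```

Because every score is a share of the total word count, texts of different lengths can be compared directly.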

Details

Website: http://liwc.wpengine.com/
Open Source Software (OSS) or Proprietary? Proprietary
Pricing: Pricing tiers


MALLET

The Machine Learning for LanguagE Toolkit, or MALLET, is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Details

Website: http://mallet.cs.umass.edu/
Open Source Software (OSS) or Proprietary? OSS
Pricing: Free


TopicGraph

TopicGraph is a beta tool built by JSTOR Labs as part of its Reimagining the Monograph project. It helps researchers explore scholarly books by letting them see at a glance all the topics covered within a book and then navigate directly to the pages on the topics they are researching.

Details

Website: https://labs.jstor.org/topicgraph/
Open Source Software (OSS) or Proprietary? Proprietary
Pricing: Free


Ushahidi

Ushahidi is a social enterprise that provides software and services to numerous sectors and civil society to help improve the bottom-up flow of information. Ushahidi, which translates to “testimony” in Swahili, was developed to map reports of the post-election violence in Kenya in 2008. Since then, thousands have used the crowdsourcing tools provided by Ushahidi to raise their voices in cities around the world. Ushahidi assists with data collection, management, and visualization.

Details

Website: https://www.ushahidi.com/
Open Source Software (OSS) or Proprietary? Proprietary
Pricing: Pricing tiers


Voyant Tools

Voyant Tools is a web-based reading and analysis environment for digital texts. It is a scholarly project that is designed to facilitate reading and interpretive practices for digital humanities students and scholars as well as for the general public.

Details

Website: https://voyant-tools.org/
Open Source Software (OSS) or Proprietary? OSS
Pricing: Free


WordSeer

WordSeer is a text analysis environment that combines visualization, information retrieval, sense-making, and natural language processing to make the contents of text navigable, accessible, and useful.

Details

Website: http://wordseer.berkeley.edu/
Open Source Software (OSS) or Proprietary? OSS
Pricing: Free