Data Mining Methods for the Content Analyst

Data Mining Methods for the Content Analyst: An Introduction to the Computational Analysis of Content

4.5 (2 ratings by Goodreads)
By (author) Kalev Leetaru

With continuous advancements and an increase in user popularity, data mining technologies serve as an invaluable resource for researchers across a wide range of disciplines in the humanities and social sciences. In this comprehensive guide, author and research scientist Kalev Leetaru introduces the approaches, strategies, and methodologies of current data mining techniques, offering insights for new and experienced users alike.

Designed as an instructive reference to computer-based analysis approaches, this resource explains in each chapter a set of core concepts and analytical data mining strategies, along with detailed examples and steps relating to current data mining practices. Every technique is considered with regard to context, theory of operation, and methodological concerns, with a focus on the capabilities and strengths of these technologies. In addressing critical methodologies and approaches to automated analytical techniques, this work provides an essential overview of a broad, innovative field.

Product details

  • Paperback | 106 pages
  • 152 x 226 x 10mm | 249.47g
  • London, United Kingdom
  • English
  • 6 Tables, black and white
  • ISBN-10: 0415895146
  • ISBN-13: 9780415895149
  • Bestsellers rank: 1,283,220

Table of contents

Chapter 1 - Introduction

What Is Content Analysis?

Why Use Computerized Analysis Techniques?

Standalone Tools Or Integrated Suites

Transitioning From Theory To Practice

Chapter 2 - Obtaining And Preparing Data

Collecting Data From Digital Text Repositories

Are The Data Meaningful?

Using Data In Unintended Ways

Analytical Resolution

Types Of Data Sources

Finding Sources

Searching Text Collections

Sources Of Incompleteness

Licensing Restrictions And Content Blackouts

Measuring Viewership

Accuracy And Convenience Samples

Random Samples

Multimedia Content

Converting To Textual Format


Example Data Sources

Patterns In Historical War Coverage

Competitive Intelligence

Global News Coverage

Downloading Content

Digital Content

Print Content

Preparing Content

Document Extraction


Post Filtering


Content Proxy Extraction

Chapter 3 - Vocabulary Analysis

The Basics

Word Histograms

Readability Indexes

Normative Comparison

Non-Word Analysis

Colloquialisms: Abbreviations And Slang

Restricting The Analytical Window

Vocabulary Comparison And Evolution / Chronemics

Advanced Topics

Syllables, Rhyming, And "Sounds Like"

Gender And Language

Authorship Attribution

Word Morphology, Stemming, And Lemmatization

Chapter 4 - Correlation And Co-Occurrence

Understanding Correlation

Computing Word Correlations



Co-Occurrence And Search

Language Variation And Lexicons


Correlation With Metadata

Chapter 5 - Lexicons, Entity Extraction, And Geocoding


Lexicons And Categorization

Lexical Correlation

Lexicon Consistency Checks

Thesauri And Vocabulary Expanders

Named Entity Extraction

Lexicons And Processing


Geocoding, Gazetteers, And Spatial Analysis


Gazetteers And The Geocoding Process

Operating Under Uncertainty

Spatial Analysis

Chapter 6 - Topic Extraction

How Machines Process Text

Unstructured Text

Extracting Meaning From Text

Applications Of Topic Extraction

Comparing/Clustering Documents

Automatic Summarization

Automatic Keyword Generation

Multilingual Analysis: Topic Extraction With Multiple Languages

Chapter 7 - Sentiment Analysis

Examining Emotions



Analytical Resolution: Documents vs Objects

Hand-Crafted vs Automatically-Generated Lexicons

Other Sentiment Scales


Measuring Language Rather Than Worldview

Chapter 8 - Similarity, Categorization and Clustering


The Vector-Space Model

Feature Selection

Feature Reduction

Learning Algorithm

Evaluating ATC Results

Benefits of ATC Over Human Categorization

Limitations of ATC

Applications of ATC


Automated Clustering

Hierarchical Clustering

Partitional Clustering

Document Similarity

Vector Space Model

Contingency Tables

Chapter 9 - Network Analysis

Understanding Network Analysis

Network Content Analysis

Representing Network Data

Constructing the Network

Network Structure

The Triad Census

Network Evolution

Visualization and Clustering

About Kalev Leetaru

Kalev Leetaru is Senior Research Scientist for Content Analysis at the University of Illinois Institute for Computing in Humanities, Arts, and Social Science and Center Affiliate of the National Center for Supercomputing Applications. He leads a number of large initiatives centering on the application of high performance computing to grand challenge problems using massive-scale document and data archives.

Rating details

2 ratings
4.5 out of 5 stars
5 stars: 50% (1)
4 stars: 50% (1)
3 stars: 0% (0)
2 stars: 0% (0)
1 star: 0% (0)
Book ratings by Goodreads