Mastering Text Mining with R

Product type Book
Published in Dec 2016
Publisher Packt
ISBN-13 9781783551811
Pages 258 pages
Edition 1st Edition
Author (1): KUMAR ASHISH

Index

A

  • antiword tool
    • download link / Microsoft Word documents

B

  • Bayes' formula
    • for conditional probability / Bayes' formula for conditional probability
  • bias-variance decomposition / Bias-variance decomposition
  • bias-variance trade-off / Bias–variance trade-off and learning curve
  • binomial distribution / Binomial distribution
  • bootstrapping methods / Bootstrap

C

  • canonical correspondence analysis (CCA)
    • about / Canonical correspondence analysis
    • Pearson's Chi-squared test / Multiple correspondence analysis
  • caret package
    • reference / Stratified
  • chunks
    • reference / Chunk tags
  • co-occurrences
    • extracting / Extracting co-occurrences
    • surface co-occurrence / Surface Co-occurrence
    • textual co-occurrence / Textual co-occurrence
    • syntactic co-occurrence / Syntactic co-occurrence
    • in document / Co-occurrence in a document
  • collocations / N-gram models
  • compound probabilities theorem / Theorem of compound probabilities
  • concept similarity
    • about / Concept similarity
    • path length / Path length
    • Resnik similarity / Resnik similarity
    • Lin similarity / Lin similarity
    • Jiang-Conrath distance / Jiang-Conrath distance
  • conditional probability
    • about / Conditional probability
    • Bayes' formula / Bayes' formula for conditional probability
  • confusion matrix
    • about / Confusion matrix
  • Correlated topic model (CTM)
    • about / Correlated topic model
    • model selection / Model selection
    • R Package, for topic modeling / R Package for topic modeling
  • correspondence analysis
    • about / Correspondence analysis
    • canonical correspondence analysis (CCA) / Canonical correspondence analysis
    • multiple correspondence analysis / Multiple correspondence analysis
  • cross-validation
    • about / Cross validation
  • cumulative distribution function / Cumulative distribution function

D

  • Degree of Reading Power (DRP) / Automated readability index
  • dimensionality
    • limitation / The curse of dimensionality
    • distance concentration / Distance concentration and computational infeasibility
    • computational infeasibility / Distance concentration and computational infeasibility
  • dimensionality reduction
    • about / Dimensionality reduction
    • principal component analysis (PCA) / Principal component analysis
    • R, using for principal component analysis (PCA) / Using R for PCA
    • reconstruction error / Reconstruction error
  • discrete random variables
    • about / Discrete random variables
    • continuous random variables / Continuous random variables
  • diverse sources
    • text, accessing / Accessing text from diverse sources
  • document clustering
    • about / Document clustering
  • document term matrix
    • about / Document term matrix
    • inverse document frequency / Inverse document frequency
    • words similarity / Words similarity and edit-distance functions
    • edit-distance functions / Words similarity and edit-distance functions
    • Euclidean distance / Euclidean distance
    • cosine similarity / Cosine similarity
    • Levenshtein distance / Levenshtein distance
    • Damerau-Levenshtein distance / Damerau-Levenshtein distance
    • Hamming distance / Hamming distance
    • Gunning fog index / Gunning fog index

E

  • .exe from
    • download link / Synonymy and similarity
  • Easy Listening Formula (ELF) / Automated readability index
  • elements
    • entities / Feature extraction
    • attributes / Feature extraction
    • events / Feature extraction
  • entity extraction
    • about / Entity extraction
    • rule-based approach / The rule-based approach
    • machine learning / Machine learning
  • Extensible Markup Language (XML) / XML

F

  • 10-fold cross-validation / k-Fold
  • feature extraction
    • about / Feature extraction
    • synonymy / Synonymy and similarity
    • similarity / Synonymy and similarity
    • multiwords / Multiwords, negation, and antonymy
    • negation / Multiwords, negation, and antonymy
    • antonymy / Multiwords, negation, and antonymy
    • concept similarity / Concept similarity
  • feature selection, for text clustering
    • about / Feature selection for text clustering
    • mutual information, using / Mutual information
    • statistic Chi Square feature selection / Statistic Chi Square feature selection
    • frequency-based feature selection / Frequency-based feature selection
  • file system
    • about / File system
    • PDF documents / PDF documents
    • Microsoft Word documents / Microsoft Word documents
    • Hyper Text Markup Language (HTML) / HTML
    • Extensible Markup Language (XML) / XML
    • JavaScript Object Notation (JSON) / JSON
    • Hypertext Transfer Protocol (HTTP) / HTTP

G

  • generative models / Latent Dirichlet Allocation

H

  • Hamming distance
    • about / Hamming distance
    • Jaro-Winkler distance / Jaro-Winkler distance
    • text, readability measuring / Measuring readability of a text
  • Heaps' law / Heaps' law
  • Hidden Markov Models (HMM), POS tagging
    • about / Hidden Markov Models for POS tagging
    • definitions / Basic definitions and notations
    • notations / Basic definitions and notations
    • implementing / Implementing HMMs
    • Viterbi underflow / Viterbi underflow
    • forward algorithm underflow / Forward algorithm underflow
    • OpenNLP chunking / OpenNLP chunking
    • chunk tags / Chunk tags
  • Hyper Text Markup Language (HTML) / HTML
  • Hypertext Transfer Protocol (HTTP) / HTTP

I

  • independent events
    • for conditional probability / Independent events
  • inverse document frequency (IDF) / Inverse document frequency
  • Inverse Document Frequency (IDF) / Frequency-based feature selection
  • ISOMAP
    • using / Implementation of SVD using R
    • geodesic distance approximation, calculating / Implementation of SVD using R

J

  • JavaScript Object Notation (JSON) / JSON
  • Java Virtual Machine (JVM) / Training a model with new features
  • joint distribution / Joint distribution

K

  • k-fold cross-validation / k-Fold
  • kernel functions / Kernel Trick
  • Kernel Trick / Kernel Trick
  • kernlab
    • implementations / Kernel Trick
    • reference / Kernel Trick
  • koRpus package / koRpus

L

  • L-BFGS
    • about / Maxent implementation in R
  • language detection
    • about / Language detection
  • language models
    • about / Language models
    • N-gram models / N-gram models
    • Markov assumption / Markov assumption
    • hidden Markov models / Hidden Markov models
  • languageR package
    • about / languageR
  • languageR package / Lexical richness
  • Latent Dirichlet Allocation (LDA) / Latent Dirichlet Allocation
  • Latent Semantic Analysis (LSA)
    • about / Latent semantic analysis
    • R Package / R Package for latent semantic analysis
    • example / Illustrative example of LSA
  • learning curve
    • about / Learning curve
  • leave-one-out method / Leave-one-out
  • lemma / Word tokenization
  • lexical diversity
    • about / Lexical diversity
    • analyse lexical diversity / Analyse lexical diversity
    • calculating / Calculate lexical diversity
    • readability / Readability
    • automated readability index / Readability
  • lexical richness
    • about / Lexical richness
    • lexical variation / Lexical variation
    • lexical density / Lexical density
    • lexical originality / Lexical originality
    • lexical sophistication / Lexical sophistication
  • linear kernel
    • applying / How to apply SVM on a real world example?
  • linguistics
    • quantitative methods / Quantitative methods in linguistics
  • lsa package / lsa

M

  • Maxent package
    • implementing, in R / Maxent implementation in R
  • maxent package / maxent
  • maximum entropy classifiers / Maximum entropy classifier
  • model evaluation
    • about / Model evaluation
    • confusion matrix / Confusion matrix
    • ROC curve / ROC curve
    • precision-recall / Precision-recall
  • model files
    • reference / OpenNLP
  • model validation methods
    • leave-one-out / Leave-one-out
    • k-fold cross-validation / k-Fold
    • bootstrapping methods / Bootstrap
    • stratified sampling / Stratified
  • multi-word expressions (MWE) / Collocation and contingency tables
  • multi-word units (MWU) / Collocation and contingency tables
  • MySQL software
    • download link / Databases

N

  • n-fold cross-validation / Leave-one-out
  • named entity recognition
    • about / Named entity recognition
    • model, training with new features / Training a model with new features
  • natural language processing (NLP) / Collocation and contingency tables

O

  • occurrences
    • counting / Counting occurrences
  • ODBC Bridge
    • download link / Databases
  • OpenNLPmodels.language package
    • installation link / OpenNLP
  • OpenNLP package / OpenNLP
  • operations, on document-term matrix
    • frequent terms / Operations on a document-term matrix
    • term association / Operations on a document-term matrix
  • OWLQN
    • about / Maxent implementation in R

P

  • part-of-speech (POS) / N-gram models
  • pointwise mutual information (PMI) / N-gram models
  • Poisson distribution / Poisson distribution
  • POS tagging
    • Hidden Markov Models (HMM) / Hidden Markov Models for POS tagging
  • pre-trained POS models, for OpenNLP
    • reference / POS tagging with R packages
  • pre-trained sentence boundary detection models
    • reference / Sentence boundary detection
  • precision-recall
    • about / Precision-recall
  • precompiled binaries
    • download link / PDF documents
  • principal component analysis (PCA)
    • about / Principal component analysis
    • R, using / Using R for PCA
  • probability
    • about / Probability theory and basic statistics
    • space / Probability space and event
  • probability distributions
    • R, using / Probability distributions using R
  • probability frequency function / Probability frequency function

Q

  • quantitative methods, linguistics
    • about / Quantitative methods in linguistics
    • document term matrix / Document term matrix

R

  • R
    • using, for probability distributions / Probability distributions using R
    • used, for singular value decomposition (SVD) implementation / Implementation of SVD using R
  • R, using for principal component analysis (PCA)
    • about / Using R for PCA
    • FactoMineR package / Understanding the FactoMineR package
    • Amap package / Amap package
    • proportion of variance / Proportion of variance
    • scree plot function / Scree plot
  • random variables
    • about / Random variables
    • discrete random variables / Discrete random variables
  • RcmdrPlugin.temis package / RcmdrPlugin.temis
  • RCurl / HTTP
  • Receiver Operating Characteristics Curve (ROC)
    • about / ROC curve
  • reducible error components
    • dealing with / Dealing with reducible error components
  • regular expressions
    • used, for processing text / Processing text using regular expressions
  • relation between words, quantifying
    • about / Quantifying the relation between words
    • contingency tables / Contingency tables
    • detailed analysis, on textual collocations / Detailed analysis on textual collocations
  • RKEA package / RKEA
  • R Package, for topic modeling
    • about / R Package for topic modeling
    • LDA model, fitting with VEM algorithm / Fitting the LDA model with the VEM algorithm
  • R packages, text mining
    • OpenNLP / OpenNLP
    • RWeka / RWeka
    • RcmdrPlugin.temis / RcmdrPlugin.temis
    • tm / tm
    • languageR / languageR
    • koRpus / koRpus
    • RKEA / RKEA
    • maxent / maxent
    • lsa / lsa
  • R tau package / Counting occurrences
  • RTextTools
    • about / RTextTools: a text classification framework
  • RWeka package / RWeka

S

  • segmentation / Tokenization and segmentation
  • sensitivity / Confusion matrix
  • sentence / Word tokenization
  • sentence boundary detection
    • about / Sentence boundary detection
    • Word token annotator / Word token annotator
  • sentence completion feature / Sentence completion
  • singular value decomposition (SVD)
    • about / Multiple correspondence analysis
    • implementing, with R / Implementation of SVD using R
  • speech tagging
    • components / Parts of speech tagging
    • POS tagging, with R packages / POS tagging with R packages
  • state distribution / Hidden Markov models
  • statistics
    • origin / Probability theory and basic statistics
  • strata / Stratified
  • stratified sampling / Stratified
  • Support vector machines (SVM)
    • applying, on real world example / How to apply SVM on a real world example?

T

  • table(tags) / POS tagging with R packages
  • Term Document Matrix (TDM) / Dimensionality reduction
  • term frequency (TF) / Inverse document frequency
  • text
    • accessing, from diverse sources / Accessing text from diverse sources
    • accessing, from file system / File system
    • accessing, from databases / Databases
    • processing, with regular expressions / Processing text using regular expressions
    • tokenization / Tokenization and segmentation
  • TextCat / Language detection
  • text clustering
    • about / Text clustering
    • feature selection / Feature selection for text clustering
  • text mining
    • R packages / R packages for text mining
  • texts
    • normalizing / Normalizing texts
    • lemmatization / Lemmatization and stemming
    • stemming / Stemming, Lemmatization
    • synonyms / Synonyms
  • TF*IDF / Inverse document frequency
  • tm package / tm
  • tokenization
    • about / Tokenization and segmentation
    • word tokenization / Word tokenization
    • document-term matrix, operations / Operations on a document-term matrix
    • sentence segmentation / Sentence segmentation
  • tokens / Word tokenization
  • topic models
    • using / Topic modeling
    • Latent Dirichlet Allocation (LDA) / Latent Dirichlet Allocation
    • Correlated topic model (CTM) / Correlated topic model
  • types / Word tokenization

U

  • utterance / Word tokenization

W

  • word form / Word tokenization
  • word sense / Word tokenization

Z

  • Zipf's law / Zipf's law