Index
A
- antiword tool
- download link / Microsoft Word documents
B
- Bayes' formula
- for conditional probability / Bayes' formula for conditional probability
- bias-variance decomposition / Bias-variance decomposition
- bias-variance trade-off / Bias–variance trade-off and learning curve
- binomial distribution / Binomial distribution
- bootstrapping methods / Bootstrap
C
- canonical correspondence analysis (CCA)
- about / Canonical correspondence analysis
- Pearson's Chi-squared test / Multiple correspondence analysis
- caret package
- reference / Stratified
- chunks
- reference / Chunk tags
- co-occurrences
- extracting / Extracting co-occurrences
- surface co-occurrence / Surface Co-occurrence
- textual co-occurrence / Textual co-occurrence
- syntactic co-occurrence / Syntactic co-occurrence
- in document / Co-occurrence in a document
- collocations / N-gram models
- compound probabilities theorem / Theorem of compound probabilities
- concept similarity
- about / Concept similarity
- path length / Path length
- Resnik similarity / Resnik similarity
- Lin similarity / Lin similarity
- Jiang-Conrath distance / Jiang-Conrath distance
- conditional probability
- about / Conditional probability
- Bayes' formula / Bayes' formula for conditional probability
- confusion matrix
- about / Confusion matrix
- Correlated topic model (CTM)
- about / Correlated topic model
- model selection / Model selection
- R Package, for topic modeling / R Package for topic modeling
- correspondence analysis
- about / Correspondence analysis
- canonical correspondence analysis (CCA) / Canonical correspondence analysis
- multiple correspondence analysis / Multiple correspondence analysis
- cross-validation
- about / Cross validation
- cumulative distribution function / Cumulative distribution function
D
- Degree of Reading Power (DRP) / Automated readability index
- dimensionality
- limitation / The curse of dimensionality
- distance concentration / Distance concentration and computational infeasibility
- computational infeasibility / Distance concentration and computational infeasibility
- dimensionality reduction
- about / Dimensionality reduction
- principal component analysis (PCA) / Principal component analysis
- R, using for principal component analysis (PCA) / Using R for PCA
- reconstruction error / Reconstruction error
- discrete random variables
- about / Discrete random variables
- continuous random variables / Continuous random variables
- diverse sources
- text, accessing / Accessing text from diverse sources
- document clustering
- about / Document clustering
- document term matrix
- about / Document term matrix
- inverse document frequency / Inverse document frequency
- words similarity / Words similarity and edit-distance functions
- edit-distance functions / Words similarity and edit-distance functions
- Euclidean distance / Euclidean distance
- cosine similarity / Cosine similarity
- Levenshtein distance / Levenshtein distance
- Damerau-Levenshtein distance / Damerau-Levenshtein distance
- Hamming distance / Hamming distance
- Gunning fog index / Gunning fog index
E
- .exe from
- download link / Synonymy and similarity
- Easy Listening Formula (ELF) / Automated readability index
- elements
- entities / Feature extraction
- attributes / Feature extraction
- events / Feature extraction
- entity extraction
- about / Entity extraction
- rule-based approach / The rule-based approach
- machine learning / Machine learning
- Extensible Markup Language (XML) / XML
F
- 10-fold cross-validation / k-Fold
- feature extraction
- about / Feature extraction
- synonymy / Synonymy and similarity
- similarity / Synonymy and similarity
- multiwords / Multiwords, negation, and antonymy
- negation / Multiwords, negation, and antonymy
- antonymy / Multiwords, negation, and antonymy
- concept similarity / Concept similarity
- feature selection, for text clustering
- about / Feature selection for text clustering
- mutual information, using / Mutual information
- statistic Chi Square feature selection / Statistic Chi Square feature selection
- frequency-based feature selection / Frequency-based feature selection
- file system
- about / File system
- PDF documents / PDF documents
- Microsoft Word documents / Microsoft Word documents
- HyperText Markup Language (HTML) / HTML
- Extensible Markup Language (XML) / XML
- JavaScript Object Notation (JSON) / JSON
- Hypertext Transfer Protocol (HTTP) / HTTP
G
- generative models / Latent Dirichlet Allocation
H
- Hamming distance
- about / Hamming distance
- Jaro-Winkler distance / Jaro-Winkler distance
- text, readability measuring / Measuring readability of a text
- Heaps' law / Heaps' law
- Hidden Markov Models (HMM), POS tagging
- about / Hidden Markov Models for POS tagging
- definitions / Basic definitions and notations
- notations / Basic definitions and notations
- implementing / Implementing HMMs
- Viterbi underflow / Viterbi underflow
- forward algorithm underflow / Forward algorithm underflow
- OpenNLP chunking / OpenNLP chunking
- chunk tags / Chunk tags
- HyperText Markup Language (HTML) / HTML
- Hypertext Transfer Protocol (HTTP) / HTTP
I
- independent events
- for conditional probability / Independent events
- inverse document frequency (IDF) / Inverse document frequency, Frequency-based feature selection
- ISOMAP
- using / Implementation of SVD using R
- geodesic distance approximation, calculating / Implementation of SVD using R
J
- JavaScript Object Notation (JSON) / JSON
- Java Virtual Machine (JVM) / Training a model with new features
- joint distribution / Joint distribution
K
- k-fold cross-validation / k-Fold
- kernel functions / Kernel Trick
- Kernel Trick / Kernel Trick
- kernlab
- implementations / Kernel Trick
- reference / Kernel Trick
- koRpus package / koRpus
L
- L-BFGS
- about / Maxent implementation in R
- language detection
- about / Language detection
- language models
- about / Language models
- N-gram models / N-gram models
- Markov assumption / Markov assumption
- hidden Markov models / Hidden Markov models
- language package
- about / languageR
- languageR package / Lexical richness
- Latent Dirichlet Allocation (LDA) / Latent Dirichlet Allocation
- Latent Semantic Analysis (LSA)
- about / Latent semantic analysis
- R Package / R Package for latent semantic analysis
- example / Illustrative example of LSA
- learning curve
- about / Learning curve
- leave-one-out method / Leave-one-out
- lemma / Word tokenization
- lexical diversity
- about / Lexical diversity
- analyse lexical diversity / Analyse lexical diversity
- calculating / Calculate lexical diversity
- readability / Readability
- automated readability index / Readability
- lexical richness
- about / Lexical richness
- lexical variation / Lexical variation
- lexical density / Lexical density
- lexical originality / Lexical originality
- lexical sophistication / Lexical sophistication
- linear kernel
- applying / How to apply SVM on a real world example?
- linguistics
- quantitative methods / Quantitative methods in linguistics
- lsa package / lsa
M
- Maxent package
- implementing, in R / Maxent implementation in R
- maxent package / maxent
- maximum entropy classifiers / Number of instances is significantly larger than the number of dimensions, Maximum entropy classifier
- model evaluation
- about / Model evaluation
- confusion matrix / Confusion matrix
- ROC curve / ROC curve
- precision-recall / Precision-recall
- model files
- reference / OpenNLP
- model validation methods
- leave-one-out / Leave-one-out
- k-fold cross-validation / k-Fold
- bootstrapping methods / Bootstrap
- stratified sampling / Stratified
- multi-word expressions (MWE) / Collocation and contingency tables
- multi-word units (MWU) / Collocation and contingency tables
- MySQL software
- download link / Databases
N
- n-fold cross-validation / Leave-one-out
- named entity recognition
- about / Named entity recognition
- model, training with new features / Training a model with new features
- natural language processing (NLP) / Collocation and contingency tables
O
- occurrences
- counting / Counting occurrences
- ODBC Bridge
- download link / Databases
- OpenNLPmodels.language package
- installation link / OpenNLP
- OpenNLP package / OpenNLP
- operations, on document-term matrix
- frequent terms / Operations on a document-term matrix
- term association / Operations on a document-term matrix
- OWLQN
- about / Maxent implementation in R
P
- part-of-speech (POS) / N-gram models
- pointwise mutual information (PMI) / N-gram models
- Poisson distribution / Poisson distribution
- POS tagging
- Hidden Markov Models (HMM) / Hidden Markov Models for POS tagging
- pre-trained POS models, for OpenNLP
- reference / POS tagging with R packages
- pre-trained sentence boundary detection models
- reference / Sentence boundary detection
- precision-recall
- about / Precision-recall
- precompiled binaries
- download link / PDF documents
- principal component analysis (PCA)
- about / Principal component analysis
- R, using / Using R for PCA
- probability
- about / Probability theory and basic statistics
- space / Probability space and event
- probability distributions
- R, using / Probability distributions using R
- probability frequency function / Probability frequency function
Q
- quantitative methods, linguistics
- about / Quantitative methods in linguistics
- document term matrix / Document term matrix
R
- R
- using, for probability distributions / Probability distributions using R
- used, for singular value decomposition (SVD) implementation / Implementation of SVD using R
- R, using for principal component analysis (PCA)
- about / Using R for PCA
- FactoMineR package / Understanding the FactoMineR package
- Amap package / Amap package
- proportion of variance / Proportion of variance
- scree plot function / Scree plot
- random variables
- about / Random variables
- discrete random variables / Discrete random variables
- RcmdrPlugin.temis package / RcmdrPlugin.temis
- RCurl / HTTP
- Receiver Operating Characteristics Curve (ROC)
- about / ROC curve
- reducible error components
- dealing with / Dealing with reducible error components
- regular expressions
- used, for processing text / Processing text using regular expressions
- relation between words, quantifying
- about / Quantifying the relation between words
- contingency tables / Contingency tables
- detailed analysis, on textual collocations / Detailed analysis on textual collocations
- RKEA package / RKEA
- R Package, for topic modeling
- about / R Package for topic modeling
- LDA model, fitting with VEM algorithm / Fitting the LDA model with the VEM algorithm
- R packages, text mining
- OpenNLP / OpenNLP
- RWeka / Rweka
- RcmdrPlugin.temis / RcmdrPlugin.temis
- tm / tm
- languageR / languageR
- koRpus / koRpus
- RKEA / RKEA
- maxent / maxent
- lsa / lsa
- R tau package / Counting occurrences
- RTextTools
- about / RTextTools: a text classification framework
- RWeka package / Rweka
S
- segmentation / Tokenization and segmentation
- sensitivity / Confusion matrix
- sentence / Word tokenization
- sentence boundary detection
- about / Sentence boundary detection
- Word token annotator / Word token annotator
- sentence completion feature / Sentence completion
- singular value decomposition (SVD)
- about / Multiple correspondence analysis
- implementing, with R / Implementation of SVD using R
- speech tagging
- components / Parts of speech tagging
- POS tagging, with R packages / POS tagging with R packages
- state distribution / Hidden Markov models
- statistics
- origin / Probability theory and basic statistics
- strata / Stratified
- stratified sampling / Stratified
- Support vector machines (SVM)
- applying, on real world example / How to apply SVM on a real world example?
T
- table(tags) / POS tagging with R packages
- Term Document Matrix (TDM) / Dimensionality reduction
- term frequency (TF) / Inverse document frequency
- text
- accessing, from diverse sources / Accessing text from diverse sources
- accessing, from file system / File system
- accessing, from databases / Databases
- processing, with regular expressions / Processing text using regular expressions
- tokenization / Tokenization and segmentation
- TextCat / Language detection
- text clustering
- about / Text clustering
- feature selection / Feature selection for text clustering
- text mining
- R packages / R packages for text mining
- texts
- normalizing / Normalizing texts
- lemmatization / Lemmatization and stemming
- stemming / Stemming, Lemmatization
- synonyms / Synonyms
- TF*IDF / Inverse document frequency
- tm package / tm
- tokenization
- about / Tokenization and segmentation
- word tokenization / Word tokenization
- document-term matrix, operations / Operations on a document-term matrix
- sentence segmentation / Sentence segmentation
- tokens / Word tokenization
- topic models
- using / Topic modeling
- Latent Dirichlet Allocation (LDA) / Latent Dirichlet Allocation
- Correlated topic model (CTM) / Correlated topic model
- types / Word tokenization
U
- utterance / Word tokenization
W
- word form / Word tokenization
- word sense / Word tokenization
Z
- Zipf's law / Zipf's law