Mastering Data Mining with Python - Find patterns hidden in your data

Product type: Book
Published: Aug 2016
ISBN-13: 9781785889950
Pages: 268
Edition: 1st Edition
Author: Megan Squire

Index

A

  • abstractive summarization
    • about / What is automatic text summarization?
  • accuracy
    • about / Effectiveness – how accurate are the matches that we generate?
  • adjacency list
    • about / Edge lists and adjacency lists
  • adjacency list format
    • about / Adjacency list format
  • adjacency matrix
    • about / Adjacency matrix
  • Anaconda Python distribution
    • download link / How do we set up our data mining work environment?
  • annotated corpus
    • about / Tagging parts of speech
  • anomaly
    • about / What are data anomalies?
  • antecedent
    • about / Association rules
  • Apache
    • references / Apache Board meeting minutes
  • Apache board meeting minutes
    • about / Apache Board meeting minutes
  • apache_twitter
    • about / Locating missing data
  • Apriori
    • about / Methods for finding frequent itemsets
  • aspects
    • about / The structure of an opinion
  • association rules
    • about / Towards association rules, Association rules
    • metrics / Towards association rules
    • support / Support
    • confidence / Confidence
    • data, example / An example with data
    • value - fixing flaw, adding / Added value – fixing a flaw in the plan
    • frequent itemsets, finding methods / Methods for finding frequent itemsets
    • discovering, in software project tags / A project – discovering association rules in software project tags
  • atomic
    • about / Merging data
  • attribute-based similarity matching
    • about / Attribute-based similarity matching
    • pairwise comparisons / Be careful of pairwise comparisons
    • rare values, leveraging / Leverage rare values
  • attributes
    • about / The structure of an opinion
  • attributes matching, methods
    • about / Methods for matching attributes
    • range-based, from target / Range-based or distance from target
    • distance, from target / Range-based or distance from target
    • string edit distance / String edit distance
    • Hamming distance / Hamming distance
    • Levenshtein distance / Levenshtein distance
    • Soundex / Soundex
  • attributes of edges
    • about / What is a network?
  • attributes of nodes
    • about / What is a network?
  • automatic text summarization
    • about / What is automatic text summarization?
  • automatic text summarization techniques
    • reference / What is automatic text summarization?

B

  • bag of words
    • about / Important features of opinions
  • bag of words (bow)
    • about / Gensim for topic modeling
  • betweenness centrality
    • about / Betweenness centrality
  • big data
    • about / What is data mining?
  • blank data values
    • about / Missing data
  • blocking methods
    • about / Efficiency – how long does it take to do the matching?
  • bonus words
    • about / Sumy's Edmundson summarizer
  • boundary errors
    • about / NER and partial matches
  • box-and-whisker plot
    • about / Detecting outliers by combining statistics and visual mining
  • boxplot
    • about / Detecting outliers by combining statistics and visual mining
    • reference link / Detecting outliers by combining statistics and visual mining
  • Brown Corpus
    • about / Tagging parts of speech
    • reference link / Tagging parts of speech

C

  • CamelCase
    • about / Why look for named entities?
  • change detection problems
    • about / What are the techniques used in data mining?
  • check constraint
    • about / Logic or semantic errors
  • classification problems
    • about / What are the techniques used in data mining?
  • closed path
    • about / Walks, paths, and trails in a network
  • closeness centrality
    • about / Closeness centrality
  • clustering-based outlier
    • reference link / Detecting outliers with machine learning
  • clustering problems
    • about / What are the techniques used in data mining?
  • code, entity matching project
    • about / The code
    • reference / The code
  • code and text files, NER project
    • reference link / A simple NER tool
  • coding
    • about / General-purpose data collections
  • components
    • about / The structure of an opinion
  • compound score
    • URL / Data analysis of chat messages
  • confidence, association rules
    • about / Confidence
  • consequent
    • about / Association rules
  • context-based similarity matching
    • about / Context-based similarity matching
  • corpus
    • about / Tagging parts of speech
  • CREATE and INSERT statements
    • URL / Data analysis of e-mail messages
  • CRISP-DM process
    • about / The CRISP-DM process
    • business understanding / The CRISP-DM process
    • data understanding / The CRISP-DM process
    • data preparation / The CRISP-DM process
    • modeling / The CRISP-DM process
    • evaluation / The CRISP-DM process
    • deployment / The CRISP-DM process
  • cues
    • about / Sumy's Edmundson summarizer

D

  • data
    • merging / Merging data
    • sets, merging vertically / Merging datasets vertically
    • sets, merging horizontally / Merging datasets horizontally
    • exploring / Exploring the data
  • data, exploring
    • datasources table / Exploring the data
    • rf_developer_projects table / Exploring the data
  • data, importing into graph structure
    • about / Importing data into a graph structure
    • adjacency list format / Adjacency list format
    • edge list format / Edge list format
    • GEXF format / GEXF and GraphML
    • GraphML format / GEXF and GraphML
    • graph data format (GDF) / GDF
    • Python pickle / Python pickle
    • JavaScript Object Notation (JSON) / JSON
    • JSON node series / JSON node and link series
    • JSON link series / JSON node and link series
    • JSON trees / JSON trees
    • Pajek format / Pajek format
  • data, social network
    • simple network metrics, generating / Generating simple network metrics
    • network parameters / Playing with the parameters of a network
    • subgraphs, analyzing / Analyzing subgraphs
    • cliques, analyzing / Analyzing cliques and centrality in the subgraphs
    • centrality in subgraphs, analyzing / Analyzing cliques and centrality in the subgraphs
    • change over time, finding / Looking for change over time
  • data anomalies
    • about / What are data anomalies?
    • missing data / Missing data
    • missing data, fixing / Fixing missing data
    • data errors / Data errors
    • outliers / Outliers
  • data append
    • about / Merging datasets vertically
  • data errors
    • about / Data errors
    • truncated fields / Truncated fields
    • data type errors / Data type and character set errors
    • character set errors / Data type and character set errors
    • logic errors / Logic or semantic errors
    • semantic errors / Logic or semantic errors
  • data file
    • URL / A project – discovering association rules in software project tags
  • datafiles
    • reference link / Generating the network files
  • data mining
    • about / What is data mining?
    • machine learning / What is data mining?
    • predictive analytics / What is data mining?
    • big data / What is data mining?
    • data science / What is data mining?
    • performing / How do we do data mining?
    • Fayyad et al. KDD process / The Fayyad et al. KDD process
    • Han et al. KDD process / The Han et al. KDD process
    • CRISP-DM process / The CRISP-DM process
    • Six Steps process / The Six Steps process
    • methodology / Which data mining methodology is the best?
    • techniques / What are the techniques used in data mining?, What techniques are we going to use in this book?
    • development environment, setting up / How do we set up our data mining work environment?
  • data quality
    • about / Merging data
  • data science
    • about / What is data mining?
  • dataset, entity matching project
    • about / The dataset
  • datasources table
    • datasource_id / Exploring the data
    • date_donated / Exploring the data
    • comments / Exploring the data
  • data type errors
    • example / Data type and character set errors
  • degree
    • about / Degree of a network
  • degree centrality
    • about / Degree centrality
  • dependency modeling problems
    • about / What are the techniques used in data mining?
  • details
    • about / Locating missing data
  • developer channel, Ubuntu
    • reference for text archive / Data preparation
  • deviation detection problems
    • about / What are the techniques used in data mining?
  • diameter
    • about / Diameter of a network
  • directed network
    • about / What is a network?
  • direction
    • about / What is a network?
  • disjoint sets
    • leveraging / Leveraging disjoint sets
    • about / Leveraging disjoint sets
  • distance
    • about / Diameter of a network
  • Django IRC chat
    • about / Django IRC chat
    • reference link / Django IRC chat
  • doc2bow()
    • about / Gensim for topic modeling
  • document level
    • about / Document-level and sentence-level analysis
  • domain
    • about / Frequent itemset mining basics
  • domain knowledge
    • about / What is entity matching?
  • doubletons
    • about / Frequent itemset mining basics

E

  • edge list
    • about / Edge lists and adjacency lists
  • edge list format
    • about / Edge list format
  • edges
    • about / What is a network?
  • entity
    • about / The structure of an opinion
  • entity matching
    • about / What is entity matching?
    • data, merging / Merging data
    • techniques / Techniques for matching
    • attribute-based similarity matching / Attribute-based similarity matching
    • attributes matching, methods / Methods for matching attributes
    • disjoint sets, leveraging / Leveraging disjoint sets
    • context-based similarity matching / Context-based similarity matching
    • machine learning based entity matching / Machine learning-based entity matching
  • entity matching project
    • about / Entity matching project
    • difficulties, with matching software projects / Difficulties with matching software projects
    • project names, matching / Matching on project names
    • people names, matching / Matching on people names
    • URLs, matching / Matching on URLs
    • topics and description keywords, matching / Matching on topics and description keywords
    • dataset / The dataset
    • code / The code
    • results / The results
  • entity matching techniques
    • efficiency / Efficiency – how long does it take to do the matching?
    • effectiveness / Effectiveness – how accurate are the matches that we generate?
    • usefulness / Usefulness – how practical is the matching procedure to use?
  • errors
    • about / What are data anomalies?
  • explicit
    • about / The structure of an opinion
  • extractive method
    • about / What is automatic text summarization?

F

  • Facebook Research blog
    • download link / What is topic modeling?
  • Fayyad et al. KDD process
    • data selection / The Fayyad et al. KDD process
    • data pre-processing / The Fayyad et al. KDD process
    • data transformation / The Fayyad et al. KDD process
    • data mining / The Fayyad et al. KDD process
    • data interpretation / The Fayyad et al. KDD process
    • data evaluation / The Fayyad et al. KDD process
  • feature engineering
    • about / Sentiment analysis algorithms
  • flaccid designator
    • about / Techniques for named entity recognition
  • fliers
    • about / Detecting outliers by combining statistics and visual mining
    • reference link / Detecting outliers by combining statistics and visual mining
  • FLOSSmole
    • URL / A project – discovering association rules in software project tags
    • reference link / GnuIRC summaries
  • FLOSSmole.org
    • references / Exploring the data
  • FLOSSmole data
    • about / The dataset
    • database tables / The dataset
  • FLOSSmole project
    • URL / The dataset
  • frequent itemsets
    • about / What are frequent itemsets?
    • diapers and beer urban legend example / The diapers and beer urban legend
    • mining basics / Frequent itemset mining basics

G

  • gazetteer
    • about / Why look for named entities?
  • GDF format
    • reference link / GDF
  • general-purpose data collections
    • Hu and Liu's sentiment analysis lexicon / Hu and Liu's sentiment analysis lexicon
    • SentiWordNet / SentiWordNet
    • Vader sentiment / Vader sentiment
  • generalizable
    • about / Usefulness – how practical is the matching procedure to use?
  • general user channel, Ubuntu
    • reference for text archive / Data preparation
  • Gensim
    • about / How do we set up our data mining work environment?
    • used, for text summarization / Text summarization using Gensim
    • used, for topic modeling / Gensim for topic modeling
  • Gensim approach
    • reference / Text summarization using Gensim
  • Gensim changelog
    • reference / Text summarization using Gensim
  • Gensim documentation
    • reference link / Serializing a corpus
  • Gensim LDA
    • download link / Latent Dirichlet Allocation
    • larger project / Gensim LDA for a larger project
  • Gensim LDA model
    • applying, to documents / Applying a Gensim LDA model to new documents
  • Gensim LDA objects
    • serializing / Serializing Gensim LDA objects
    • dictionary, serializing / Serializing a dictionary
    • corpus, serializing / Serializing a corpus
    • model, serializing / Serializing a model
  • Gensim LDA passes
    • about / Understanding Gensim LDA passes
  • Gensim LDA topics
    • about / Understanding Gensim LDA topics
    • example / Understanding Gensim LDA topics
  • GEXF format
    • about / GEXF and GraphML
  • glosses
    • about / SentiWordNet
  • gnueIRCsummary.txt
    • reference link / GnuIRC summaries
  • GnuIRC summaries
    • about / GnuIRC summaries
  • graph
    • about / What is a network?
  • graph data
    • representing / Representing graph data
  • graph data, representing
    • adjacency matrix / Adjacency matrix
    • edge list / Edge lists and adjacency lists
    • adjacency list / Edge lists and adjacency lists
    • graph data structures, differences / Differences between graph data structures
    • data, importing into graph structure / Importing data into a graph structure
  • graph data format (GDF)
    • about / GDF
  • GraphML format
    • about / GEXF and GraphML
  • graph trail
    • about / Walks, paths, and trails in a network
  • graph walk
    • about / Walks, paths, and trails in a network
  • Grubbs' test
    • about / Detecting outliers with modified z-scores
  • gzipped
    • download link / The dataset

H

  • Hamming distance
    • about / Hamming distance
  • Han et al. KDD process
    • data cleaning / The Han et al. KDD process
    • data integration / The Han et al. KDD process
    • data selection / The Han et al. KDD process
    • data transformation / The Han et al. KDD process
    • data mining / The Han et al. KDD process
    • pattern evaluation / The Han et al. KDD process
    • knowledge representation / The Han et al. KDD process
  • hapax
    • about / Important features of opinions
  • horizontal merge
    • example / Merging datasets horizontally
  • hot deck imputation
    • about / Use a similar value

I

  • 2-itemsets
    • about / Frequent itemset mining basics
  • 3-itemsets
    • about / Frequent itemset mining basics
  • implicit
    • about / The structure of an opinion
  • impute
    • about / Use a central measure
  • in-degree
    • about / Degree of a network
  • InterCaps
    • about / Why look for named entities?
  • interestingness measures for association rules
    • about / Added value – fixing a flaw in the plan
  • isolates
    • about / Components of a network

J

  • JavaScript Object Notation (JSON)
    • about / JSON
  • JSON link series
    • about / JSON node and link series
  • JSON node series
    • about / JSON node and link series
  • JSON trees
    • about / JSON trees

K

  • knowledge discovery in databases (KDD)
    • about / What is data mining?
  • knowledge discovery process
    • about / What is data mining?

L

  • Last Observation Carried Forward (LOCF)
    • about / Use Last Observation Carried Forward
  • Latent Dirichlet Allocation (LDA)
    • about / Latent Dirichlet Allocation
    • reference link / Latent Dirichlet Allocation
    • download link / Latent Dirichlet Allocation
  • Latent Semantic Analysis (LSA)
    • reference / Sumy's LSA summarizer
  • Levenshtein distance
    • about / Levenshtein distance
  • lexicon
    • URL / Hu and Liu's sentiment analysis lexicon
  • link analysis problems
    • about / What are the techniques used in data mining?
  • links
    • about / What is a network?
  • linusrants
    • about / Data analysis of e-mail messages
    • URL / Data analysis of e-mail messages
  • Linux Kernel Mailing List (LKML)
    • about / Data analysis of e-mail messages
  • LKML e-mails
    • about / LKML e-mails
  • lkmlLinusAll.txt
    • reference link / Gensim LDA for a larger project
  • logic errors
    • about / Logic or semantic errors

M

  • machine learning
    • reference link / What is topic modeling?
    • outliers, detecting with / Detecting outliers with machine learning
  • machine learning based entity matching
    • about / Machine learning-based entity matching
  • manually, fixing
    • example / Fix the problem manually
  • market basket analysis
    • about / What are frequent itemsets?
    • market / Frequent itemset mining basics
    • basket / Frequent itemset mining basics
    • items / Frequent itemset mining basics
  • Matrix Market (MM) format
    • about / Serializing a corpus
    • reference link / Serializing a corpus
  • maximum normalized residual test
    • about / Detecting outliers with modified z-scores
  • Message Understanding Conference (MUC)
    • about / Handling partial matches
  • minimum support threshold
    • about / Support
  • missing data
    • about / Missing data
    • locating / Locating missing data
    • zero values / Zero values
  • missing data, fixing
    • about / Fixing missing data
    • rows, ignoring / Ignore the problem rows
    • manually, fixing / Fix the problem manually
    • fabricated value used / Use a fabricated value
    • central measure used / Use a central measure
    • Last Observation Carried Forward (LOCF) used / Use Last Observation Carried Forward
    • similar value used / Use a similar value
    • most likely value used / Use the most likely value
  • modified z-score
    • about / Detecting outliers with modified z-scores
  • modified z-scores
    • outliers, detecting with / Detecting outliers with modified z-scores
  • multi-document
    • about / What is automatic text summarization?
  • multiple components
    • about / Components of a network
  • multivariate data sets
    • about / Statistical detection of outliers
  • MySQL
    • URL / How do we set up our data mining work environment?

N

  • named entity recognition (NER)
    • about / Why look for named entities?
    • techniques / Techniques for named entity recognition
    • part of speech (POS), tagging / Tagging parts of speech
  • named entity recognition (NER) project
    • about / Named entity recognition project
    • NER tool / A simple NER tool
  • named entity recognition (NER) systems
    • building / Building and evaluating NER systems
    • evaluating / Building and evaluating NER systems
    • partial matches / NER and partial matches
    • partial matches handling / Handling partial matches
  • named entity recognition (NER) tool
    • about / A simple NER tool
    • Apache board meeting minutes / Apache Board meeting minutes
    • Django IRC chat / Django IRC chat
    • GnuIRC summaries / GnuIRC summaries
    • LKML e-mails / LKML e-mails
  • natural language processing (NLP)
    • about / The basics of sentiment analysis
  • Natural Language Toolkit (NLTK)
    • about / How do we set up our data mining work environment?
  • negation words
    • about / Important features of opinions
  • network
    • about / What is a network?
    • measuring / Measuring a network
  • network, measuring
    • degree / Degree of a network
    • diameter / Diameter of a network
    • graph walk / Walks, paths, and trails in a network
    • graph trail / Walks, paths, and trails in a network
    • path / Walks, paths, and trails in a network
    • components / Components of a network
    • centrality / Closeness centrality
    • degree centrality / Degree centrality
    • betweenness centrality / Betweenness centrality
    • centrality, measures / Other measures of centrality
  • NetworkX
    • installing / Understanding our data as a network
  • NetworkX file formats
    • reference link / Pajek format
  • neutral word
    • about / SentiWordNet
  • NLTK
    • used, for naive text summarization / Naive text summarization using NLTK
  • NLTK documentation page
    • URL / Data analysis of e-mail messages
  • nodes
    • about / What is a network?
  • novelty
    • about / Outliers
  • nullable
    • about / Locating missing data
  • null data values
    • about / Missing data
  • null words
    • about / Sumy's Edmundson summarizer

O

  • objectivity score
    • about / SentiWordNet
  • opinion mining
    • about / What is sentiment analysis?
    • reference / What is sentiment analysis?
  • opinion shifters
    • about / Important features of opinions
  • opinion words
    • about / Important features of opinions
  • out-degree
    • about / Degree of a network
  • outlier
    • about / What are data anomalies?, Outliers
  • outlier detection
    • reference link / Detecting outliers with machine learning
  • outliers
    • visual mining / Visual mining for outliers
    • statistical detection / Statistical detection of outliers
  • outliers, statistical detection
    • outliers, detecting with modified z-scores / Detecting outliers with modified z-scores
    • outliers, detecting by combining statistics / Detecting outliers by combining statistics and visual mining
    • outliers, detecting by combining visual mining / Detecting outliers by combining statistics and visual mining
    • outliers, detecting with machine learning / Detecting outliers with machine learning
  • overfitting
    • about / Sentiment analysis algorithms

P

  • Pajek format
    • about / Pajek format
  • partial matches
    • about / NER and partial matches
    • strict scoring / NER and partial matches
    • lenient scoring / NER and partial matches
    • partial scoring / NER and partial matches
  • part of speech (POS)
    • about / Tagging parts of speech
    • tagging / Tagging parts of speech
    • named entities, classes / Classes of named entities
  • part of speech, abbreviations
    • reference link / Tagging parts of speech
  • parts of speech
    • about / Important features of opinions
  • path
    • about / Walks, paths, and trails in a network
  • pendant nodes
    • about / Playing with the parameters of a network
  • Penn, noun abbreviations
    • example / Tagging parts of speech
  • Penn Treebank tagger
    • about / Tagging parts of speech
  • position of word
    • about / Important features of opinions
  • POS tagger
    • about / Tagging parts of speech
  • precision
    • about / Effectiveness – how accurate are the matches that we generate?
  • profile
    • about / Leveraging disjoint sets
  • Python pickle
    • about / Python pickle

Q

  • question answering (QA) systems
    • about / Why look for named entities?

R

  • real-world project, network
    • about / A real project
    • data, exploring / Exploring the data
    • network files, generating / Generating the network files
    • data, social network / Understanding our data as a network
  • recall
    • about / Effectiveness – how accurate are the matches that we generate?
  • regression problems
    • about / What are the techniques used in data mining?
  • relational database management systems (RDBMS)
    • about / Locating missing data
  • results, entity matching project
    • about / The results, How many entity matches did we find?
    • entity matches / How many entity matches did we find?
    • pairs, identifying / How good are the pairs we found?
  • rf_developer_projects table
    • datasource_id / Exploring the data
    • dev_loginname / Exploring the data
    • proj_unixname / Exploring the data
  • rigid designator
    • about / Techniques for named entity recognition
  • Rmagick on RubyForge
    • about / Two examples
    • references / Two examples
  • Rmagick on RubyGems
    • about / Two examples
    • references / Two examples
  • RubyForge
    • URL / Matching on URLs
  • Ruby on Rails
    • URL / How many entity matches did we find?

S

  • Scikit-learn tutorial
    • URL / How do we set up our data mining work environment?
  • semantic errors
    • about / Logic or semantic errors
    • example / Logic or semantic errors
  • sentiment analysis
    • about / What is sentiment analysis?
    • reference / What is sentiment analysis?
    • algorithms / Sentiment analysis algorithms
    • general-purpose data collections / General-purpose data collections
  • sentiment analysis, basics
    • about / The basics of sentiment analysis
    • opinion, structure / The structure of an opinion
    • document-level analysis / Document-level and sentence-level analysis
    • sentence-level analysis / Document-level and sentence-level analysis
    • opinions, features / Important features of opinions
  • sentiment intensity
    • about / Vader sentiment
  • sentiment mining application
    • about / Sentiment mining application
    • project, motivating / Motivating the project
    • data preparation / Data preparation
    • chat messages, data analysis / Data analysis of chat messages
    • e-mail messages, data analysis / Data analysis of e-mail messages
  • sentiment score
    • URL / Data analysis of chat messages
  • sentiment words
    • about / Important features of opinions
  • SentiWordNet
    • URL / SentiWordNet
  • sequence analysis problems
    • about / What are the techniques used in data mining?
  • set notation
    • about / Frequent itemset mining basics
  • significant words
    • about / What is automatic text summarization?
  • simpleTextSummaryNLTK.py
    • reference / Naive text summarization using NLTK
  • single-document
    • about / What is automatic text summarization?
  • Six Steps process
    • problem statement / The Six Steps process
    • data collection / The Six Steps process
    • data storage / The Six Steps process
    • data cleaning / The Six Steps process
    • data mining / The Six Steps process
    • representation / The Six Steps process
    • visualization / The Six Steps process
    • problem resolution / The Six Steps process
  • software project tags
    • association rules, discovering / A project – discovering association rules in software project tags
  • Soundex
    • about / Soundex
  • source lines of code (SLOC)
    • about / Outliers
  • specificity
    • about / Effectiveness – how accurate are the matches that we generate?
  • stigma words
    • about / Sumy's Edmundson summarizer
  • stopwords
    • about / Naive text summarization using NLTK
  • string edit distance
    • about / String edit distance
  • subgraphs
    • reference link / Analyzing subgraphs
  • subjectivity classification
    • about / Document-level and sentence-level analysis
  • summarization problems
    • about / What are the techniques used in data mining?
  • SUMMRY
    • about / Tools for text summarization
    • reference / Tools for text summarization
  • Sumy
    • used, for text summarization / Text summarization using Sumy
    • references / Text summarization using Sumy
  • Sumy's Edmundson summarizer
    • reference / Sumy's Edmundson summarizer
    • about / Sumy's Edmundson summarizer
  • Sumy's LSA summarizer
    • about / Sumy's LSA summarizer
  • Sumy's Luhn summarizer
    • about / Sumy's Luhn summarizer
  • Sumy's TextRank summarizer
    • about / Sumy's TextRank summarizer
  • sustainable
    • about / Usefulness – how practical is the matching procedure to use?

T

  • target
    • about / The structure of an opinion
  • target data
    • about / The Fayyad et al. KDD process
  • terms
    • about / Important features of opinions
  • text samples
    • download link / Gensim for topic modeling
  • text summarization
    • tools / Tools for text summarization
    • naive text summarization, NLTK used / Naive text summarization using NLTK
    • using Gensim / Text summarization using Gensim
    • Sumy used / Text summarization using Sumy
  • text summarization, methods
    • Sumy's Luhn summarizer / Sumy's Luhn summarizer
    • Sumy's TextRank summarizer / Sumy's TextRank summarizer
    • Sumy's LSA summarizer / Sumy's LSA summarizer
    • Sumy's Edmundson summarizer / Sumy's Edmundson summarizer
  • topic modeling
    • about / What is topic modeling?
    • Gensim used / Gensim for topic modeling
    • Gensim LDA topics / Understanding Gensim LDA topics
    • Gensim LDA passes / Understanding Gensim LDA passes
    • Gensim LDA model, applying to documents / Applying a Gensim LDA model to new documents
    • Gensim LDA objects, serializing / Serializing Gensim LDA objects
  • training examples
    • about / Sentiment analysis algorithms
  • tree structure
    • about / JSON trees
  • tripletons
    • about / Frequent itemset mining basics
  • true positives (TP)
    • about / How good are the pairs we found?
  • type errors
    • about / Data type and character set errors

U

  • Ubuntu
    • URL / Data preparation
  • undirected network
    • about / What is a network?
  • univariate data sets
    • about / Statistical detection of outliers
  • unsupervised
    • about / What is topic modeling?
  • upward closure property
    • about / Methods for finding frequent itemsets

V

  • Vader sentiment
    • URL / Vader sentiment
    • URL, for specific lexicon / Vader sentiment
  • Vapor on RubyForge
    • about / Two examples
    • references / Two examples
  • Vapor on RubyGems
    • about / Two examples
    • references / Two examples
  • vertical merge
    • example / Merging datasets vertically
  • vertices
    • about / What is a network?
  • visual mining
    • about / Visual mining for outliers

W

  • weighted network
    • about / What is a network?

Z

  • z-score
    • about / Detecting outliers with modified z-scores
    • reference link / Detecting outliers with modified z-scores