What You Need to Know about Data Mining and Data-Analytic Thinking
DETAIL
- Author: Foster Provost and Tom Fawcett
- Language: English
- Published: 2013
- Pages: 409
- Size: 16 MB
- Format: PDF
CONTENTS
Preface
- Introduction: Data-Analytic Thinking
- Business Problems and Data Science Solutions. Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised versus unsupervised data mining.
- Introduction to Predictive Modeling: From Correlation to Supervised Segmentation. Fundamental concepts: Identifying informative attributes; Segmenting data by progressive attribute selection. Exemplary techniques: Finding correlations; Attribute/variable selection; Tree induction.
- Fitting a Model to Data. Fundamental concepts: Finding “optimal” model parameters based on data; Choosing the goal for data mining; Objective functions; Loss functions. Exemplary techniques: Linear regression; Logistic regression; Support-vector machines.
- Overfitting and Its Avoidance. Fundamental concepts: Generalization; Fitting and overfitting; Complexity control. Exemplary techniques: Cross-validation; Attribute selection; Tree pruning; Regularization.
- Similarity, Neighbors, and Clusters. Fundamental concepts: Calculating similarity of objects described by data; Using similarity for prediction; Clustering as similarity-based segmentation. Exemplary techniques: Searching for similar entities; Nearest neighbor methods; Clustering methods; Distance metrics for calculating similarity.
- Decision Analytic Thinking I: What Is a Good Model? Fundamental concepts: Careful consideration of what is desired from data science results; Expected value as a key evaluation framework; Consideration of appropriate comparative baselines. Exemplary techniques: Various evaluation metrics; Estimating costs and benefits; Calculating expected profit; Creating baseline methods for comparison.
- Visualizing Model Performance. Fundamental concepts: Visualization of model performance under various kinds of uncertainty; Further consideration of what is desired from data mining results. Exemplary techniques: Profit curves; Cumulative response curves; Lift curves; ROC curves.
- Evidence and Probabilities. Fundamental concepts: Explicit evidence combination with Bayes’ Rule; Probabilistic reasoning via assumptions of conditional independence. Exemplary techniques: Naive Bayes classification; Evidence lift.
- Representing and Mining Text. Fundamental concepts: The importance of constructing mining-friendly data representations; Representation of text for data mining. Exemplary techniques: Bag of words representation; TFIDF calculation; N-grams; Stemming; Named entity extraction; Topic models.
- Decision Analytic Thinking II: Toward Analytical Engineering. Fundamental concept: Solving business problems with data science starts with analytical engineering: designing an analytical solution, based on the data, tools, and techniques available. Exemplary technique: Expected value as a framework for data science solution design.
- Other Data Science Tasks and Techniques. Fundamental concepts: Our fundamental concepts as the basis of many common data science techniques; The importance of familiarity with the building blocks of data science. Exemplary techniques: Association and co-occurrences; Behavior profiling; Link prediction; Data reduction; Latent information mining; Movie recommendation; Bias-variance decomposition of error; Ensembles of models; Causal reasoning from data.
- Data Science and Business Strategy. Fundamental concepts: Our principles as the basis of success for a data-driven business; Acquiring and sustaining competitive advantage via data science; The importance of careful curation of data science capability.
- Conclusion
A. Proposal Review Guide
B. Another Sample Proposal
Glossary
Bibliography
Index
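Expected value comes up repeatedly in the contents above, as an evaluation framework, as "calculating expected profit," and as a framework for solution design. As a small illustration (not code from the book), here is a minimal Python sketch of that idea for a single targeting decision. It assumes each customer has an estimated response probability and that the profit from a response and the cost of a non-response are known. The function names and the dollar figures are hypothetical.

```python
def expected_profit(p_response: float, value_if_response: float,
                    value_if_no_response: float) -> float:
    """Expected value of targeting one customer (e.g. sending an offer).

    Hypothetical helper: weights each outcome's value by its probability.
    """
    if not 0.0 <= p_response <= 1.0:
        raise ValueError("p_response must be a probability in [0, 1]")
    return (p_response * value_if_response
            + (1.0 - p_response) * value_if_no_response)


def should_target(p_response: float, value_if_response: float,
                  value_if_no_response: float) -> bool:
    """Target only when the expected profit is strictly positive."""
    return expected_profit(p_response, value_if_response,
                           value_if_no_response) > 0.0


def break_even_probability(value_if_response: float,
                           value_if_no_response: float) -> float:
    """Response probability at which expected profit is exactly zero.

    Solves p * v_r + (1 - p) * v_n = 0 for p; assumes v_r > v_n.
    """
    if value_if_response <= value_if_no_response:
        raise ValueError("a response must be worth more than a non-response")
    return -value_if_no_response / (value_if_response - value_if_no_response)


if __name__ == "__main__":
    # Illustrative numbers only: $99 profit on a response, $1 lost otherwise.
    print(expected_profit(0.02, 99.0, -1.0))   # roughly 1.0 dollar per customer
    print(break_even_probability(99.0, -1.0))  # 0.01
    print(should_target(0.005, 99.0, -1.0))    # False
```

With these made-up values the break-even response probability is 1/100, so a customer is worth targeting only if the model's estimated response probability exceeds 1%. A classifier's scores become useful for a business decision once they are combined with costs and benefits in this way.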