Building Machine Learning Systems with Python
This is a practical, tutorial-driven, yet well-grounded book that showcases good Machine Learning practices. The emphasis is on using existing technologies rather than on writing your own implementations of algorithms. The book is a scenario-based, example-driven tutorial. By the end, you will have learnt the critical aspects of Machine Learning projects in Python and experienced the power of ML-based systems by actually working on them.

This book primarily targets Python developers who want to learn about Machine Learning and build it into their projects, or who want to add Machine Learning support to their existing projects and see it implemented effectively. Computer science researchers, data scientists, Artificial Intelligence programmers, and statistical programmers would gain equally from this book and learn about effective implementation through its many practical examples.

Readers need no prior experience with Machine Learning or statistical processing. Python development experience is assumed.
- Paperback | 290 pages
- 192 x 234 x 16mm | 519.99g
- 02 Aug 2013
- Packt Publishing Limited
- Birmingham, United Kingdom
- black & white illustrations
Table of contents
Preface

Chapter 1: Getting Started with Python Machine Learning
- Machine learning and Python - the dream team
- What the book will teach you (and what it will not)
- What to do when you are stuck
- Getting started
- Introduction to NumPy, SciPy, and Matplotlib
- Installing Python
- Chewing data efficiently with NumPy and intelligently with SciPy
- Learning NumPy
- Indexing
- Handling non-existing values
- Comparing runtime behaviors
- Learning SciPy
- Our first (tiny) machine learning application
- Reading in the data
- Preprocessing and cleaning the data
- Choosing the right model and learning algorithm
- Before building our first model
- Starting with a simple straight line
- Towards some advanced stuff
- Stepping back to go forward - another look at our data
- Training and testing
- Answering our initial question
- Summary

Chapter 2: Learning How to Classify with Real-world Examples
- The Iris dataset
- The first step is visualization
- Building our first classification model
- Evaluation - holding out data and cross-validation
- Building more complex classifiers
- A more complex dataset and a more complex classifier
- Learning about the Seeds dataset
- Features and feature engineering
- Nearest neighbor classification
- Binary and multiclass classification
- Summary

Chapter 3: Clustering - Finding Related Posts
- Measuring the relatedness of posts
- How not to do it
- How to do it
- Preprocessing - similarity measured as similar number of common words
- Converting raw text into a bag-of-words
- Counting words
- Normalizing the word count vectors
- Removing less important words
- Stemming
- Installing and using NLTK
- Extending the vectorizer with NLTK's stemmer
- Stop words on steroids
- Our achievements and goals
- Clustering
- KMeans
- Getting test data to evaluate our ideas on
- Clustering posts
- Solving our initial challenge
- Another look at noise
- Tweaking the parameters
- Summary

Chapter 4: Topic Modeling
- Latent Dirichlet allocation (LDA)
- Building a topic model
- Comparing similarity in topic space
- Modeling the whole of Wikipedia
- Choosing the number of topics
- Summary

Chapter 5: Classification - Detecting Poor Answers
- Sketching our roadmap
- Learning to classify classy answers
- Tuning the instance
- Tuning the classifier
- Fetching the data
- Slimming the data down to chewable chunks
- Preselection and processing of attributes
- Defining what is a good answer
- Creating our first classifier
- Starting with the k-nearest neighbor (kNN) algorithm
- Engineering the features
- Training the classifier
- Measuring the classifier's performance
- Designing more features
- Deciding how to improve
- Bias-variance and its trade-off
- Fixing high bias
- Fixing high variance
- High bias or low bias
- Using logistic regression
- A bit of math with a small example
- Applying logistic regression to our post classification problem
- Looking behind accuracy - precision and recall
- Slimming the classifier
- Ship it!
- Summary

Chapter 6: Classification II - Sentiment Analysis
- Sketching our roadmap
- Fetching the Twitter data
- Introducing the Naive Bayes classifier
- Getting to know the Bayes theorem
- Being naive
- Using Naive Bayes to classify
- Accounting for unseen words and other oddities
- Accounting for arithmetic underflows
- Creating our first classifier and tuning it
- Solving an easy problem first
- Using all the classes
- Tuning the classifier's parameters
- Cleaning tweets
- Taking the word types into account
- Determining the word types
- Successfully cheating using SentiWordNet
- Our first estimator
- Putting everything together
- Summary

Chapter 7: Regression - Recommendations
- Predicting house prices with regression
- Multidimensional regression
- Cross-validation for regression
- Penalized regression
- L1 and L2 penalties
- Using Lasso or Elastic nets in scikit-learn
- P greater than N scenarios
- An example based on text
- Setting hyperparameters in a smart way
- Rating prediction and recommendations
- Summary

Chapter 8: Regression - Recommendations Improved
- Improved recommendations
- Using the binary matrix of recommendations
- Looking at the movie neighbors
- Combining multiple methods
- Basket analysis
- Obtaining useful predictions
- Analyzing supermarket shopping baskets
- Association rule mining
- More advanced basket analysis
- Summary

Chapter 9: Classification III - Music Genre Classification
- Sketching our roadmap
- Fetching the music data
- Converting into a wave format
- Looking at music
- Decomposing music into sine wave components
- Using FFT to build our first classifier
- Increasing experimentation agility
- Training the classifier
- Using the confusion matrix to measure accuracy in multiclass problems
- An alternate way to measure classifier performance using receiver operator characteristic (ROC)
- Improving classification performance with Mel Frequency Cepstral Coefficients
- Summary

Chapter 10: Computer Vision - Pattern Recognition
- Introducing image processing
- Loading and displaying images
- Basic image processing
- Thresholding
- Gaussian blurring
- Filtering for different effects
- Adding salt and pepper noise
- Putting the center in focus
- Pattern recognition
- Computing features from images
- Writing your own features
- Classifying a harder dataset
- Local feature representations
- Summary

Chapter 11: Dimensionality Reduction
- Sketching our roadmap
- Selecting features
- Detecting redundant features using filters
- Correlation
- Mutual information
- Asking the model about the features using wrappers
- Other feature selection methods
- Feature extraction
- About principal component analysis (PCA)
- Sketching PCA
- Applying PCA
- Limitations of PCA and how LDA can help
- Multidimensional scaling (MDS)
- Summary

Chapter 12: Big(ger) Data
- Learning about big data
- Using jug to break up your pipeline into tasks
- About tasks
- Reusing partial results
- Looking under the hood
- Using jug for data analysis
- Using Amazon Web Services (AWS)
- Creating your first machines
- Installing Python packages on Amazon Linux
- Running jug on our cloud machine
- Automating the generation of clusters with starcluster
- Summary

Appendix: Where to Learn More about Machine Learning
- Online courses
- Books
- Q&A sites
- Blogs
- Data sources
- Getting competitive
- What was left out
- Summary

Index
About Willi Richert
Willi Richert has a PhD in Machine Learning/Robotics and currently works for Microsoft in the Bing Core Relevance Team. He performs statistical machine translation.

Luis Pedro Coelho has over 10 years of experience in Machine Learning. He has a PhD from the School of Computer Science at Carnegie Mellon University, which is a very strong school in Machine Learning, and currently works in Computational Biology.