The Big R-Book

The Big R-Book: From Data Science to Learning Machines and Big Data


Description

Introduces professionals and scientists to statistics and machine learning using the programming language R


Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. These topics are all important for someone with a science or math background who wants to quickly learn several practical technologies in order to enter or transition to the growing field of data science.


The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling. Part 5 teaches readers about exploring data, Part 6 covers building models, Part 7 introduces the reader to the reality in companies, Part 8 covers reports and interactive applications, and Part 9 introduces the reader to big data and performance computing. The book also includes some helpful appendices.





Provides a practical guide for non-experts with a focus on business users

Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting

Uses a practical tone and integrates multiple topics in a coherent framework

Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R

Shows readers how to visualize results in static and interactive reports

Supplementary materials include PDF slides based on the book's content, as well as all the extracted R-code, and are available to everyone on a Wiley Book Companion Site



The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.

Product details

  • Hardback | 928 pages
  • 217 x 282 x 40mm | 2,080g
  • Wiley-Blackwell
  • Hoboken, United States
  • English
  • 1st edition
  • ISBN-10: 1119632722
  • ISBN-13: 9781119632726

Back cover copy

Introduces professionals and scientists to statistics, machine learning, and big data using the programming language R

Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. These topics are all important for someone with a science or math background who wants to quickly learn several practical technologies in order to enter or transition to the growing field of data science.

The Big R-Book: From Data Science to Learning Machines and Big Data includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling and exploring data. Part 5 covers building models, Part 6 introduces the reader to the reality in companies, Part 7 covers reports and interactive applications, and Part 8 introduces the reader to big data and performance computing. The appendices focus on specialist topics such as building your own extension for R and answers to the questions that appear throughout the book.

Provides a practical guide for non-experts with a focus on business users

Contains a unique combination of topics including an introduction to R, machine learning, multi criteria decision analysis, mathematical models, data wrangling, and reporting

Uses a practical tone and integrates multiple topics in a coherent framework

Demystifies the hype around machine learning and AI by enabling readers to understand the models and program them in R

Shows readers how to visualize results in reports and dynamic websites

Supplementary materials include PDF slides based on the book's content on a Wiley Instructor-only Book Companion Site, as well as all the extracted R-code, available to everyone on a Wiley Student Book Companion Site

The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students and graduates who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models or review them.

Table of contents

Foreword v


About the Author vii


Acknowledgements ix


Preface / Why this book? xi


Contents xv


I Introduction 1


1 The Big Picture with Kondratiev and Kardashev 3


2 The Scientific Method and Data 7


3 Conventions 13


II Starting with R and Elements of Statistics 19


4 The Basics of R 21


4.1 Variables 27


4.2 Data Types 29


4.2.1 Elementary Data Types 29


4.2.2 Vectors 30


4.2.3 Lists 33


4.2.4 Matrices 39


4.2.5 Arrays 42


4.2.6 Factors 44


4.2.7 Data Frames 48


4.3 Operators 56


4.3.1 Arithmetic Operators 56


4.3.2 Relational Operators 57


4.3.3 Logical Operators 57


4.3.4 Assignment Operators 59


4.3.5 Other Operators 60


4.3.6 Loops 62


4.3.7 Functions 66


4.3.8 Packages 70


4.3.9 Strings 73


4.4 Selected Data Interfaces 76


4.4.1 CSV Files 76


4.4.2 Excel Files 80


4.4.3 Databases 80


4.5 Distributions 83


4.5.1 Normal Distribution 83


4.5.2 Binomial Distribution 85


5 Lexical Scoping and environments 91


5.1 Environments in R 92


5.2 Lexical Scoping in R 94


6 The Implementation of OO 99


6.1 Base Types 102


6.2 S3 Objects 104


6.2.1 Creating S3 objects 107


6.2.2 Creating generic methods 109


6.2.3 Method dispatch 110


6.2.4 Group generic functions 111


6.3 S4 Objects 114


6.3.1 Creating S4 Objects 114


6.3.2 Recognising objects, generic functions, and methods 122


6.3.3 Creating S4 Generics 124


6.3.4 Method dispatch 125


6.4 The reference class, refclass, RC or R5 model 127


6.4.1 Creating R5 objects 127


6.5 OO Conclusion 134


7 Tidy R with the Tidyverse 137


7.1 The Philosophy of the Tidyverse 138


7.2 Packages in the tidyverse 141


7.3 Working with the tidyverse 144


7.3.1 tibbles 144


7.3.2 Piping with R 150


7.3.3 Attention points when using the pipe command 151


7.3.3.1 Advanced piping 153


7.3.3.2 Conclusion 155


8 Elements of Descriptive Statistics 157


8.1 Measures of Central Tendency 158


8.1.1 Mean 158


8.1.2 The Median 161


8.1.3 The Mode 162


8.2 Measures of Variation or Spread 164


8.3 Measures of Covariation 166


8.4 Chi Square Tests 169


9 Further Reading 171


III Data Import 173


10 A short history of modern database systems 175


11 RDBMS 179


12 SQL 183


12.1 Designing the database 184


12.2 Building the database 187


12.3 Adding data to the database 196


12.4 Querying the database 200


12.5 Modifying an existing database 206


12.6 Advanced features of SQL 211


13 Connecting R to an SQL database 215


IV Data Wrangling 221


14 Anonymising Data 225


15 Data Wrangling in the tidyverse 229


15.1 Tidy data 230


15.2 Importing the data 232


15.2.1 Importing from an SQL RDBMS 232


15.2.2 Importing flat files in the tidyverse 234


15.2.2.1 CSV Files 236


15.2.2.2 Making sense of fixed width files 238


15.3 Tidying up data with tidyr 243


15.3.1 Splitting tables 244


15.3.2 headers to data 249


15.3.3 Spreading one column over many 250


15.3.4 separate 252


15.3.5 Unite 254


15.3.6 Wrong Data 255


15.4 Playing with tibbles: SQL-like functionality 256


15.4.1 Selecting 256


15.4.2 Filtering 256


15.4.3 Joining 258


15.4.4 Mutating 262


15.4.5 Set Operations 265


15.5 String Manipulation in the tidyverse 268


15.5.1 Basic string manipulation 269


15.5.2 Pattern matching with regular expressions 272


15.5.2.1 Regular Expressions 273


15.5.2.2 Functions using Regex 279


15.6 Dates with lubridate 287


15.6.0.1 ISO 8601 Format 288


15.6.0.2 Timezones 290


15.6.0.3 Extract and set date and time components 291


15.6.0.4 Calculating with date-times 293


15.7 Factors with forcats 298


16 Dealing with missing data 307


17 Data Binning 319


17.1 Tuning the binning procedure 323


17.2 More complex cases: matrix binning 329


17.3 Weight of evidence and information value 336


18 Factoring analysis and principal components 339


18.1 Principal components analysis 340


18.2 Factor Analysis 345


V Explore Data 349


19 Using Descriptive Statistics 353


20 Standard Charts & Graphs 357


20.1 Pie Charts 358


20.2 Bar Charts 359


20.3 Boxplots 361


20.4 Violin plots 363


20.5 Histograms 366


20.6 Scatterplots 368


20.7 Line Graphs 371


20.8 Plotting Functions 373


20.9 Maps and contour plots 374


21 Elected Visualization Methods 377


21.1 Heat-maps 377


21.2 Text Mining 379


21.2.1 Word Clouds 379


21.2.2 Word Associations 383


21.3 Colours in R 386


22 Time Series Analysis 393


22.1 Time Series in R 394


22.2 Forecasting 397


22.2.1 Moving Average 397


22.2.2 Seasonal Decomposition 403


VI Modelling 409


23 Regression Models 411


23.1 Linear Regression 411


23.2 Multiple Linear Regression 415


23.2.1 Poisson Regression 416


23.2.2 Non-Linear Regression 418


23.3 Performance of regression models 421


23.3.1 Mean Square Error (MSE) 421


23.3.2 R-Squared 421


23.3.3 Mean Average Deviation (MAD) 423


24 Classification Models 425


24.1 Logistic Regression 425


24.2 The performance of binary classification models 427


24.2.1 The Confusion Matrix and related measures 428


24.2.2 ROC 431


24.2.3 AUC 433


24.2.4 AUC Gini for logistic regression 435


24.2.5 Kolmogorov-Smirnov (KS) for logistic regression 436


24.2.6 Finding an Optimal Cut-off 439


25 Learning Machines 445


25.1 Decision Tree 447


25.1.1 Essential Background 447


25.1.2 Important considerations 452


25.1.3 Growing trees with R 455


25.1.4 Evaluating the performance of a decision tree 463


25.1.4.1 The performance of the regression tree 464


25.1.4.2 The performance of the classification tree 464


25.2 Random Forest 467


25.3 Artificial Neural Networks (ANN) 472


25.3.1 The basics of ANNs in R 472


25.3.2 An example of a work-flow to develop an ANN 475


25.4 Support Vector Machine 483


25.5 Unsupervised learning and clustering 487


25.5.1 k-means clustering 488


25.5.2 Fuzzy clustering 501


25.5.3 Hierarchical clustering 504


25.5.4 Other clustering methods 506


26 Towards a tidy modelling cycle with modelr 507


27 Model Validation 513


27.1 Model quality measures 515


27.2 Predictions and residuals 516


27.3 Bootstrapping 517


27.4 Cross-Validation 520


27.4.1 training and validating 521


27.5 Monte-Carlo Cross Validation 525


27.6 k-Fold Cross Validation 527


27.7 Comparison 529


27.8 Validation in a broader perspective 530


28 Labs 535


28.1 Financial Analysis with QuantMod 535


28.1.1 The quantmod data structure 539


28.1.2 Support functions supplied by quantmod 543


28.1.3 Financial modelling in quantmod 545


29 Multi Criteria Decision Analysis (MCDA) 553


29.1 What and Why 553


29.2 General Work-flow 555


29.3 Identify the issue at hand: step 1 and 2 559


29.4 STEP 3: the decision matrix 561


29.4.1 Construct a decision matrix 561


29.4.2 Normalize the decision matrix 563


29.5 STEP 4: leave out inefficient and unacceptable alternatives 565


29.5.1 Unacceptable Alternatives 565


29.5.2 Dominance- inefficient alternatives 565


29.6 Printing preference relationships 568


29.7 STEP 6: MCDA Methods 570


29.7.1 Examples of non-compensatory methods 570


29.7.2 The weighted sum method (WSM) 571


29.7.3 WPM 574


29.7.4 ELECTRE 575


29.7.4.1 ELECTRE I 576


29.7.4.2 ELECTRE II 582


29.7.5 PROMethEE 584


29.7.5.1 PROMethEE I 587


29.7.5.2 PROMethEE II 597


29.7.6 PCA (Gaia) 602


29.7.7 Outranking methods 607


29.7.8 Goal Programming 608


29.8 Summary MCDA 611


VII Introduction to Companies 613


30 Financial Accounting 617


30.1 The Statements of Accounts 618


30.1.1 Income Statement 618


30.1.2 Net Income: The P&L statement 618


30.1.3 Balance Sheet 619


30.2 The Value Chain 621


30.3 Further Terminology 623


30.4 Selected Financial Ratios 625


31 Management Accounting 627


31.1 Introduction 628


31.2 Selected Methods in MA 630


31.2.1 Cost Accounting 630


31.2.2 Selected Cost Types 632


31.3 Selected Use Cases of MA 635


31.3.1 Balanced Scorecard 635


31.3.2 Key Performance Indicators 636


31.3.2.1 Selection of KPIs 638


32 Asset Valuation Basics 641


32.1 Time Value of Money 642


32.2 Cash 645


32.3 Bonds 646


32.3.1 Valuation of Bonds 648


32.3.2 Duration 650


32.3.2.1 Macaulay Duration 651


32.3.2.2 Modified Duration 652


32.4 Equities 654


32.4.1 Valuation of Equities 655


32.4.1.1 CAPM 656


32.4.2 Absolute Value Models 660


32.4.2.1 Dividend Discount Model 660


32.4.2.2 Free Cash Flow (FCF) 664


32.4.2.3 Discounted Cash Flow Model 666


32.4.2.4 Discounted Abnormal Operating Earnings valuation model 668


32.4.2.5 Net Asset Value Method or Cost Method 668


32.4.2.6 Excess Earnings Method 670


32.4.3 Relative Value Models 670


32.4.3.1 The Idea behind Relative Value Models 670


32.4.3.2 Some Ratios that can be used in relative value models 671


32.4.3.3 Measures Related to Company Value for External Stakeholders 673


32.4.3.4 Relative Value Models in Practice 680


32.4.3.5 Conclusions and Use 680


32.4.4 Selection of Valuation Methods 681


32.4.5 Pitfalls and Matters Requiring Attention for all Methods 682


32.4.5.1 Results and Sensitivity 682


32.5 Forwards and Futures 690


32.6 Options 692


32.6.1 Definitions 692


32.6.2 Commercial Aspects 695


32.6.3 Historic observations 696


32.6.4 Valuation of Options at Maturity 697


32.6.5 The Put-Call Parity 700


32.6.6 The Black & Scholes Model 702


32.6.6.1 Apply the Black and Scholes formula 703


32.6.7 Dependencies 705


32.6.8 Sensitivities: "the Greeks" 710


32.6.9 Delta Hedging 711


32.6.10 Linear Option Strategies 714


32.6.10.1 The Limits of the Black and Scholes Model 720


32.6.11 The Binomial Model 724


32.6.11.1 Risk Neutral Method 727


32.6.11.2 The Equivalent Portfolio Binomial Model 729


32.6.11.3 Summary Binomial Model 732


32.6.12 Exotic Options 732


32.6.13 Integrated Option Strategies 733


32.6.14 Capital Protected Structures 736


VIII Report 739


33 ggplot2 743


34 R-markdown 753


35 knitr and LaTeX 757


36 An automated development cycle 761


37 Writing and communication skills 763


38 Interactive apps 767


38.1 Shiny 769


38.2 Browser born data visualization 773


38.2.1 HTML-widgets 773


38.2.2 ggvis 775


38.2.3 googleVis 777


38.3 Dashboards 779


38.3.1 The business case: a diversity dashboard 780


38.3.2 A dashboard with flexdashboard 785


38.3.2.1 Interactive dashboards with flexdashboard 790


38.3.3 A dashboard with shinydashboard 791


IX Appendices 795


39 Other Resources 797


40 Levels of Measurement 799


40.1 Nominal Scale 800


40.2 Ordinal Scale 801


40.3 Interval Scale 802


40.4 Ratio Scale 803


41 Trademark Notices 805


42 Code snippets not shown in the body of the book 809


43 Answers to questions 815


Bibliography 829


Index 839


Nomenclature 851

About Philippe J. S. De Brouwer

PHILIPPE J.S. DE BROUWER, PhD, is a director at HSBC, a guest professor at four universities and MBA programs (University of Warsaw, Jagiellonian University, Krakow School of Business and AGH University of Science and Technology), and honorary consul for Belgium in Krakow. As a professor, he builds bridges not only between universities and the industry, but also across disciplines. He teaches leadership skills to mathematicians and coding to non-mathematicians. As a scientist, he tries to combine research on financial markets, psychology, and investments to the benefit of the investor. As an honorary consul, he is passionate about serving the community and helping initiatives grow.