Author: Rafael A. Irizarry
If you have no experience with R, this book is an excellent place to start. Rafael’s explanations of R are friendly and thought-provoking, which keeps the reader engaged. The book covers the core topics and skills that a data scientist needs.
“This book is meant to be a textbook for a first course in Data Science. No previous knowledge of R is necessary, although some experience with programming may be helpful. The statistical concepts used to answer the case study questions are only briefly introduced, so a Probability and Statistics textbook is highly recommended for in-depth understanding of these concepts. If you read and understand all the chapters and complete all the exercises, you will be well-positioned to perform basic data analysis tasks and you will be prepared to learn the more advanced concepts and skills needed to become an expert.”
Lectures |
---|
Part 1: Basics of R and the tidyverse |
Learn R throughout the book |
Building blocks needed to keep learning |
Part 2: Data visualization with ggplot2 |
Use ggplot2 to generate graphs |
Describe important data visualization principles |
Part 3: Statistics with R |
Answer case study questions using probability, inference, and regression |
Demonstrate the importance of statistics in data analysis |
Part 4: Data wrangling with tidyverse |
Familiarize the reader with data wrangling |
Specific skills include web scraping, using regular expressions, and joining and reshaping data tables |
Part 5: Machine learning with caret |
Introduce machine learning through challenges |
Use the caret package to build prediction algorithms, including k-nearest neighbors and random forests |
Part 6: Productivity tools for data science |
Brief introduction to productivity tools used in data science projects |
Tools include RStudio, UNIX/Linux shell, Git and GitHub, and knitr and R Markdown |