This site includes additional
resources:
http://mdsr-book.github.io/
Introduction to Data Science
Prologue: Why data science?
Data visualization
A grammar for graphics
Data wrangling
Tidy data and iteration
Professional Ethics
Statistics and Modeling
Statistical foundations
Statistical learning and predictive analytics
Unsupervised learning
Simulation
Topics in Data Science
Interactive data graphics
Database querying using SQL
Database administration
Working with spatial data
Text as data
Network science
Epilogue: Towards big data
Appendices
Packages used in this book
Introduction to R and RStudio
Algorithmic thinking
Reproducible analysis and workflow
Regression modeling
Setting up a database server
Benjamin S. Baumer is an assistant professor in the Statistical &
Data Sciences program at Smith College. He has been a practicing
data scientist since 2004, when he became the first full-time
statistical analyst for the New York Mets. Ben is a co-author of
The Sabermetric Revolution and won the 2016 Contemporary Baseball
Analysis Award from the Society for American Baseball Research.
Daniel T. Kaplan is the DeWitt Wallace professor of mathematics and
computer science at Macalester College. He is the author of several
textbooks on statistical modeling and statistical computing, and
received the 2006 Macalester Excellence in Teaching award.
Nicholas J. Horton is a professor of statistics at Amherst College.
He is a Fellow of the American Statistical Association (ASA),
member of the NRC Committee on Applied and Theoretical Statistics,
recipient of a number of national teaching awards, author of a
series of books on statistical computing, and actively involved in
curricular reform to help students "think with data."
"Modern Data Science with R is one of the first textbooks to
provide a comprehensive introduction to data science for students
at the undergraduate level (it is also suitable for graduate
students and professionals in other fields). The authors follow the
approach taken by Garrett Grolemund and Hadley Wickham in their
book, R for Data Science, and David Robinson in Teach the Tidyverse
to Beginners, which emphasizes the teaching of data visualization
and the tidyverse (using dplyr and chained pipes) before covering
base R, along with using real-world data and modern data science
methods. The textbook includes end of chapter exercises (an
instructor’s solution manual is available), and a series of lab
activities is also under development. The result is an excellent
textbook that provides a solid foundation in data science for
students and professionals alike... Modern Data Science with R is a
breakthrough textbook." ~ ACM SIGACT News
"Only about 60 of the
book’s 551 pages address the questions of uncertainty and inference
that constitute the core of the statistics tradition. The remaining
pages attend the other components of working with data—the import,
wrangling, tidying, visualization, and storage—that are often the
more prominent barriers to understanding modern datasets...Modern
Data Science with R is a landmark: the first full textbook in data
science. (It can serve) as the backbone of a semester-long course
targeted at students with little background in statistics or
computing. It is rich with examples and is guided by a strong
narrative voice. What’s more, it presents an organizing framework
that makes a convincing argument that data science is a course
distinct from applied statistics…By using the tidyverse, the
textbook authors are able to seamlessly interweave a conceptual
framework for data science with the corresponding implementation in
R code….Even though this book is heavily dependent on R, readers
come away with a more general natural language with which to talk
and think about data. Indeed, if R were to cease to exist tomorrow,
these readers would still be well-situated to be data scientists.
In a nutshell, that approach is what makes this such a successful
textbook." ~ The American Statistician
"Baumer, Kaplan, and Horton
have managed to write a book that will serve a huge variety of
educators while being endlessly interesting and useful to students
of a modern era. Modern Data Science in R is a compilation of ideas
from both ends of the data science and statistics spectrum—tools
for setting up databases and working with regular expressions are
intermixed with fundamentals like regression analysis.
Additionally, the authors pull together fantastic examples from the
scientific community as well as the media at large. Their examples
will engage today's students into understanding why data wrangling,
reproducibility, and ethics are a fundamental part of any data
analysis. Good visualization skills (Tukey) and ethical analyses
(Hoff, "How to Lie with Statistics") are not new ideas. However,
they have recently been lost in the drive for more sophisticated
mathematical and computational methods for working with data.
Baumer et al. modernize the need for good visualization and
communication in ways that will resonate with today's
practitioners. Like Wickham's "ggplot2" and "The Elements of
Statistical Learning" by Hastie et al., "Modern Data Science in R"
promises to be a staple on every data analyst's bookshelf.
Accessible to students and a valuable resource for those who have
been in the field for many years, this book promises to be a
treasure you will want to discover." ~ Jo Hardin, Pomona College
"This book would be an excellent text book for an introductory data
science course. Many academic institutions are now trying to open
data science programs. But, there is not a good text book available
for data science courses." ~ Mahbubul Majumder, U. of Nebraska Omaha
"The book is unique. It is an encyclopedia of Data Science,
and it covers a wide variety of modern topics; another positive
aspect is that it contains lots of examples and code, and the
layout is quite catchy. One can learn (and teach) subjects as
diverse as: How to give talks, administrating databases, how to
model spatial data, and even ethics---all in one book."
~ Miguel de Carvalho, The University of Edinburgh
"It would undoubtedly be
useful to many postgraduate students of applied statistics. The
handbook style will also be of use to statisticians who want to
keep up to date in this area. In particular the book utilizes
functions from many different R packages, and will be helpful for
data analysts to keep their R skills up to date. Although one of
the appendices covers an introduction to R (R Core Team 2017) and
RStudio (RStudio Team 2017), realistically it is expected that the
reader has some experience with R. Existing R users with no
experience of RStudio might find the appendix useful, but RStudio
is not required to work through this book. Overall the book is well
written, well structured and the general writing style is both
objective and entertaining . . . The book is divided into three
major parts, Introduction to Data Science, Statistics and Modeling,
and Topics in Data Science, followed by six appendices . . . In
conclusion, I recommend this book as a course companion to a
master’s level course in data analysis and to statisticians who
want to keep their skills in the field of data science up to date."
~ Tim Downie, Journal of Statistical Software
"Modern Data Science
with R is different . . . as it presents an abundance of R
codes, functions and packages clearly with several useful examples.
For people with a statistical background, the book covers
computational topics like simulation and also includes appropriate
computer science topics such as Data Wrangling, Database Querying
using SQL and Text as Data. The book is well-structured and is
presented in an easy-to-understand manner, making it suitable for a
wide range of readers. . . This book is unique because it
incorporates theoretical fundamentals such as statistical learning
and regression modelling with the modern, practical elements of
data science, including setting up databases and debugging . . .
This book is a valuable resource to all those studying and
interested in data science."
~ Shuangzhe Liu, University of Canberra