Bad Data Handbook: Cleaning Up the Data So You Can Get Back to Work

Paperback

By (author) Q. Ethan McCallum

$33.38
List price $40.60
You save $7.22 (17% off)

Free delivery worldwide
Available
Dispatched in 4 business days

  • Publisher: O'Reilly Media, Inc, USA
  • Format: Paperback | 264 pages
  • Dimensions: 176mm x 232mm x 18mm | 440g
  • Publication date: 1 December 2012
  • Publication City/Country: Sebastopol
  • ISBN 10: 1449321887
  • ISBN 13: 9781449321888
  • Edition: 1
  • Sales rank: 545,185

Product description

Welcome to data science's dirty secret: real-world data is messy. Data scientists must spend a good deal of time playing software developer, writing code to clean up data before they can actually do anything constructive with it. It's a necessary evil, but you can still make the most of it. This practical book walks you through several real-world examples to demonstrate the theory and practice behind working with and cleaning up dirty data. No one tool solves all of the problems well. Wise data scientists learn many tools and learn where each one shines. To that end, this book takes a polyglot approach: most examples will involve R and Python, but expect the occasional smattering of Groovy and sed/awk fun.

Author information

Q. Ethan McCallum is a consultant, writer, and technology enthusiast, though perhaps not in that order. His work has appeared online on The O'Reilly Network and Java.net, and also in print publications such as C/C++ Users Journal, Dr. Dobb's Journal, and Linux Magazine. In his professional roles, he helps companies make smart decisions about data and technology.