Recommended Reading – Data / Analytics

Last Updated: 2020-02-09 – Keep in mind, this is a work in progress.

Below is a list I’ve compiled from various “Recommended Reading” docs, tutorials, advice docs, and emails I’ve sent to friends and colleagues over the past few years. I wrote them to help people learn the data skills they needed at the time, whether to grow their careers, answer specific questions from their managers, or just explore what was possible with their content and work.

No matter what stage of this career path you’re in, remember: We use data to “model” an experience. How can you make the data more accurately show what a user is doing?

General / Intro to Data

  • Data Science (The MIT Press Essential Knowledge series) By John D. Kelleher and Brendan Tierney
    • This book is a short and sweet intro to a lot of different topics in “data” / “data science.” I highly recommend starting with it because learning a little about the wide variety of data topics gives you a better idea of how the pieces all fit together. It can also help you figure out which aspects of data you want to focus on next. It also has really great info on the steps companies as a whole need to take in order to reach different levels of data work. For example, you can’t start training ML models if the operational data you’re using as a source is messy.
  • Inspiration:

Security!!!

I’m including this as one of the first sections because understanding and following good security practices is one of the most important things you’ll do as a data person.

  • How Apple and Amazon Security Flaws Led to My Epic Hacking by Mat Honan
    • This is extremely important to read because if you want to work in data, you need to remember that you are the steward of people’s information. You must respect it and take care of it. And that starts with protecting your own accounts.
  • Make sure you use a password manager like 1Password or LastPass
    • Make sure your Master Password is very strong
    • Make sure you have 2FA on it
  • Set up 2FA on everything
  • Don’t post personal information (addresses, birthdates, phone numbers, license plates, IDs, etc.) online or on social media. Hackers can piece all of this together and then use social engineering to get into your accounts.
    • Related: Don’t do birthday posts for others. Don’t post or share others’ personal information.
  • Don’t be creepy or unethical! Just use the data skills you have for improving user experiences and helping make the world a better place. Don’t be tempted to do evil things with it.

Tools and Skills

General Data Skills

Excel

I HIGHLY recommend getting really well acquainted with Excel / Google Sheets when you start your data journey. You’ll often still need to come back to a spreadsheet to see what others have done with the data you pulled for them.

Also, and probably more importantly, working with rows, columns, and especially filters trains you to think of data as tables. Knowing how the data will eventually look in a database, especially how granularity works (for example, whether each row is one order or one item within an order) and what data needs to be included in each row, can help guide how you model data for a database, a transformation, etc.

SQL

If you want to be a data analyst, engineer, etc., SQL will be the tool you use most to dig into data at first, particularly to see exactly what tables, rows, etc. the company has before you start running any Python scripts.

Think of SQL as the ultimate Excel: it lets you filter in countless ways and calculate aggregates (counts, sums, averages) across exactly the rows and columns you define.
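To make the “ultimate Excel” comparison concrete, here is a minimal sketch using Python’s built-in sqlite3 module with a throwaway in-memory database. The pageviews table, its columns, and the sample rows are all hypothetical, made up for illustration. The WHERE clause plays the role of a spreadsheet filter, and GROUP BY with COUNT/SUM plays the role of a pivot table.

```python
import sqlite3

# Hypothetical sample data: one row per page view (this is the table's "grain").
ROWS = [
    ("2020-01-01", "alice", "/blog/sql-tips", 30),
    ("2020-01-01", "bob", "/blog/sql-tips", 45),
    ("2020-01-02", "alice", "/pricing", 10),
    ("2020-01-02", "carol", "/blog/regex", 60),
    ("2020-01-03", "bob", "/blog/regex", 20),
    ("2020-01-03", "alice", "/blog/sql-tips", 15),
]


def views_per_page(rows):
    """Filter to blog pages, then aggregate views, unique users, and time per page."""
    conn = sqlite3.connect(":memory:")  # in-memory database, gone when closed
    try:
        conn.execute(
            "CREATE TABLE pageviews (day TEXT, user_id TEXT, path TEXT, seconds INTEGER)"
        )
        conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT path,
                   COUNT(*)                AS views,         -- like COUNTIF per page
                   COUNT(DISTINCT user_id) AS unique_users,
                   SUM(seconds)            AS total_seconds  -- like SUMIF per page
            FROM pageviews
            WHERE path LIKE '/blog/%'      -- the "filter" step
            GROUP BY path                  -- the "pivot table" step
            ORDER BY views DESC, path
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in views_per_page(ROWS):
        print(row)
```

With the made-up rows above, this prints ('/blog/sql-tips', 3, 2, 90) followed by ('/blog/regex', 2, 2, 80). The /pricing row is dropped by the filter before anything is counted, which is exactly what a spreadsheet filter would do.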

  • Codecademy’s SQL tutorials
    • I highly recommend learning through the Codecademy SQL tutorials first because they let you practice the syntax in the safe environment of Codecademy’s UI. You won’t be blocked by the extra steps of setting up a database, choosing an engine, etc. on your own computer; you can get right to learning the syntax and practicing queries. Don’t let those other steps block you at the start. They’re important skills and configurations to learn, but right now, focus on the syntax and on writing queries to unlock your analysis skills! Besides, if you go the analyst route, the company you work for will often have a cloud-based SQL editor like Hue, or you can eventually learn how to use Sequel Pro, MySQL Workbench, DBeaver, etc. to connect to the database(s).
  • Sams Teach Yourself SQL in 10 Minutes by Ben Forta
    • I highly recommend this one too; it’s the book I used to learn SQL super quickly at my second job. Ben Forta is great at explaining things succinctly.

Python

  • !!! Python Crash Course 2nd Ed: https://nostarch.com/pythoncrashcourse2e !!!
    • I LOVE this book! I’ve gone through tons of tutorials, books, etc, and this book was what finally helped me understand a bunch of concepts that I kept getting stuck on. It’s much more verbose than others, but for me, when I was first starting out, I really needed those more explicit explanations.
  • Learn Python the Hard Way: https://learncodethehardway.org/python/
    • This resource was one of the very first ones recommended to me when I started learning how to code. It was interesting to work through, but I admittedly felt like I needed more hand-holding and explanation to feel comfortable with Python.
  • Data Science from Scratch by Joel Grus
    • This book guides you into data science skills while also teaching you enough Python to get started.

Regex

Regex (regular expressions) is super useful for string matching logic in SQL queries, Python scripts, and even when you’re just trying to `grep` through your files to find a specific line of code. I first learned some super basic regex to write the string matching logic for some slightly complex Google Analytics conversion goals I was setting up.
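As a minimal sketch of that kind of string matching, here is Python’s re module deciding whether a page path counts as a “conversion.” The goal itself (reaching a thank-you page at the end of either a signup or a checkout flow, with an optional trailing slash or query string) is a hypothetical example, not taken from any real Google Analytics setup.

```python
import re

# Hypothetical goal: a visitor "converts" on reaching a thank-you page at the
# end of either the signup flow or the checkout flow, e.g.
#   /signup/thank-you   or   /checkout/thank-you?order=123
GOAL_PATTERN = re.compile(r"^/(signup|checkout)/thank-you/?(\?.*)?$")


def is_conversion(path: str) -> bool:
    """Return True if the page path matches the hypothetical goal pattern."""
    return GOAL_PATTERN.match(path) is not None


def conversions(paths):
    """Keep only the paths that count as conversions, preserving order."""
    return [p for p in paths if is_conversion(p)]


if __name__ == "__main__":
    sample = [
        "/signup/thank-you",
        "/checkout/thank-you?order=123",
        "/checkout/thank-you/",
        "/blog/thank-you-notes",  # contains "thank-you" but isn't a goal page
        "/signup/",
    ]
    print(conversions(sample))
```

This prints the first three paths. The anchors (^ and $) and the (signup|checkout) alternation are what keep look-alike URLs such as /blog/thank-you-notes from being counted, which is the usual pitfall with looser “contains” matching.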

R

People often choose between learning Python and R. I focused on Python because I wanted to use it in more applications than just data-focused ones. However, R and Shiny (R’s framework for building interactive web apps and dashboards) are really cool tools to be able to use. The choice also depends a lot on what your employer already uses.

Scala

  • Scala’s own docs are very useful for getting a quick start on the syntax and for learning how to use sbt: https://docs.scala-lang.org/
  • Programming in Scala, Fourth Edition by Martin Odersky, Lex Spoon, and Bill Venners
    • I tend to prefer more detailed tutorials and books, so I highly recommend this book.
    • Note: They’re going to release the 5th edition soon, so be on the lookout.
  • Functional Programming in Scala by Paul Chiusano and Runar Bjarnason
    • I’ve started studying with this book because my current job not only uses Scala but specifically uses the functional programming paradigm. Reviews and forum threads I’ve read warn that this one is a bit tougher for beginners. I’ll update this guide after I’ve worked through more of the book.

Spark

  • Learning Spark, 2nd Edition by Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee
    • This is the first book on Spark that I started working through. Coming from Python, I found it a concise intro to Spark and really liked being able to get glimpses into the syntax for Java and Scala.
  • Spark: The Definitive Guide by Bill Chambers and Matei Zaharia
    • This is THE definitive guide. It’s very thorough, which I always prefer. Lately I’ve realized that I often go into a language or framework completely green, and after years of quickly hacking together “this is good enough” solutions under deadline pressure, I really appreciate building my foundational knowledge properly now.

Tips

  • Search Stack Overflow for “I know concept X in a different language; how do I do it in this language?”

Data Ops / DevOps-Related Skills useful for Data People

General

AWS

Tips

  • Learn which tools and services (especially alerting!) your data infrastructure uses. This can help you troubleshoot in the future, and it can also help you become more self-sufficient if you learn to manage some or all of them, especially provisioning more servers or databases.

Data Modeling / Data Warehouses

Misc

Applied Data

  • The Victory Lab by Sasha Issenberg – Data applied to elections! This is the book that got me started working with data professionally.
  • Dataclysm by Christian Rudder

Data Posts with a Startup/Digital Marketing Focus

Note: These are old but cover good fundamentals. I recommend reading this kind of content to anchor your data skills and data thinking to real example use cases, and to learn how startups and tech companies ask questions about data. Read them, and then search for newer versions of that info.

Startup Metrics

  • Andreessen Horowitz’s “16 Metrics”: http://a16z.com/2015/08/21/16-metrics/
    • A lot of it is finance-related and would require you to project how much revenue you think you’ll make from your site’s/company’s/app’s revenue model (e.g., sponsored videos, promoted posts). These estimates are good to have when pitching to VCs, especially things like “burn rate” (how quickly you think you’ll spend the money, and why/how).
    • The part that’s usually most immediately relevant is the “Product and Engagement Metrics” section, which covers:
      • “Active Users” (in your case, visitors to the site)
      • “Month-on-Month” growth: growth in the number of visitors, pages viewed, app content viewed or app actions taken, time spent on the site, etc.
      • Churn is a little tougher. Roughly, it’s the share of new visitors who never come back (say, within a month), as opposed to the ones who do return. Your analytics tool’s dashboards will likely have a churn and/or retention report.
  • Andreessen Horowitz’s “16 More Metrics”: http://a16z.com/2015/09/23/16-more-metrics/
    • This is a new set of metrics from a16z (Andreessen Horowitz) that builds on the previous link. Both are interesting reads because they help you focus on what VCs need high-level explanations of.
    • “Average Revenue Per User (ARPU)” (total revenue divided by the number of users) is interesting because it gets you to think about how many users you have and how much advertisers are willing to pay to reach that many users. It’s usually used by sites that already run ads, or apps with downloadable/purchasable content, since you can take the average ad revenue or sales 1 user brings in and extrapolate to the whole audience.
    • Network Effect and Virality are important because they show how your content’s reach grows. Measuring them sometimes takes more in-depth stats work, but they’re good to keep in mind, and you may be able to get at them through your social reach and the numbers of people who view each post, share posts to social, engage with social media posts about an article, etc.
    • Net Promoter Score is tougher to measure because it requires surveying users, but it’d be interesting to see how willing users are to recommend your app to others
    • Location of Active Users is important because it gives you an idea of your overall reach
      • It can also give you slightly more accurate estimates of how much revenue you can bring in when you take into account the different prices for ad impressions your site/app serves
      • It can also point you toward more efficient ad buying if, for example, you learn that there is a group of more engaged users in a geo you can target more cheaply
    • Sources of Traffic are related to location of active users, but they capture the next step: which site or channel users came from when they arrived at your site after engaging with your content elsewhere.
      • Helps identify which social networks, other blogs, other sites, organizations, etc. get you the highest ROI. Also good for deciding where to focus your ad spend
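Several of the metrics above boil down to simple arithmetic once the data is clean. Below is a minimal sketch in plain Python. The formulas are common textbook definitions and may not match a16z’s exact wording or your analytics tool’s implementation: month-over-month growth as (this month − last month) / last month; churn as the share of a cohort of new visitors who didn’t come back the next month (retention is 1 − churn); ARPU as revenue divided by active users over the same period; and NPS as the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6) on a 0-10 “how likely are you to recommend us?” survey. All numbers and visitor IDs are made up.

```python
from typing import Iterable, Set


def mom_growth(last_month: float, this_month: float) -> float:
    """Month-over-month growth as a fraction, e.g. 0.25 means +25%."""
    if last_month == 0:
        raise ValueError("growth is undefined when last month is zero")
    return (this_month - last_month) / last_month


def churn_rate(cohort: Set[str], returned: Set[str]) -> float:
    """Share of a cohort of new visitors who did NOT come back the next month."""
    if not cohort:
        raise ValueError("cohort is empty")
    retained = cohort & returned  # ignore next-month visitors outside the cohort
    return 1 - len(retained) / len(cohort)


def arpu(total_revenue: float, active_users: int) -> float:
    """Average revenue per user, with revenue and users from the same period."""
    if active_users <= 0:
        raise ValueError("need at least one active user")
    return total_revenue / active_users


def nps(scores: Iterable[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    scores = list(scores)
    if not scores:
        raise ValueError("no survey responses")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be on a 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(mom_growth(last_month=8000, this_month=10000))  # 0.25
    january_new = {"v1", "v2", "v3", "v4"}
    february_visitors = {"v2", "v4", "v9"}  # v9 is a brand-new February visitor
    print(churn_rate(january_new, february_visitors))  # 0.5
    print(arpu(total_revenue=1500.0, active_users=10000))  # 0.15
    print(nps([10, 9, 9, 8, 7, 6, 3, 10]))  # 25.0
```

Note that churn_rate only counts returning visitors who were in the original cohort; mixing in brand-new February visitors (like v9) would make retention look better than it really is.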

Analytics (for Web / Marketing)

General

A/B Testing

Non-Data Books that Guide How I Work

  • See my Product Management / UX / Startups / etc Recommended Reading [link tbd]
  • Every book ever written by 37signals / Basecamp
    • They share a much better way to work and build companies. I use their mindset to help me advocate for building in a calmer, more thorough, and overall more efficient manner.
  • Creative Confidence by David Kelley and Tom Kelley
  • Fear by Thich Nhat Hanh
    • Handling your fear is important. Working in data is tough because you often end up being the bearer of bad news. Don’t be scared to tell the truth and to be the guide toward reality! But also, remember to really celebrate wins when you and your team have them!
  • Venture Deals by Brad Feld