Kaggle Workshops

Workshops, https://www.kaggle.com/, 2019

This is a collection of all the workshops I’ve run at Kaggle, from July 2016 to December 2019.

Practical Model Evaluation (with AutoML)

How do you know which machine learning model is going to work best for a specific problem? Learning how to evaluate machine learning models is an important part of the data science workflow. You’ll need it for everything from picking your final submissions for a Kaggle competition to choosing which model your team should put into production.

We know how important model evaluation is, so we’ve put together a three-day workshop to walk you through the model evaluation process from start to finish. We’ll go beyond just optimization metrics, though, and talk about factors for model selection relevant to working data scientists.

Utility Script Competition

As part of the Utility Script Competition we ran on Kaggle, I wrote this notebook with guidelines for writing more professional data science code.

SQL Summer Camp

As part of the SQL Summer Camp, I wrote a workshop as an introduction to BigQuery ML. It is based on the official documentation tutorial. In this tutorial, you use the Google Analytics sample dataset for BigQuery to create a model that predicts whether a website visitor will make a transaction. For information on the schema of the Analytics dataset, see the BigQuery export schema.

Intro to APIs

This was a three-day event held during Kaggle CareerCon 2019. Each day covers a new part of developing an API and puts it into practice. By day 3, you’ll have written and deployed an API of your very own!

Getting Started with Automated Data Pipelines

The Getting Started with Automated Data Pipelines series is a set of three notebooks and livestreams (recordings are available) designed to help you get started with creating data pipelines that automate the process of moving and transforming data.

Dashboarding with Notebooks

Want to learn how to combine the speed of spinning up a notebook with the ease of an automatically updating dashboard? Then this is the event for you! Each day from December 17th to December 21st, 2018, you’ll get a practical, hands-on exercise that won’t take more than 20 minutes but will help you refine your dashboarding skills.

JupyterCon 2018 Workshops

At JupyterCon 2018 I gave two workshops. You can find all my materials in these two notebooks:

5-Day Challenges

I ran several educational 5-Day Challenges on different topics in 2017 & 2018. Each challenge consists of five short exercises designed to give you hands-on practice with a different data science technique. This notebook collects links to the exercises for each challenge so you can work through them at your own pace.


5-Day Data Challenge

  • Topic: Getting started with data science

  • Level: Beginner

  • Language: Python and R

  • Daily tasks:

      • Day 1: Reading data into a kernel

      • Day 2: Plot a Numeric Variable with a Histogram

      • Day 3: Perform a t-test

      • Day 4: Visualize categorical data with a bar chart

      • Day 5: Using a Chi-Square Test

New to data science? Need a quick refresher? This five-day challenge will give you the guidance and support you need to kick-start your data science journey.

By the time you finish this challenge, you will be able to:

  • Read in and summarize data

  • Visualize both numeric and categorical data

  • Decide when and how to use two foundational statistical tests (the t-test and the chi-squared test)

All the material for this challenge is in one notebook.
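
As a rough illustration of the two tests named above (not the challenge's own code), here is a minimal sketch that computes both test statistics by hand using only Python's standard library. The sample values and the helper names `welch_t` and `chi_square` are made up for this example, and the t-test shown is the unequal-variance (Welch) variant, which is an assumption about which flavor you might want. In practice a statistics library would also give you the p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def chi_square(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up numeric samples: does the mean differ between two groups?
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.2, 6.8, 7.1, 6.5, 6.9]
t_stat = welch_t(group_a, group_b)

# Made-up 2x2 table of counts: are the two categorical variables related?
counts = [[10, 20], [20, 10]]
chi2_stat = chi_square(counts)

print(f"t statistic: {t_stat:.3f}")
print(f"chi-squared statistic: {chi2_stat:.3f}")
```

The t-test compares a numeric variable across two groups, while the chi-squared test compares counts across categories, which is why the two inputs have different shapes.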


5-Day Data Challenge: Regression

By the time you finish this challenge, you’ll understand how and when to implement three foundational regression techniques. Each day we will cover one aspect of regression analysis in depth.

  • How to pick the right regression technique for your data

  • How to use diagnostic plots to check your model

  • How to interpret and communicate your model

  • Visualizing your model

  • Comparing models & selecting variables

We’ll work with real datasets to help develop an intuitive understanding of how each type of model works and how to interpret the results.
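
To make the idea of fitting and checking a model concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares, using only Python's standard library. Linear regression is an assumption here, chosen as the simplest case, and the data and the `fit_line` helper are invented for illustration. The residuals computed at the end are the values a diagnostic plot would typically display.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data that is roughly linear
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)

# Residuals: observed value minus fitted value, one per data point
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

If the residuals show a clear pattern when plotted against the fitted values, that is usually a hint that a straight line is the wrong model for the data.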


SQL Scavenger Hunt (not a 5-Day Challenge, but follows a similar format)

In our SQL Scavenger Hunt, you’ll learn how to use SQL to get data from BigQuery databases. Each day you’ll learn about a core SQL technique and practice using it to get the data you need to answer real-world questions like:

  • How many GitHub users made more than ten commits on January 1, 2015?

  • Which five cities had the highest air pollution last week?

You’ll also learn best practices for working with BIG datasets.

SQL (short for “Structured Query Language”) is the primary way to get data out of relational databases. It’s also the third most popular software tool for data science, right after Python and R, and a key skill for aspiring data scientists to develop.
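
For a feel of what such a query looks like, here is a minimal sketch of the commit-count question. It runs against a tiny in-memory SQLite database instead of BigQuery so that it works without any cloud setup. The `commits` table, its columns, and the rows are all hypothetical and do not reflect the schema of the real GitHub dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (author TEXT, commit_date TEXT)")

# Made-up rows: alice has 12 commits on the target day, bob has 3,
# and carol has 15 commits but on a different day.
rows = (
    [("alice", "2015-01-01")] * 12
    + [("bob", "2015-01-01")] * 3
    + [("carol", "2015-01-02")] * 15
)
conn.executemany("INSERT INTO commits VALUES (?, ?)", rows)

# Inner query: authors with more than ten commits on the target day.
# Outer query: count how many such authors there are.
query = """
    SELECT COUNT(*)
    FROM (
        SELECT author
        FROM commits
        WHERE commit_date = '2015-01-01'
        GROUP BY author
        HAVING COUNT(*) > 10
    ) AS busy_authors
"""
(busy_author_count,) = conn.execute(query).fetchone()
conn.close()

print("Authors with more than ten commits:", busy_author_count)
```

The same WHERE, GROUP BY, and HAVING pattern carries over to other SQL dialects, although table names and date handling will differ.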

This challenge is also available as a Learn track.


Python: 5-Day Data Challenge: Data Cleaning

Data cleaning is a key part of data science, but it can be deeply frustrating. Why are some of your text fields garbled? What should you do about those missing values? Why aren’t your dates formatted correctly? How can you quickly clean up inconsistent data entry? In this five-day challenge, you’ll learn why you’ve run into these problems and, more importantly, how to fix them!

In this challenge we’ll learn how to tackle some of the most common data cleaning problems so you can get to actually analyzing your data faster. We’ll work through five hands-on exercises with real, messy data and answer some of your most commonly-asked data cleaning questions.
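
As a small taste of the kinds of problems mentioned above, here is a minimal sketch that handles a missing value, mixed date formats, and inconsistent text entry using only Python's standard library. The records, the field names, and the `parse_date` helper are invented for illustration and are not taken from the challenge notebooks. The list of date formats is also an assumption about what the messy data might contain.

```python
from datetime import datetime

# Made-up messy records: stray whitespace, mixed case, two date formats,
# and one missing numeric value.
raw_rows = [
    {"city": " Karachi", "date": "2017-03-01", "magnitude": "5.6"},
    {"city": "karachi ", "date": "03/02/2017", "magnitude": ""},
    {"city": "Lahore", "date": "2017-03-04", "magnitude": "4.8"},
]


def parse_date(text):
    """Try each assumed format in turn; return None if none match."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


cleaned = []
for row in raw_rows:
    cleaned.append({
        # Normalize inconsistent data entry
        "city": row["city"].strip().lower(),
        # Convert date strings into real date objects
        "date": parse_date(row["date"]),
        # Mark missing values explicitly instead of keeping empty strings
        "magnitude": float(row["magnitude"]) if row["magnitude"] else None,
    })

for row in cleaned:
    print(row)
```

Whether a missing value should be left as missing, dropped, or filled in depends on why it is missing, so this sketch only marks it and leaves that decision open.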


R: 5-Day Data Challenge: Data Cleaning

Data cleaning is a necessary part of data science, but it can be deeply frustrating. What are you supposed to do with this .json file? How can you handle all these missing values in your data? Is there a fast way to get rid of duplicate entries? In this challenge, we’ll learn how to solve some common data cleaning problems.

This challenge is in R and covers different topics from the earlier Python version of the Data Cleaning 5-Day Challenge, so even if you did the last challenge, you’ll discover some new tips and tricks! Here’s what we’ll be covering: