Posts by Collection

portfolio

Portfolio item number 1

Published: January 13, 2023

Short description of portfolio item number 1

Portfolio item number 2

Published: January 13, 2023

Short description of portfolio item number 2

publications

The SLAY Database: A Meta-Analytic Database of Sign Language Grammars

Published in Workshop on Databases and Corpora in Linguistics, 2014

Recommended citation: Tatman, R. (2014). The SLAY Database: A Meta-Analytic Database of Sign Language Grammars. Workshop on Databases and Corpora in Linguistics. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2014_SLAYDatabase_Presentation.pdf

The Sign Language Analyses (SLAY) Database

Published in University of Washington Working Papers in Linguistics, 2015

Recommended citation: Tatman, R. (2015). The Sign Language Analyses (SLAY) Database. University of Washington Working Papers in Linguistics. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_SLAYDatabase.pdf

The cross-linguistic distribution of sign language parameters

Published in Proceedings of the Forty-first Annual Meeting of the Berkeley Linguistics Society, 2015

Recommended citation: Tatman, R. (2015). The cross-linguistic distribution of sign language parameters. Proceedings of the Forty-first Annual Meeting of the Berkeley Linguistics Society, 41. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_CrosslinguisticDistributionOfParameters.pdf

The cross-linguistic distribution of sign language parameters

Published in Berkeley Linguistics Society, 2015

Recommended citation: Tatman, R. (2015). The cross-linguistic distribution of sign language parameters. Berkeley Linguistics Society. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_CrosslinguisticDistributionOfParameters_Presentation.pdf

Individual sensitivity to spectral and temporal cues in listeners with hearing impairment

Published in Journal of Speech, Language, and Hearing Research, 2015

Recommended citation: Souza, P., Wright, R., Blackburn, M., Tatman, R., & Gallun, F. (2015). Individual sensitivity to spectral and temporal cues in listeners with hearing impairment. Journal of Speech, Language, and Hearing Research. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_IndividualSensitivity.pdf

#go awn: Sociophonetic Variation in Variant Spellings on Twitter

Published in Working Papers of the Linguistics Circle of the University of Victoria, 2015

Recommended citation: Tatman, R. (2015). #go awn: Sociophonetic Variation in Variant Spellings on Twitter. Working Papers of the Linguistics Circle of the University of Victoria, 25(2). https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_GoAwn.pdf

go awn: Sociophonetic Variation in Variant Spellings on Twitter

Published in Northwest Linguistic Conference, 2015

Recommended citation: Tatman, R. (2015). go awn: Sociophonetic Variation in Variant Spellings on Twitter. Northwest Linguistic Conference. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_GoAwn_Presentation.pdf

Hand Choice Lateralization as Phonologization of Sign Language Pronouns

Published in Workshop on Computational Phonology & Morphology, 2015

Recommended citation: Tatman, R. (2015). Hand Choice Lateralization as Phonologization of Sign Language Pronouns. Workshop on Computational Phonology & Morphology. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_HandChoiceLateralization.pdf

The State of the Stats: Current Use of Statistical Methods Across Linguistics Subfields

Published in Linguistics Summer Institute, 2015

Recommended citation: Tatman, R. (2015). The State of the Stats: Current Use of Statistical Methods Across Linguistics Subfields. Linguistics Summer Institute. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_StateOfTheStats.pdf

Comparing the Use of Sociophonetic Variables in Speech and Twitter

Published in New Ways of Analyzing Variation (NWAV) 44, 2015

Recommended citation: Tatman, R. (2015). Comparing the Use of Sociophonetic Variables in Speech and Twitter. New Ways of Analyzing Variation (NWAV) 44. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2015_ComparingSpeechAndTwitter.pdf

“I’m a spawts guay”: Comparing the Use of Sociophonetic Variables in Speech and Twitter

Published in Selected Papers from NWAV 44, 2016

Recommended citation: Tatman, R. (2016). "I'm a spawts guay": Comparing the Use of Sociophonetic Variables in Speech and Twitter. Selected Papers from NWAV 44. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2016_ImASpawtsGuay.pdf

#PronouncingThingsIncorrectly: Initial phonological generalizations of a novel Internet wordgame

Published in NorthWest Phonetics & Phonology Conference, 2016

Recommended citation: Tatman, R. (2016). #PronouncingThingsIncorrectly: Initial phonological generalizations of a novel Internet wordgame. NorthWest Phonetics & Phonology Conference. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2016_PronouncingThingsIncorrectly.pdf

Speaker Dialect is a Necessary Feature to Model Perceptual Accent Adaptation in Humans

Published in 4th Pacific Northwest Regional NLP Workshop: NW-NLP 2016, 2016

Recommended citation: Tatman, R. (2016). Speaker Dialect is a Necessary Feature to Model Perceptual Accent Adaptation in Humans. 4th Pacific Northwest Regional NLP Workshop: NW-NLP 2016. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2016_SpeakerDialectNecessary.pdf

We Who Tweet: Pronominal Relative Clauses on Twitter

Published in Corpus Linguistics Fest 2016, 2016

Recommended citation: Conrod, K., Tatman, R., & Koncel-Kedziorski, R. (2016). We Who Tweet: Pronominal Relative Clauses on Twitter. Corpus Linguistics Fest 2016. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2016_WeWhoTweet.pdf

Listening with American Ears: Using Social Information in Perceptual Learning

Published in 3rd Conference on Experimental Approaches to Perception and Production of Language Variation, 2016

Recommended citation: Tatman, R. (2016). Listening with American Ears: Using Social Information in Perceptual Learning. 3rd Conference on Experimental Approaches to Perception and Production of Language Variation. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2016_ListeningWithAmericanEars.pdf

Gender and Dialect Bias in YouTube’s Automatic Captions

Published in Ethics in Natural Language Processing, 2017

Recommended citation: Tatman, R. (2017). Gender and Dialect Bias in YouTube's Automatic Captions. Ethics in Natural Language Processing. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2017_GenderAndDialectBias.pdf

“Oh, I’ve Heard That Before”: Modelling Own-Dialect Bias After Perceptual Learning by Weighting Training Data

Published in Workshop on Cognitive Modeling and Computational Linguistics, 2017

Recommended citation: Tatman, R. (2017). "Oh, I've Heard That Before": Modelling Own-Dialect Bias After Perceptual Learning by Weighting Training Data. Workshop on Cognitive Modeling and Computational Linguistics. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2017_HeardThatBefore.pdf

Social Identity and Punctuation Variation in the #BlueLivesMatter and #BlackLivesMatter Twitter Communities

Published in 33rd Northwest Linguistics Conference, 2017

Recommended citation: Tatman, R., & Paullada, A. (2017). Social Identity and Punctuation Variation in the #BlueLivesMatter and #BlackLivesMatter Twitter Communities. 33rd Northwest Linguistics Conference. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2017_SocialIdentityAndPunctuation.pdf

‘He maybe did’ or ‘He may be dead’? The use of acoustic and social cues in applying perceptual learning of a new dialect

Published in 173rd Meeting of the Acoustical Society of America, 2017

Recommended citation: Tatman, R. (2017). ‘He maybe did’ or ‘He may be dead’? The use of acoustic and social cues in applying perceptual learning of a new dialect. 173rd Meeting of the Acoustical Society of America. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2017_HeMaybeDid.pdf

#MAGA or #TheResistance: Classifying Twitter users’ political affiliation without looking at their words or friends

Published in Women and Underrepresented Minorities in Natural Language Processing, 2017

Recommended citation: Tatman, R. (2017). #MAGA or #TheResistance: Classifying Twitter users' political affiliation without looking at their words or friends. Women and Underrepresented Minorities in Natural Language Processing. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2017_MAGAorTheResistance.pdf

Non-lexical Features Encode Political Affiliation on Twitter

Published in Workshop on Natural Language Processing and Computational Social Science at ACL, 2017

Recommended citation: Tatman, R., Stewart, L., Paullada, A., & Spiro, E. (2017). Non-lexical Features Encode Political Affiliation on Twitter. Workshop on Natural Language Processing and Computational Social Science at ACL. https://github.com/rctatman/personal-website/blob/master/files/Tatman_2017_NonlexicalFeatures.pdf

A Practical Taxonomy of Reproducibility for Machine Learning Research

Published in Reproducibility in Machine Learning Workshop at ICML 2018, 2018

Recommended citation: Tatman, R., VanderPlas, J., & Dane, S. (2018). A Practical Taxonomy of Reproducibility for Machine Learning Research. Reproducibility in Machine Learning Workshop at ICML 2018. http://www.rctatman.com/files/2018-7-14-MLReproducability.pdf

talks

Why does NLP need sociolinguistics?

Published: September 25, 2017

This talk covers the basics of sociolinguistics and discusses why it’s important to consider linguistic variation when designing NLP applications.

Intro to Kaggle: XGBoost!

Published: January 16, 2018

This workshop was both an introduction to Kaggle and a beginner-friendly workshop on the XGBoost algorithm. You’ll need to provide some information to watch the video, but the same content is covered in the accompanying code.

Character Encoding and You

Published: January 23, 2018

Why does your text output have all those black boxes in it? Why can’t it handle Portuguese? The answer is most likely “character encoding”. This talk will cover some common character encoding gotchas, along with defensive programming practices to help your code handle multiple encodings.
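As a rough illustration of one defensive practice (a sketch, not code from the talk), the Python below tries a list of candidate encodings in order. The helper name `decode_bytes` and the candidate list are assumptions chosen for illustration. It also shows where "black boxes" often come from: bytes decoded with the wrong encoding and `errors="replace"` turn into the U+FFFD replacement character. Note that a clean decode only proves the bytes are valid in that encoding, not that the guess is correct; cp1252 and latin-1 will accept many byte sequences that were really something else.

```python
# Hypothetical helper, not taken from the talk: try candidate encodings in order.
def decode_bytes(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")) -> tuple[str, str]:
    """Return (text, encoding_used) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next candidate.
            continue
    # Only reached if every candidate failed (latin-1 never fails, so this needs a
    # custom list without it). Fall back to UTF-8 and mark bad bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Portuguese text saved as Windows-1252 is not valid UTF-8:
legacy = "São Paulo".encode("cp1252")
print(legacy.decode("utf-8", errors="replace"))  # the undecodable byte becomes U+FFFD
print(decode_bytes(legacy))                      # falls through to cp1252
```

Trying UTF-8 first is the usual choice because it is strict: text in a legacy single-byte encoding rarely happens to be valid UTF-8, so a successful UTF-8 decode is a fairly strong signal.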

Socially-Stratified Validation for ML Fairness

Published: February 13, 2018

In this talk, I cover some of the frameworks used to think about fairness in machine learning. Then I turn to more practical matters of determining which social factors are important in machine learning, how to find appropriate validation data, and considerations when selecting metrics. Finally, I walk through a sample socially-stratified validation pipeline.

How to find stories in data through visualization

Published: March 09, 2018

Working with data is a kind of interview - it is a complex back-and-forth, drawing out the expressiveness of data. The process is often visual, depending heavily on a sequence of graphical displays, “visualizations.” This three-hour workshop will focus on the concepts and skills you need to use data visualization effectively as part of your reporting practice - to conduct a data interview. You will learn how to spot trends, highlight changes over time, identify outliers, make meaningful comparisons, and describe important patterns in your data - all through the effective use of visualization strategies. This class will be based in the R language and distributed through Jupyter notebooks. These pre-built examples can later be customized to suit your own projects when you return to your newsroom.

How to Give a Lightning Talk

Published: March 19, 2018

Lightning talks are quick talks, usually under 5 minutes. The short format makes them great for first-time speakers! This is a very meta lightning talk on how to give a lightning talk; it covers how to develop your talk and practice it, along with some of my best public-speaking tips.

What you can, can’t and shouldn’t do with social media data

Published: July 28, 2018

Information for my talk, “What you can, can’t and shouldn’t do with social media data” given at the 2018 Joint Statistical Meetings.

Evaluating and Improving Reproducibility in Machine Learning

Published: August 08, 2018

Reproducibility in machine learning means you can run the same code on the same data and get the same results. While this may seem relatively straightforward, there are plenty of potential pitfalls. In this talk, we’ll discuss a scale for evaluating the reproducibility of a machine learning project and how to make sure that your own work is easy to reproduce. While this talk is focused on researchers (it’s based on a paper I presented at an ICML workshop), the tips and tricks should apply to anyone who does exploratory data analysis or machine learning generally.
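One small, commonly cited ingredient of "same code, same data, same results" is controlling randomness. The sketch below uses only the Python standard library. The function name `split_data` and the seed value 42 are illustrative assumptions, not taken from the talk or the paper. It gives the split its own seeded `random.Random` instance, so the result depends only on the inputs and the seed, not on whatever else touched the global random state.

```python
import random


def split_data(items, test_fraction=0.2, seed=42):
    """Shuffle and split items into (train, test) deterministically.

    Hypothetical helper: a local random.Random(seed) keeps the split independent
    of the global random module state.
    """
    rng = random.Random(seed)
    shuffled = list(items)  # copy so the caller's sequence is not modified
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train_a, test_a = split_data(range(10))
train_b, test_b = split_data(range(10))
print(train_a == train_b and test_a == test_b)  # same code + same data + same seed
```

Fixing seeds is necessary but not sufficient. Library versions, hardware nondeterminism, and the data itself also need to be pinned for results to be reproducible elsewhere.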

Reproducible Research Best Practices (highlighting Kaggle Kernels)

Published: August 21, 2018

In this workshop, we’ll take an existing research project and make it fully reproducible using Kaggle Kernels. This workshop will include hands-on instruction and best practices for each of the three components necessary for completely reproducible research.

I do, We do, You do: Supporting active learning with notebooks

Published: August 22, 2018

The gradual release of responsibility instructional model (also known as the I do, We do, You do model) is a pedagogical technique developed by Pearson & Gallagher where students engage with material more independently over time. In this workshop, participants will learn how to apply the I do, We do, You do framework to teaching with Jupyter notebooks. Over the course of the workshop, participants will complete a series of exercises designed to help them use Jupyter notebooks more effectively to support active learning in the classroom.

Data Science Portfolios

Published: September 19, 2018

This talk describes how to put together a data science portfolio that will help you stand out, the different kinds of data science jobs, and how to tailor your application to shine as a candidate.

Mixed Effects Regression

Published: September 26, 2018

Mixed effects regression combines power, flexibility and clearly interpretable models, which makes it a very powerful technique. I’ll introduce you to the method (no stats background required!), show you how to apply it to your own datasets and walk you through some tricks for clearly visualizing the output.

Should you keep the tweet?: Balancing reproducibility, open data and participant privacy

Published: October 18, 2018

In this talk for the Computational Sociolinguistics workshop, I discuss how to balance three core ideals (reproducibility, open data and participant privacy) when collecting data and publishing research.

Paper Discussion: The Importance of Being Recurrent for Modeling Hierarchical Structure

Published: November 27, 2018

You may, in fact, need more than attention. This paper compares recurrent and non-recurrent (i.e. transformer) neural network architectures, focusing on their ability to model hierarchical relationships in natural language. The authors found that for both subject-verb agreement and logical entailment, RNNs outperformed transformers. While there is limited theoretical support for these findings, the empirical results are compelling.

Data Structures in R

Published: January 23, 2019

This talk covers the basics of R’s data structures, as well as two data structures that aren’t included in Base R: linked lists and hashtables.

Setting Up Your Public Data for Success

Published: March 25, 2019

If you’re sharing your data, you probably want people to actually use it. This talk lays out some concrete strategies you can apply to help interested folks find and use your public data.

PUT DOWN THE DEEP LEARNING: When not to use neural networks (and what to do instead)

Published: May 04, 2019

The deep learning hype is real, and the Python ecosystem makes it easier than ever to apply neural networks to everything from speech recognition to generating memes. But when picking a model architecture to apply to your work, you should consider more than just state-of-the-art results from NeurIPS. The amount of time, money and data available to you are equally, if not more, important. This talk will cover some alternatives to deep learning, including regression, tree-based methods and distance-based methods. More importantly, it will include a frank discussion of the pros and cons of different methods and when it makes sense to use each in practice.

Intro to Computational Sociolinguistics

Published: May 30, 2019

All language data, whether text, speech or sign, reflects the social identity of the user and the environment they were in when they produced that language. This systematic social variation in language has been studied in linguistics for decades, but is increasingly important as we build and deploy tools that rely on automatic analysis. Failure to account for sociolinguistic variation can reduce overall system performance or, more worryingly, result in systems that are systematically biased against certain classes of users.

State of the Sesame Street (Are those NLP folks, like, ok?)

Published: May 31, 2019

In this talk, I spent five minutes over-explaining the joke where NLP algorithms are named after Sesame Street characters.

Unsupervised Text Classification & Clustering: What are folks doing these days?

Published: June 07, 2019

In this talk, I outline the techniques I considered for an unsupervised clustering/topic modelling project to summarize Kaggle forum posts.

Unsupervised Natural Language Processing Techniques and Kaggle Forums

Published: September 19, 2019

In this talk, I discuss how I used unsupervised NLP techniques to cluster forum posts so that I could more quickly follow what’s going on in the Kaggle forums.

Sociolinguistic Variation and Automatic Speech Recognition: Challenges and Approaches

Published: February 14, 2020

Failing to account for sociolinguistic variation can result in accuracy differences between groups and generally worsens performance for members of minority groups. How to handle sociolinguistic variation in ASR systems, especially systems trained via deep learning, is an area of active research. This talk will introduce current approaches from natural language processing and discuss their benefits and drawbacks.

Rules + Deep Learning: Why you need both to build Conversational AI that actually works

Published: March 21, 2020

This talk covers a brief overview of the history of NLP, the benefits and drawbacks of deep learning (DL) and rule-based systems, and how we combine both approaches at Rasa.

Intro to BERT-ology

Published: May 14, 2020

This talk covers the current research (as of May 2020) into how BERT and related models capture information, ablation studies, and the models’ drawbacks and weaknesses.

What I Won’t Build

Published: July 05, 2020

This talk goes over my own personal development in terms of what ethical NLP looks like and where I currently stand, including a list of the specific types of applications I won’t build.

Sure, transformers are cool… but have you tried rules?

Published: December 10, 2020

This talk covers a brief overview of the history of NLP, where rule-based and deep learning systems fit in and a possible glimpse of the future.

Data Science Portfolios (Updated)

Published: January 18, 2021

This talk describes how to put together a data science portfolio that will help you stand out, the different kinds of data science jobs, and how to tailor your application to shine as a candidate.

AI = your data

Published: February 08, 2021

New algorithms may get the press, but the real heart of any AI project is data collection and curation. This talk will show you why getting to know your data is so important and provide best practices for improving your data curation and annotation.

5 mistakes you’ll probably make with language data (and how to recover)

Published: September 09, 2021

Language is fundamentally different from other types of data, and it’s inevitable that you’ll run into some language-specific issues. This talk will cover some of the most common types of errors I’ve seen data analysts and machine learning engineers make with language data, from ignoring the differences between text genres to treating text as written speech to assuming that all languages work like English. We’ll also talk about ways to avoid these common mistakes (and recover gracefully if you’ve already made them).

Testing, Validation and Evaluation: How Do You Know if Your NLP System Actually Works?

Published: September 21, 2021

Abstract: If you’ve ever built–or thought about building–an NLP system, you’ve probably run into a few questions: How can you tell if it’s working? How will you know if it continues to work in the future? How do you know when you should update your models, if ever? Luckily, there are tools to help you! This talk will cover the differences between testing, validation and evaluation, explain why you need all three, and walk through an example with a chatbot system.

Chatbots can be good: What we learn from unhappy users

Published: October 11, 2021

It’s no secret that chatbots have a bad reputation: no one enjoys a cyclical, frustrating conversation when all you need is a quick answer to an urgent question. But chatbots can, in fact, be good, and learning from bad conversations can help us get there before a chatbot is ever deployed. This talk will draw on both academic and industry knowledge to discuss problems like: What do users’ reactions to unsuccessful systems tell us about what successful systems should look like? Are we evaluating the right things… or the easy-to-measure things? Do we really have to look at user data? If so, when and how often? When, if ever, should we retire old methods?

Open Source AI Chatbots

Published: October 19, 2021

There have been major advances in natural language processing in the past few years, particularly in developing and refining new network architectures like transformers, that allow chatbots to handle natural language input more robustly. In this talk, we’ll cover what the differences are between machine learning and rule-based approaches to building chatbots and when to use each. We’ll also quickly walk through what you need to know to start building your own AI chatbots using Rasa’s open source framework, as well as practical recommendations for improving any AI chatbot after deployment using conversation-driven development.

teaching

Presenting Text Stimuli For Production Experiments in PsychoPy Using the Builder

Handouts, Phonetics Lab, 2012

This is a very clear tutorial on creating a basic text-based experiment in PsychoPy using the Builder. It includes pictures of the interface and very simple, easy-to-follow directions, and is designed for someone who has never used PsychoPy before.

Bayesian Statistics for Linguists

Slides, Phonetics Lab, 2013

This is a basic introduction to Bayesian Statistics for Linguists, which covers the fundamental differences between Bayesian and Frequentist statistics, a very brief introduction to Bayes' theorem and some additional resources.

Linguistics Outreach

Posters & Activities, Pacific Science Center, 2014

These materials were developed for Paws on Science, an annual event put on by the Pacific Science Center to help UW scientists connect with the public. They were designed with an elementary-school audience in mind.

Active Learning and Presenting Research

Handouts, TA/RA Conference on Teaching, Learning and Research, 2014

Handouts prepared for the annual TA/RA Conference on Teaching, Learning and Research at the University of Washington on incorporating active learning strategies in the classroom and presenting research. Geared towards graduate students in their first few years.

Dr. Rachael Tatman
