5 mistakes you’ll probably make with language data (and how to recover)


Language is fundamentally different from other types of data, so you will inevitably run into some language-specific issues. This talk covers some of the most common errors I’ve seen data analysts and machine learning engineers make with language data, including ignoring the differences between text genres, treating text as written speech, and assuming that all languages work like English. We’ll also talk about ways to avoid these mistakes (and recover gracefully if you’ve already made them).