Episode Summary

Are you still using loops and lists to process your data in Python? Have you heard of a Python library with optimized data structures and built-in operations that can speed up your data science code? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, returns to share secrets for harnessing linear algebra and NumPy for your projects.

Jodie details how most people begin their data science journey using loops to iterate over values and apply operations sequentially. We talk about how loops are friendly for beginners, being clear to read and easy to debug, but unfortunately don’t scale well, especially with large amounts of data.

Jodie shares some of the basics of linear algebra and how to organize data into vectors. We talk about how the NumPy library leverages those concepts to improve data processing. We discuss how the library includes operations for vector and matrix addition and subtraction, and why these operations are more efficient than loops. We also cover how NumPy stores arrays in memory and when working with them is faster vs when it’s not.

Course Spotlight: Data Cleaning With pandas and NumPy

In this video course, you’ll learn how to clean up messy data using pandas and NumPy. You’ll become equipped to deal with a range of problems, such as missing values, inconsistent formatting, malformed records, and nonsensical outliers.

Topics:

00:00:00 – Introduction
00:02:35 – Vectorize all the things! - PyCon UK 2022 Talk
00:06:39 – Becoming familiar with linear algebra
00:09:05 – Beginners start with loops
00:11:25 – Starting with basic linear algebra
00:12:25 – The basic unit of a vector
00:18:06 – NumPy representing vectors in Python
00:23:25 – Sponsor: InfluxDB
00:24:13 – Block management
00:25:54 – Replacing a loop with vector-based operations
00:34:06 – NumPy broadcasting
00:38:52 – Approximating nearest neighbors
00:43:49 – Video Course Spotlight
00:45:15 – Solving the problem
00:46:44 – Getting rid of nested loops
00:48:54 – A peek under the hood
00:53:28 – How arrays vs lists are stored in memory
01:00:24 – Considering a GPU
01:03:37 – Real Python resources on the subject
01:04:08 – Upcoming talks and conferences
01:07:31 – Thanks and goodbye

Show Links:

Vectorize all the things! How basic linear algebra can speed up your data science code - YouTube
Introduction to Linear Algebra, 5th Edition
Linear Algebra - Mathematics - MIT OpenCourseWare
Linear Algebra and Learning from Data
Linear Algebra in Python: Matrix Inverses and Least Squares
NumPy: the absolute basics for beginners - NumPy Manual
Broadcasting — NumPy v1.24 Manual
spotify/annoy: Approximate Nearest Neighbors in C++/Python optimized
Look Ma, No For-Loops: Array Programming With NumPy – Real Python
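The loop-versus-vector contrast described in the summary can be sketched roughly as follows. This is a minimal illustration, not code from the episode: the sample values and the helper names `add_with_loop` and `add_vectorized` are hypothetical, chosen only to show element-wise addition done both ways, plus a small taste of broadcasting.

```python
import numpy as np

# Hypothetical sample vectors; the values are made up for illustration.
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]


def add_with_loop(x, y):
    """Add two equal-length sequences one element at a time."""
    result = []
    for left, right in zip(x, y):
        result.append(left + right)
    return result


def add_vectorized(x, y):
    """Add two equal-length sequences as NumPy arrays in a single operation."""
    return np.asarray(x) + np.asarray(y)


loop_sum = add_with_loop(a, b)
vector_sum = add_vectorized(a, b)

# Broadcasting: the scalar 2 is applied to every element of the array
# without writing an explicit loop.
doubled = np.asarray(a) * 2

# Vector subtraction works the same way as addition.
difference = np.asarray(b) - np.asarray(a)
```

Both versions produce the same numbers. The vectorized form hands the whole operation to NumPy in one call rather than stepping through the elements in Python, which is the kind of replacement the episode discusses for large amounts of data.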