Episode Summary

R. Tyler Croy, a principal engineer at Scribd, joins Corey Quinn to explain what happens when simple tasks cost $100,000. Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive. Normal solutions don't work anymore. Tyler shares how, with this much data, you can't just throw money at the problem; you have to engineer your way out.

About R. Tyler:
R. Tyler Croy leads infrastructure architecture at Scribd and has been an open source developer for over 14 years. His work spans the FreeBSD, Python, Ruby, Puppet, Jenkins, and Delta Lake communities. Under his leadership, Scribd’s Infrastructure Engineering team built Delta Lake for Rust to support a wide variety of high-performance data processing systems. That experience led to Tyler developing the next big iteration of storage architecture to power large-scale full-text compute challenges facing the organization.

Show Highlights:
01:48 Scribd's 18-Year History
04:00 One Document Becomes Billions of Files
05:47 When Normal Physics Stop Working
08:02 Why S3 Metadata Costs Too Much
10:50 How AI Made Old Documents Valuable
13:30 From 100 Billion to 100 Million Objects
15:05 The Curse of Retail Pricing
19:17 How Data Scientists Create Growth
21:18 De-Normalizing Data Problems
25:29 Evolving Old Systems
27:45 Billions Added Since Summer
29:29 Underused S3 Features
31:48 Where to Find Tyler

Links:
Scribd: https://tech.scribd.com
Mastodon: https://hacky.town/@rtyler
GitHub: https://github.com/rtyler

Sponsored by: duckbillhq.com