CoRecursive #050

Portal Abstractions with Sam Ritchie

How abstract algebra solves data engineering

Today, the story of how Twitter engineers came up with a unique solution to data engineering.

Adam interviews Sam about how abstract algebra and probabilistic data structures help solve the fast versus big data issues that many are struggling with.

Adam talks to Sam Ritchie, a machine learning researcher. Stop in to hear Adam and Sam’s conversation about portal abstractions that let you leverage work from other fields.

Transcript

Note: This podcast is designed to be heard. If you are able, we strongly encourage you to listen to the audio, which includes emphasis that's not on the page.

Twitter’s Impossible Data Problem

Adam Gordon Bell: Several years ago, Twitter had this problem that may sound familiar. The problem is big data versus fast data, or batch processing versus real-time stream processing. Probably because of the scale that they operate at and the real-time nature of the Twitter feed, they hit this problem earlier than the rest of us. Batch is very efficient. You can calculate things on years of data, but doing the calculation might take a day. Real-time is much faster, but you can sort of only work forward in time.

Really, you need both of these things. If I want to look at my most liked tweet, I need to look at all the old batch data, but also any real-time tweets that I’m doing. It’s 2013. Sam Ritchie is a mechanical engineer by training. His uncle is Dennis Ritchie. He’s working at Twitter, and his job is just translating from one system to the other, from real-time to a batch job.

Sam Ritchie: And there’s just so many jobs that looked like that, and it kind of looked like that was going to be my life, like just coding these things. He just said enough is enough. We need to figure out how to write one piece of code that will run in real-time world and also in batch world.

Adam Gordon Bell: I guess I’m coming at the problem from that side where it’s like, oh, I can calculate this thing, but I just started calculating it and like the world existed before.

Sam Ritchie: Yeah. The backfill problem’s hard and it’s hard, like it just consumes your life writing backfill jobs. You stick to really simple things because you know you’re going to have to write them twice. You’re going to have to maintain them twice. I mean, it’s really not a nice way to live.

What If Math Already Solved This?

Adam Gordon Bell: Hello and welcome to CoRecursive. I’m Adam Gordon Bell. How did Sam get away from these one-off jobs? How did they solve this fast versus big data issue? An issue that many are still struggling with. The answer isn’t some new data processing system that he’s here to sell. The answer is actually abstract algebra and probabilistic data structures. If you don’t know what those are, don’t worry, we’re going to walk through it. We’re also going to talk about what Sam calls portal abstractions.

That’s finding abstractions that let you leverage work from other fields, but we’ll do that at the end. Let’s start at the beginning when Sam was working away at Twitter.

Sam Ritchie: I was on the revenue team. I had a colleague named Oscar Boykin who I didn’t know that well. We both maintained one of these libraries I mentioned before that lived on top of Hadoop. He had the Scala version. I had the Clojure version. Kind of yet again I had a task for work that was building one of these dashboards. You’re trying to count something like tweets per user per day. You’re just basically grouping on some key and then adding numbers to a database.

There’s just so many jobs that looked like that, and it kind of looked like that was going to be my life, like just coding these things. Through this or that, Oscar and I teamed up to work on some shared piece of machinery for serialization between the code we both maintained. And we realized that we both were doing this sort of thing, and he’s got a lower tolerance for BS. He just said, “Enough is enough. We need to figure out how to write one piece of code that will run in real-time world and also in batch world.

This happens with compilers. This is a compiler problem. Let’s just back off and solve it.”

Adam Gordon Bell: So they solved it. They built version one of this open source analytics system. It could do batch. It could do real-time. It could sum the results together. They called it Summingbird. It sums things and it’s from Twitter, so it’s a bird, Summingbird. But then they had some interesting revelations about what they had created.

The Suspicious Simplicity of Adding

Sam Ritchie: If you really simplify what we’re dealing with here, you’re writing code that is generating for some key tweets per user per hour. It’s something that happens. It might just be tweets per user. It might be how many users we have. And then you have some value. You have some counter that you’re ticking, so you’re just incrementing this thing up. And so many machine learning features and dashboards and everything is just like ticking counters. That’s like the secret of analytics work, because you’re just adding ones. That seems suspicious too.

That’s something you can maybe break out. Like, really? Is that all we can do? How would you do more complicated things? This software package, Summingbird, was a library that lets you write a logical declaration of what you wanted to happen. And then the second component of it would take that data structure that you’ve built and go run it on any number of these different underlying platforms.

I can power a dashboard. I can do backfills. I have this boundary that’s maintained transparently to me that behind the scenes will give a hard line between the massive multiyear database and then the last couple of hours that are stored in some much harder to manage, much more fragile, but very, very fast online processing system. Phase one was just write something that can do… You know logically you’re doing the same thing. You’ve been rewriting the same code, just have the machine write it for you. But it does open this door, and this is the topic I wanted to get to.

It opens the door and you start to look at this thing and think, you have these two buckets. One is all time before a couple of hours ago, and then you have a bucket for each of the recent hours. You’re doing this addition of numbers, but you’re sort of putting parentheses in one case around years and years of data, and then another set of parentheses around each of the previous few hours, and then you take this final step of adding them together.

Adam Gordon Bell: If you’re calculating how many times I’ve tweeted, Hadoop gets everything up to yesterday and then I’m adding that to something that’s actually getting the real-time data and just counting.
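As a rough illustration of what Sam and Adam describe here, the sketch below declares a "tweets per user" count once and runs it two ways: over a whole batch of old events, and one event at a time as new ones arrive. The names (`count_job`, `run_batch`, `StreamingRunner`, `merge`) and the sample events are hypothetical and are not Summingbird's API; this only shows the shape of the idea.

```python
from collections import defaultdict

# Hypothetical job declaration: turn one event into a (key, value) pair.
def count_job(event):
    return (event["user"], 1)

def run_batch(job, events):
    """Process a complete, historical list of events in one pass."""
    totals = defaultdict(int)
    for event in events:
        key, value = job(event)
        totals[key] += value
    return dict(totals)

class StreamingRunner:
    """Process events one at a time as they arrive."""
    def __init__(self, job):
        self.job = job
        self.totals = defaultdict(int)

    def on_event(self, event):
        key, value = self.job(event)
        self.totals[key] += value

def merge(batch_totals, realtime_totals):
    """Add the two 'sets of parentheses' together, key by key."""
    merged = dict(batch_totals)
    for key, value in realtime_totals.items():
        merged[key] = merged.get(key, 0) + value
    return merged

# Made-up sample data: old events go to batch, recent ones to streaming.
old_events = [{"user": "adam"}, {"user": "sam"}, {"user": "adam"}]
recent_events = [{"user": "adam"}, {"user": "oscar"}]

batch_totals = run_batch(count_job, old_events)
stream = StreamingRunner(count_job)
for e in recent_events:
    stream.on_event(e)

combined = merge(batch_totals, stream.totals)
everything_at_once = run_batch(count_job, old_events + recent_events)
```

The same `count_job` drives both runners, and `combined` matches what a single pass over all the data would give, which is the property the rest of the conversation builds on.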

Hundreds of Years of Free Research

Sam Ritchie: That’s right. Those are nice ideas, but they’re sort of obvious and not that powerful. But this idea of adding things and putting parentheses wherever you want also seems kind of innocuous. It’s like this little lamp you pick up, right? You rub the lamp and what you find is like this idea is not… It’s not your new idea. It’s very simple, but it actually exists in this field of abstract algebra.

Adam Gordon Bell: All right. Abstract algebra, if you’re not familiar with it, is a subfield of math. Stick with me here. I’ll explain a little bit. A lot of concepts from abstract algebra can be implemented in a programming language. Semigroup is one of these. It’s an interface. It has one method, and that method is add or sum, if we want to stick with the Summingbird pun. It also has a rule that things have to be associative. You might remember associativity from a high school math class. What happened here is Sam had a system where, to calculate analytics, a calculation had to implement a certain interface, and then the system could run it in both worlds. The interface had an add method, which meant it was a semigroup, which meant this subfield of math with people talking about seemingly obscure constructs could suddenly enable him to answer interesting questions in his real-time analytics dashboard thing at Twitter.
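To make the interface concrete, here is a minimal sketch of a semigroup as "one method plus one rule". The `Semigroup` class and `is_associative` helper are illustrative names chosen for this example, not taken from any library. The rule being checked is that (a + b) + c gives the same result as a + (b + c).

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Semigroup(Generic[T]):
    """The whole interface: a way to combine two values of the same type."""
    plus: Callable[[T, T], T]

def is_associative(sg, a, b, c):
    """The one test: grouping must not change the result."""
    return sg.plus(sg.plus(a, b), c) == sg.plus(a, sg.plus(b, c))

int_addition = Semigroup(plus=lambda a, b: a + b)
int_max = Semigroup(plus=max)

# Fits the method signature but breaks the rule, so it is NOT a semigroup.
subtraction = Semigroup(plus=lambda a, b: a - b)
```

Checking three sample values does not prove the law in general, but it is enough to show that addition and max pass while subtraction fails: (1 - 2) - 3 is -4, whereas 1 - (2 - 3) is 2.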

Sam Ritchie: You can start slimming things back. And when you slimmed it back almost to nothing, you’re left with this object called a semigroup, which is an idea of, okay, I need some set of things, so numbers in what we’ve been doing, which is just counting. Some way to “add” two of them together and get something out that’s like still the same type, and then a test that goes along with that. The test is that I have to be able to do that associatively. It sort of sounds like a pedantic mathy thing.

Like there’s this impulse, that I’m sure we can get into, in the functional programming world to like see ideas that seem mathy and slap math names on them and just tweet about it and start spraying at it. That sucks. The thing you get when you do that, when you identify this like mathematical concept is it’s not your… Because it wasn’t your core idea, people have been thinking about this, it turns out, for a long time.

You have potentially hundreds of years of work that have gone into answering questions of, what can I do with types that are able to implement a plus method and then a single test of associatively calling plus. That is a very, very tiny interface to satisfy, but there’s a huge amount of work on, one, all the things you can do just relying on these two properties, and two, like a zoo of data structures that all hold those properties.

By backing out what we built and making it not just about numbers, but about this thing, in Scala you might say, “I want a type where I can implement a type class called semigroup for that type.” If you just make that one change, suddenly you’ve kind of opened this portal into, again, this whole zoo of data structures. Anything that matches this tiny contract will fit into your model.

And that’s true of any abstraction, but this one was special because when you turn and look at the computer science literature, you find things I never would have thought about or never expected to work in the context of like an ad dashboard job. Suddenly we were able to plug into this thing.

The work became less about how do we go manage real-time and batch and this kind of boring suit and tie stuff to, oh my God, we’ve gone into Narnia and suddenly there’s these approximate sketching data structures where I can feed items into it and it’ll give me a count for the unique number of items seen. And it can do that up to billions and billions and trillions of items and it just won’t get any bigger. It doesn’t actually have to store them. How the fuck does that work? That’s a different question.

All you know is that you can take two of these things, add them together, it works associatively. They suddenly become candidates for running on years and years of tweet data or Twitter data or any large-scale dataset and the results you know will make sense, will not have any errors, and will be real-time updatable.

Why Associativity Is the Real Key

Adam Gordon Bell: You went out to solve this problem of like real-time analytics and your solution is a semigroup. That doesn’t seem obvious to me I guess that that’s a solution to the problem of analytics.

Sam Ritchie: Semigroup and then the monoid. I mean, it doesn’t stop at the semigroup. Yeah, it doesn’t seem obvious that that’s the solution to analytics, but it is. What started to happen was you start to realize that, okay, well, it’s not the solution to analytics per se. What it is, though, is adding things together associatively. This seems to be the key that unlocks being able to store data in multiple places and merge together your results when you want.

Being able to distribute in space or time is tied somehow intimately to the associative property that we know from elementary school. That’s kind of odd. Why do I say it’s like intimately tied? Just because if you ask what you need to go do that, it’s really just this one simple property.
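One way to see why associativity is what allows distribution: split the data into chunks, combine each chunk separately (as different machines or different hours might), then combine the partial results. The sketch below uses string concatenation, which is associative but not commutative, to suggest that grouping can change freely as long as the order is kept. The function names and the sample words are made up for illustration.

```python
from functools import reduce

def plus(a, b):
    # String concatenation: associative, but order still matters.
    return a + b

def combine_all(values):
    """Combine everything left to right in a single place."""
    return reduce(plus, values)

def combine_in_chunks(values, chunk_size):
    """Combine each chunk on its own, then combine the partial results."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    partials = [combine_all(chunk) for chunk in chunks]  # could run on separate machines
    return combine_all(partials)

words = ["sum", "ming", "bird", "-", "alge", "bird"]
sequential = combine_all(words)
chunked = combine_in_chunks(words, 2)
```

Whatever chunk size is chosen, `chunked` equals `sequential`, because moving the parentheses around does not change the answer.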

Adam Gordon Bell: Let’s do a tiny recap. If we want to run calculations in real-time and in batch, we need a common interface. That interface turns out to be semigroup from abstract algebra. And that interface is important whenever you need to distribute calculations. All right. Next up in solving analytics is how you deal with data that may be missing. We’re going to change our interface to handle that, which will bring us to another interesting concept.

Sam Ritchie: There’s other properties like the idea of a missing value, like if you’re querying multiple databases and data might be missing. How do you deal with that? Well, okay, you probably need some notion of like a zero. If I’m adding numbers, adding zero doesn’t do anything. That’s fine. If I have lists of tweets I’m merging together, I can have all these nil checks or check for none or have optional types, or I can know that for a list if I concatenate an empty list, nothing happens.

My code becomes simpler now because I’ve got this idea of an empty data, an empty element of the type. You can start to see that a lot of data types have this idea, like a set has this idea, numbers, of course, have this idea. For multiplication, you kind of have this idea, or you do, it’s just one instead of zero, so that’s kind of odd. But in fact, there’s another thing called a monoid, which again has like this in your face name, but it’s just the same as the semigroup, but added on is this extra method you implement called identity.

You’re just getting back a thing. If I pass it into something else, it just won’t do anything. Again, a very, very simple idea, but what you get out of it is suddenly the ability to handle missing data. And that comes up all the time in dashboards. How do you represent data that is not there yet?
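A small sketch of the monoid idea described above: the same plus as before, with an identity value added, so that "no data yet" needs no special-case checks. The `Monoid` class, the two dictionaries standing in for stores, and `lookup_total` are all hypothetical names invented for this example.

```python
from functools import reduce

class Monoid:
    """A semigroup plus an identity: plus(identity, x) == x == plus(x, identity)."""
    def __init__(self, identity, plus):
        self.identity = identity
        self.plus = plus

    def sum(self, values):
        # Starting from the identity means an empty input is not a special case.
        return reduce(self.plus, values, self.identity)

int_sum = Monoid(0, lambda a, b: a + b)
int_product = Monoid(1, lambda a, b: a * b)   # identity is one, not zero
list_concat = Monoid([], lambda a, b: a + b)  # identity is the empty list

# Two made-up stores; "sam" has no row in the real-time store yet.
batch_store = {"adam": 40, "sam": 7}
realtime_store = {"adam": 2}

def lookup_total(user):
    """Missing rows fall back to the identity instead of needing None checks."""
    batch = batch_store.get(user, int_sum.identity)
    recent = realtime_store.get(user, int_sum.identity)
    return int_sum.plus(batch, recent)
```

A user missing from one store, or from both, still produces a sensible total because the identity stands in for the absent value.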

Originality Is Overrated

Adam Gordon Bell: All right. We have semigroups so that we can distribute work. We add one more method to our interface and we get monoids, so that we can handle the absence of data. This is still a very small interface to enable distributed computing.

Sam Ritchie: It’s a very well-defined interface, but I really think of it like… It is like a portal into some interdimensional transport system. I used to think when I was thinking about how do I become creative, what do I want to do in software, and you want to make original things that no one’s done before, right? You want to crack the door open on something no one’s thought about. But I think more lately that that… I mean, that’s kind of lonely.

If you do that, if you actually succeed and make some 2001 obelisk, and that’s exciting, but contrast that with if I managed to build like a transporter gateway from Star Trek, right? And you look through and you weren’t creative at all. You just made a thing that millions of other galactic civilizations have made before. That’s good. Now you get to plug into the network.

Coming up with abstractions like figuring out some, I don’t know, end point to a website or the packet format required to talk to the web, that’s what picking an interface out of some incredibly well-trodden field, like abstract algebra does for you.

Adam Gordon Bell: This metaphor is great, but I think it needs a little explanation. The monolith is from 2001: A Space Odyssey. I don’t think anyone understands it, but it’s powerful. This is coming up with your own unique solution to a problem, a solution that no one has thought of, but the transporter gateway from Star Trek or an HTTP interface is less creative perhaps. You’re implementing something that someone else already built. It’s not really a new discovery, but you get to draw on all the existing solutions that exist for that interface.

You transform your unique problem into a known type of problem where known solutions exist. This is what Sam is calling a portal interface. What was on the other side of this portal behind your add function?

Sam Ritchie: I mean, what we got, we built a library of all the things we found. The library is called Algebird. Very concretely we got… I mean, the thing that I’ve never seen before was this whole zoo of data structures that… The core idea is that if you don’t really care about your exact value that you’re accumulating, so for numbers, maybe I want a counter, but I don’t really care that it’s exact. I’m happy with 0.1% error, maybe a hundredth or a thousandth of a percent.

It turns out there’s this whole field of research on data structures like this, where if you can give up a little bit of error or a little bit of accuracy, you can get often two orders of magnitude. You get a hundred X space savings on this thing. And that’s so outrageous that… I mean, that took me a while to even understand what the hell was going on.

The Bot That Broke the Database

Adam Gordon Bell: Why does the amount of space matter? I can make a monoid that just adds every Twitter user together.

Sam Ritchie: This is a great point. It’s not even that it doesn’t have anything to say about what you can add. It’s that you can plug things in that will just shatter the system. This example you gave, if I’m trying to go… Say I just want to keep lists of everybody’s tweets. I decide to group on a user. And every time a tweet comes in, I make a list with the single tweet in it. That’s my thing. How do you add lists? You just concatenate them together. No problem.

What you find is that most of your database is empty, because most people don’t tweet, the tweets they’re putting out anyway, and then some people just have these huge amounts of tweets they’re pumping out. I mean, some are bots that are just hammering out tweets every couple of minutes. You get these incredibly skewed keys in your database. Some of the values are just getting bigger and bigger and bigger. And there’s nothing in your system that has limited this from happening.

When you’re running some system that sometimes is fetching nothing, the default value, and sometimes it’s fetching dozens of megabytes of tweets and then filtering on them, this is in some sense orthogonal from your original problem. That’s totally logically fine to do.

Adam Gordon Bell: It still fits the interface of the…

Sam Ritchie: Yeah. It totally fits the interface and it’ll fit the database for a while, but it’s not everything you meant, because there’s some problem there. And the problem is that in almost all these systems, definitely at Twitter, there’s just skewed keys everywhere. Somebody has got the most followers. When they tweet, you’ve got to fan it out to everybody and that just like hammers the system. Whereas maybe when I tweet, no big deal. [inaudible 00:17:46] Why would you accept an accuracy loss? Yeah, I want the total result.

I want the full thing. I want to know how many followers I have. I don’t want to know how many followers I have plus or minus 1%. Maybe not though. It turns out if you can… Well, the problem you’re trying to solve is, how can I track counters and deaden the effect of these massive explosions of a particular key value pair? You get it for free with something like a counter, because people have done a lot of work to make sure that, okay, all our numbers up to some massive amount are going to use the same amount of bits.

Adam Gordon Bell: If it’s a long or something, it can only get so big.

Sam Ritchie: And if you want to double its size, just add a bit. No problem. Why do we just count numbers? It’s easy. Well, why is it easy? Well, a lot of problems are solved for you just because of the architecture you’re inheriting about how numbers are represented. If numbers actually took a ton more bits, if we hadn’t figured out how to write things in binary…

Adam Gordon Bell: Counting would be harder.

Sam Ritchie: Yeah. Adding lists is pretty hard or sets. Let’s have that example. If I want the set of how many followers I have, how many unique people have seen my tweet today, well, how would you implement that?

Adam Gordon Bell: I just add them to the set, and then I can combine sets by just getting rid of the… Doing a distinct.
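Adam's exact approach can be sketched as set union, which is associative and has the empty set as its identity. The viewer IDs and variable names below are invented for illustration.

```python
def viewers_plus(a, b):
    """Combine two sets of viewer IDs; duplicates collapse automatically."""
    return a | b

# Made-up viewer IDs for three separate hours.
hour_1 = frozenset({"u1", "u2", "u3"})
hour_2 = frozenset({"u2", "u4"})
hour_3 = frozenset({"u1", "u5"})

grouped_left = viewers_plus(viewers_plus(hour_1, hour_2), hour_3)
grouped_right = viewers_plus(hour_1, viewers_plus(hour_2, hour_3))

distinct_viewers = len(grouped_left)
```

This gives an exact answer, but the combined set has to hold every distinct ID, so its size grows with the audience.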

Count a Trillion Items Without Growing

Sam Ritchie: Yup, exactly, if everybody had roughly the same number and it was small of people that saw their tweets. But sometimes there’s just huge amounts. The distinct set of people that have seen your tweet is just massively larger than the average. You get this massive set in memory and you’re serializing and deserializing it every time. And there’s two ways you can go. One is you can start to build in special cases into your system where the abstraction starts to leak.

And you say, “Well, I can’t really tolerate this. It’s not just a type with the semigroup. It’s this other thing and there’s more constraints.” That’s fine, if you can accept a little bit of error. If I don’t really care if my count of people that have seen my tweet is off by 10, which honestly, I don’t like, I mean, in that example, data gets dropped all the time. If you hit like on my tweet and then your phone’s offline, there’s already error just built into the universe. If I accept that and I just live with it, I can reach for a data structure like, here’s the buzzword.

There’s this thing called the HyperLogLog, where if you allocate this thing some very small amount of memory, you can get something like 99.9% accuracy on a count of how many unique things you’ve dumped into this. It’s an approximate set. You add things to it, or you sort of put things into the set, and then you can ask it the question, how many unique things have I seen before? And it’ll tell you, and it will be almost right and it won’t get any bigger.

Adam Gordon Bell: It doesn’t seem like it should be possible.

Sam Ritchie: It doesn’t seem like it should be possible. If you thought of that idea when you were working on your analytics system and you said, “Yeah, it’d be really nice if I could just count this thing and not have a set grow at all,” you’re not going to go take a few months and go off and figure that out. It just sounds impossible. But somebody figured it out and then somebody, maybe the same person, but somebody figured out that, oh, if I have two of these things, I can add them together. I can track users for a few hours. I can track my distinct counts.

And then if I have another set that represents stuff I’ve seen before, at a later time I can merge those two together and the result of the merged set will also satisfy the properties that I had with either of the two side ones.
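For readers curious how such a structure can be added together, here is a deliberately simplified HyperLogLog-style sketch. It is not Algebird's implementation: the class name, the use of SHA-1 as the hash, and the default of 4,096 registers are assumptions made for this example, and the estimator uses the commonly published bias constant and small-range correction. The point to notice is `plus`: merging two sketches is just an element-wise max, which is associative, and the result is the same size as its inputs.

```python
import hashlib
import math

class HyperLogLog:
    """Simplified approximate distinct counter with a fixed number of registers."""

    def __init__(self, p=12, registers=None):
        self.p = p                      # number of hash bits used to pick a register
        self.m = 1 << p                 # number of registers (4096 when p=12)
        self.registers = list(registers) if registers is not None else [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")           # take 64 bits of the hash
        idx = h >> (64 - self.p)                        # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # position of the leftmost 1 bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def plus(self, other):
        """Merge two sketches: element-wise max of the registers."""
        assert self.p == other.p, "sketches must use the same register count"
        merged = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return HyperLogLog(self.p, merged)

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)                # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:                # small-range correction
            return m * math.log(m / zeros)
        return raw

# Made-up IDs: two overlapping time windows, 10,000 distinct users in total.
morning = HyperLogLog()
evening = HyperLogLog()
for i in range(0, 6000):
    morning.add(f"user-{i}")
for i in range(4000, 10000):
    evening.add(f"user-{i}")

whole_day = morning.plus(evening)
approx_distinct = whole_day.estimate()   # approximate, not exact
```

With 4,096 registers the typical error is on the order of a couple of percent, and the sketch stays the same size no matter how many items are added.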

Adam Gordon Bell: And then we can distribute it, both the stuff.

Sam Ritchie: Yes. Then you can stop and you can save. You can save your state, and then you can load it up again later and keep going. That’s really all we want to do. We want things where you can pause and wait a while and then load it back out and keep going. These approximate data structures get you that ability. If they have that ability, then you can plug them into a system like Summingbird that’s running these massive analytics jobs and things will just work and you’ll solve, again, your systems problem of heavily skewed key distributions.

That will just go away the same way it does when you use counts.

Should We Hide the Math Words?

Adam Gordon Bell: All right. We have our simple interface for real-time and batch, and it turned out that it already existed in abstract algebra. It was the monoid or the semigroup. We found this portal abstraction. We raided the research papers and found probabilistic data structures like the HyperLogLog that were monoids and run in fixed space. But I wanted to ask Sam about this pet topic of mine, do names for math help or hinder adoption in software?

I just imagine you standing up and being like, “HyperLogLog is a semigroup,” and nobody knows what the hell you’re talking about. But you’re like, no, this is important.

Sam Ritchie: I absolutely have the reaction that you’re saying. At first, I was kind of like, I had to write this job. Fine. We can do it this way. But then it just started to get more and more clear that we’d gone down some rabbit hole that was actually not just abstraction for abstraction sake. I had a few experiences of going out and finding papers that, again, implemented these… There was an approximate sort of sliding window counter. Would I have found the paper? No. Would I have taken the time to implement it? No, absolutely not.

But I’m aiming to implement these interfaces and pass these tests and then being able to immediately turn around and have like an approximate sliding window counter that would just work with Stripe’s, like, entire machine learning feature generation interface. I could take this thing, put it in the cupboard, write a nice doc string for it, write a little pitch for why you might want to use it, and it would just work. There’s no sort of, that doesn’t look like it would work in an analytic system. That just goes out the window.

It just will. We’ve got the test to prove it and pull it off. See what you can think of.

Adam Gordon Bell: Yeah. It seems so non-obvious to me, and I don’t really live in this world, so maybe it’s not obvious. But yeah, I don’t know. I hear people talk about fast data and big data and pipelines. I never hear anybody say like, “Hey, if you can make something a monoid, then you can calculate it either in batch or in real-time and you can combine it. And all you need to do is meet this interface and that’s it.”

Sam Ritchie: Yeah. Well, you heard it here. No, look, I’m with you. I listened to your podcast with DHH, and he was talking about Ruby, when he first picked up Ruby, this emotional sense he had. That really got me thinking about, why is it that this idea is not more out there? I mean, it’s not a tough idea in that if you didn’t need it to… If you just write the test down and you encountered it, you wouldn’t find it to be… You could do that code, there’d be no problem.

But there’s this aesthetic sense with certain abstractions and there’s something about pulling abstractions from math that sounds… I don’t know. I mean, I’d love to hear… You’re in the functional programming world. Functional programming has this bad rap of just, oh, it’s all about category theory. We need to shove functors and monoids and monads. If you don’t get it, here’s this category theory textbook, you go figure it out.

Rename It and You Break the Portal

What we were trying to advertise was here are the names of these things and the names themselves are important because you’re going to find these names when you go on the hunt for stuff you can plug in, right? If you call it Addable, you have this problem of, okay, what do you solve? You make it more comfortable. And if I have a pre-existing library of things that I can plug in, this is great. I can look at the name Addable in the function slot, the parameter type. I can go look at the library, and I know what can fit into what.

But what you lose is this sense that you’re plugging into this larger mine, that you can go down and find new things. For someone who’s actually looking to expand the range, I think it would be not wise to change the name to something more comfortable because what you might do… Here’s something that happens. Adam, you might pick up this thing and you might go, “Well, okay, I’m going to make a new data type. I have Addable. That looks pretty easy. It’s got a plus method on it, and I can implement my thing.” I don’t pass the associativity test, but that doesn’t really matter.

I’m still in Addable. I can solve them that plus. I’ll make my thing work. I’m just going to ignore the test and no problem. I just won’t implement that test for me. But now you’re in dangerous territory. It was so tight. It was such a poetic little interface, that when you ditch one of the two lines, you’re totally off the map now. It really is tied I think to this idea of like the aesthetics of an abstraction. There’s an aesthetic response you have to…

Some people have an aesthetic response to these mathematical abstractions and go like, “Holy shit, I’m plugging into something big. I’m so happy this post was here. I have no intimidation at all.” Some people go, “I kind of remember getting my ass kicked in eighth grade in algebra. Is it really that again?”
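Sam's warning about implementing plus while skipping the associativity test can be illustrated with averages. The function names and numbers below are hypothetical. A naive "average of two averages" has a plus method but fails the law, so its answer depends on how the work happens to be grouped. Carrying a (sum, count) pair instead restores associativity.

```python
def naive_avg_plus(a, b):
    """Looks 'Addable', but is not associative."""
    return (a + b) / 2

# Same three values, two groupings, two different answers.
naive_left = naive_avg_plus(naive_avg_plus(1, 2), 6)    # ((1 + 2) / 2 + 6) / 2
naive_right = naive_avg_plus(1, naive_avg_plus(2, 6))   # (1 + (2 + 6) / 2) / 2

def to_avg(x):
    """Lift a single value into a (sum, count) pair."""
    return (x, 1)

def avg_plus(a, b):
    """Associative: add the sums and add the counts."""
    return (a[0] + b[0], a[1] + b[1])

def mean(pair):
    return pair[0] / pair[1]

left = avg_plus(avg_plus(to_avg(1), to_avg(2)), to_avg(6))
right = avg_plus(to_avg(1), avg_plus(to_avg(2), to_avg(6)))
```

In a system that regroups work across batch and real-time, the naive version would silently return different numbers depending on where the boundary fell, while the (sum, count) version gives the same mean either way.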

Adam Gordon Bell: I think there’s cases where people are maybe overly extraneously using terminology, but here it’s actually the key to running things. It is pulling weight, I guess, in actual business use cases.

Sam Ritchie: That’s right. This is, I mean, a thing I’m really passionate about. And the reason this stuff’s important is you want to go mine the literature of what other people have done. You want to go be able to plug these things into your work and really just benefit from this incredible community that’s been cranking for, again, maybe hundreds of years. Then you’re turning around and you’re presenting this aesthetic thing. And yes, it matters what the references are to the past, but it also needs to kind of present itself as its own thing to use.

Ideally good design is about giving people an on-ramp at every level of engagement they want. Experts only is like fine. But if you’re trying to build something that’s accessible across the entire range of experience and you find yourself confused about why monoid and semigroup and field are not like doing it for people, I think there’s more we need to learn there about how to go use these incredible mines of abstraction as a resource in modern code.

How Do You Spot a Useful Abstraction?

Adam Gordon Bell: I think this makes sense. These concepts are super valuable, and these concepts already have names. Maybe the names aren’t the problem here. Now I know how semigroups can model distributed calculations, how HyperLogLog can give me fixed overhead, but how do I find these abstractions on my own? How do I repeat this trick and find my own portal as Sam calls it? If I go through and I extract some interface for everything that has a name, like a dog has a name, a car has a name, how do I know if that’s a valuable thing or just me wasting time?

Sam Ritchie: Yeah, it’s hard. That’s the thing we all deal with as programmers. How do you know? I was thinking this the other day on a walk. I wonder if conspiracy theorists would be great software developers who would just be so sensitive to abstraction. You’ve seen patterns everywhere. There’s probably some dial in our brains that cranks up or down. And it’s hard. I don’t think an abstraction can tell you know what you just described, like extracting a name for everything. Maybe it’s good. Maybe not.

You need a thousand examples that you look at and go, “I think I’ve got something really powerful here.” And if that gets you excited, you should do that. But simply if you want to make your search process faster, then there are these other fields where people have been thinking that way for a while.

Adam Gordon Bell: There’s this great talk about Richard Feynman.

Sam Ritchie: Oh yeah, yeah, yeah, totally. Tell the story though. Tell the story. I think I know where you’re going.

Adam Gordon Bell: Richard Feynman collected all these problems over the course of his life, and he said that was the secret to him being so successful is he had all these problems. And then whenever somebody mentioned some new solution, he would just go through his list of problems and see if it solved them all, which I guess is kind of what you’re talking about, right? It’s like will monoid solve this? You try it on. Maybe it’s a horrible fit.

Sam Ritchie: Wow. I love that. That’s great. Yeah, that’s a brilliant Feynman story. He says he’ll get a click sometimes and go, “Ah! Here’s the connection.” People go, “How did he do that?” Well, you just don’t tell anyone when you don’t get a hit. I love that. Yeah, absolutely. You have a solution. You have some interface. If you learn about some abstraction, that seems powerful in another field, go backwards. Say, “Does this apply to what I’m doing?” Forget if it seems natural or obvious, but what would it mean if I forced it in?

Raid the Pre-1960s Cupboard

Adam Gordon Bell: What does this all say about the future of software development? How should we think about this idea of importing these portal concepts?

Sam Ritchie: I think that the clue I get from this is that… You’re trying to solve interesting problems. You’re trying to go expand the range of what is possible for you to build. If you buy this idea that these things just kind of lead toward greater complexity and interest, there’s always more to learn. There’s always more to do. One way to make progress is to go make new artifacts, new examples, new kind of works of art almost. We’re trying to build these things spun out of our thought. That’s really powerful. That’s really what it’s all about.

But in fact, there are other fields that have been obsessed with this idea of structure and relationships between things. Physics is one. Math is another. I think that all of these are just these incredible cupboards of ideas we can raid, most of which were invented before, like, the modern software era. One way to move forward is to really use the hundreds and hundreds of years of work that have already been done to give ourselves hints about…

We effectively have an alien civilization that we can raid, and that’s our own work before the ’60s when structured programming just became a thing. I think to go forward, there’s always going to be new discoveries to be made. But one very, very fruitful thing to do is to turn around, look back, and find these things and say, “Well, is there an interface I could discover that someone’s already found that would let me just plug into this incredible almost battery of human creativity that just exists waiting for the taking in maybe dusty old papers and books, but it’s there. No one’s hiding it.”

Can One Trick Keep Working Forever?

Adam Gordon Bell: We started with fast data versus big data. We hit abstract algebra and probabilistic data structures, but these were all just examples for Sam’s idea of finding another field that’s already solved your problem and pulling in those ideas. Sam is actually working on this right now in his latest side project. He’s looking for more of these portal interfaces into math.

Sam Ritchie: I’ve got a project that I’m about to… I spent three or four months on this before my current job, and I’m about to restart it, but it’s this reimplementation of a lot of the core reinforcement learning algorithms, but using a totally hardcore functional programming style.

Adam Gordon Bell: It’s like you’re pulling the same trick or you’re attempting…

Sam Ritchie: Trying, yeah. See if I can be a one trick pony, but with my math trick.

Adam Gordon Bell: I don’t mean it in a dismissive way.

Sam Ritchie: No. I’m on purpose doing it to test this theory we’re talking about, about like if you’re a one trick pony but your trick is opening the portal, yeah, just keep doing that.

Adam Gordon Bell: Very cool, sir. Well, good luck surviving the…

Sam Ritchie: This is all assuming we survive. Yeah, you too, Adam. Good luck with this, man.

Adam Gordon Bell: That was the show. If you have an interesting story of a solution to a problem like Sam’s, let me know. It doesn’t have to involve math. Adam@corecursive.com or find me on Twitter or the website or wherever. If you liked this episode, like really enjoyed it, then tell your co-workers about it. I’ve been trying to improve the quality of the episodes and hopefully it shows. Thank you for listening.
