The one with feature flags

Michael:

Hey. I'm Michael Dyrynda.

Jake:

And I'm Jake Bennett.

Michael:

And welcome to episode 158 of the North Meets South Meets Hopes and Dreams podcast.

Jake:

Meets meets hopes and dreams? Oh, I gotta hear about this. What are the hopes and dreams here?

Michael:

I hope and dream that we have something to talk about except for

Jake:

Well, oh, we've got plenty. We've got plenty. Let me start by saying, this last week, I had the best mustache I've ever had in my life, and it was incredible. It lasted for almost a whole week. And today, I had to get rid of it.

Jake:

And, I was sad because, you know, here's the deal. Let me let me be real honest. I had to shave it this morning because I had, like, this mustache power thing going on. And I was, yesterday, I was undefeated. I could not lose yesterday.

Jake:

And I didn't want to go to work today and ruin the streak, so I shaved it. I was like, I'm done. Let me tell you. Can I just tell you real quick? I don't wanna, like, you know, break my arm patting myself on the back here, but I'm just going to tell you what happened.

Jake:

So on Friday, I went into work last week, had my mustache. By the way, here it is. I do a freedom mustache every year. So, like, for the 4th of July, I do a mustache. So I grow out my, like, facial hair for June, and then in July, I do a mustache.

Jake:

I got 8 guys to do it with me this year, which is so fun. So next year, maybe I'll start a Twitter group and see who I can get to do it and just, you know, Freedom Stash for the win. It's gonna be great. Even Australians and Canadians, you're all invited to do the Freedom Stash. It's gonna be awesome.

Jake:

So, did my Freedom Stash, went to work on Friday, and we have this basketball hoop up on a door, like, one of those little Nerf basketball hoops, okay, in our banking department. And then there's, literally, a tape line on the floor. One of my buddies brought it in, and the original idea was we were each gonna put a dollar in for the week, and whoever got the shot first earns the money. But then, you know, we ended up getting decent at it. And so, anyway Yeah.

Jake:

I've made it once a month. It has to be here's the deal. It has to be your first shot of the day. You can't warm up. No warm ups.

Jake:

Right. Right. One shot from the line. If you make it, you get to put your name on a sheet of paper that's on the door. Great.

Jake:

So it had been 2 months since I'd made this shot. I could not get it. And, and so anyway, I went on on Friday. Yep. Yeah.

Jake:

Exactly. Yes. Little mini ball. Yep. So on Friday, I went in with the mustache, made it.

Jake:

Nailed it. So excited. Got my name put on the door. Okay. Then I go in Monday, still have the mustache.

Jake:

Make it. Make it. First shot. So 2 days in a row. That's never happened to me before.

Jake:

Well, then there's also another line that's, like, 3 feet back from the regular line that we call the money ball shot. Nobody's ever made it for a shot. So I had mind you, I just made the regular shot. So I back up. I'm like, hey.

Jake:

I'm feeling it. Let's go for it. I made the second shot, the money ball shot. Yes. Unbelievable.

Jake:

So then I tell the developers man. And so if you win all 3 rounds, you get the crown for the day in our dev chat. So I told them. I called it. I was like, hey.

Jake:

By the way, I'm gonna win this today. And they were not going easy on me because they wanna kill you. They there's no way, you know it's like on the last round, everybody's against you. I won. I beat every I won all 3 against unbelievable.

Jake:

Unbelievable day yesterday. So I had to shave it and just to make sure that I didn't screw up the streak today. And, yeah. So that was that was the story of the stash this week. It was a good week.

Jake:

It was a good week.

Michael:

Well, yeah. I got one of those little hoops for my birthday this year from the in-laws. And it's got, like, a rubber strip on the door, so it's not supposed to wobble around.

Michael:

But any time the ball touches any part of that thing, the whole thing just shakes and wobbles around. So if your shot is not dead in the center of the ring, it's not going in 9 times out of

Jake:

the ring. Right.

Michael:

Because it just, like, flings the ball out. And we've got, 2.4 meter ceilings here, which is, like, fairly standard.

Jake:

It's like 8 feet what is that? Let me see how long it's here.

Michael:

Yeah. Yeah.

Jake:

It's like 7 feet 11 inches. Yeah. So it's like 7 feet, you know, 10 and a half inches. So all

Michael:

the way It's a it's fairly standard for, like, most homes built these days. 2.7, which would be, like, 9 feet, I guess, or 12. I think it's it's usually, like, 8 foot, 9 foot, and 12 foot ceilings or whatever. Other

Jake:

8 feet. Yeah. Almost 9 feet. Yeah. Almost 9 feet.

Jake:

It's 2 point 2.

Michael:

There's there's not much room to get arc

Jake:

Right. In the

Michael:

shot. Like, it's you have to shoot it very flat, and you have to be very precise with it. And so it's You want the secret?

Jake:

You want the secret? You know what it is?

Michael:

Don't be terrible like me.

Jake:

It's called it's called the cobra. It's the under cobra. And and Right. And it's a backspin with just like this. Yeah.

Jake:

That's that's the trick.

Michael:

And so

Jake:

you got it allows you to start low at, like, your waist and then you get the backspin, you get that shooter's roll and lay it right in there, man. After that, I have

Michael:

to give that a go. Because it's like Yep. It's easier for Eli because Eli will go right up to the ring and he's

Jake:

Exactly. Yeah.

Michael:

He's hip height. So he can actually get some arc on his shot. Mhmm. It's not good for me. So the amount of times that boy towels me up, you know, 20 to 4 or something like that, embarrassing.

Jake:

It's hilarious. That's so funny. Yeah.

Michael:

But he loves it.

Jake:

That's good. Yeah. That's good.

Michael:

Yeah. He's on school holidays at the moment. So I had him yesterday for the day, and we went to the shops. They've got, like, one of these giant chess boards.

Michael:

He's like, let's play chess.

Jake:

And like Do you like chess? He could play chess?

Michael:

No.

Jake:

Oh. Oh, okay. Okay. I thought, like, he knows how to play chess.

Michael:

I very vaguely know the rules. Like, I know that the pawns can move like 1 space, and I know that the knight can go, like, in an L shape and

Jake:

Mhmm.

Michael:

Queen or, like, you know. So it's it's very loose. Like, there's no rule. And he's 6. So it's like, just let him win, because you don't want

Jake:

him instantly

Michael:

in in the mall. So that was a bit of fun, because he just

Jake:

That is fun.

Michael:

I had to stop taking all of his pawns eventually, because I'm, like, are you sure you wanna put your piece there? Like, what shape can, like, the knight move in? He goes, oh, yeah. I probably shouldn't I

Jake:

so Couldn't

Michael:

be called. But in the end, he just decided to take just, like, take his bishop or whatever and walk over and just take my king. I'm like, alright. Well, that's been 20 minutes. It's long enough, and there haven't been any tears, so let's get happy before something happens.

Jake:

Nice. That's awesome. Yeah. I I do wanna look it up,

Michael:

because, like, he enjoys it. So I'd like to at least have a knowledge of, like, which pieces can move where. Like, a fairly accurate knowledge of that. And then the rest we can just make up this I

Jake:

think that, like, chess.com has,

Michael:

you

Jake:

know, some tutorials and stuff. But, yeah, it's pretty simple. Rooks move in straight lines, bishops move on diagonals, the knights move in the L shape, then you've got the queen, which can move anywhere, the king can move 1 space anywhere, and then the pawns can move 2 spaces forward on their first move and only 1 space forward on any other move, but they can only attack diagonally. They can't attack directly in front of themselves.

Michael:

Right.

Jake:

So, like yeah. Yep. That's every up. That's all

Michael:

of them. That's all

Jake:

the moves.

Michael:

Check it out.

Jake:

Yep. Look up fool's mate. That's a good 1 too to know. So if you don't if you're playing with somebody who doesn't know how to play chess, you can checkmate them in about 3 or 4 moves, which makes you feel really fun. It's it's really good.

Jake:

It's a good move. Yeah. Yeah. So, anyway, okay. So shall we talk about the stuff?

Jake:

So I think maybe one thing to talk about real quick is that Tim MacDonald put out a new Laravel Pennant feature that dropped this week. I've been waiting. It's been so long. Have you? Okay.

Jake:

So let's talk real quick about feature flags, the idea behind them, and what they afford you as a development team. Then let's talk about Pennant, and then the new feature as well. So, feature flags. The typical way that you roll out features without feature flags is you just say, okay, we've been working on this feature for a week, and we're pretty confident it's gonna work.

Jake:

And we've tested it locally, and we've tested it in our CI environment, and now we're gonna ship it to production. And so you ship it to production and you watch your error tracking service to make sure that, you know, nothing breaks in production. And, hopefully, you've got, like, a group of people that you typically say, like, hey, Scott. Can you go try this out? Because, you know, you're a pretty good tester and, you know, just try it out and see if you could do anything with it and they break it, whatever.

Jake:

And then you're like, oh, crap. It is broken. I didn't think about that. And then you have to, like, roll back and then you have to go fix it and then you kind of do it again. A couple of iterations of this and there you have it.

Jake:

The problem with that is, you know, ideally, you don't have a breakage. And most of the time, you probably don't. But there's always this sort of, like, lurking fear in the back of your mind like, oh, I don't think it's gonna mess up, but I always, like, wanna ship it on, like, a Monday night, like, after everybody's gone so I can be the first one to test it sort of deal. Mhmm. Well, feature flags remove that barrier.

Jake:

For me, it does at least. Because what it allows you to do is ship a feature to production without exposing it to anybody except for yourself or a group of beta testers or whoever else you want to bring in on that one. And so, you basically surround your feature with, in Laravel, you actually get a Blade directive with this Pennant package. And in the directive, you can say @feature, and then you can either pass a string in there, if you name the feature with a string, or there's a class-based resolution that you can use. And so you pass in the class there: @feature and then the name of the class.

Jake:

And if that feature is enabled for that particular user or that particular scope, it will show it. And if it is not enabled, then it won't. And so you can use the Blade directive. And then there's also, you know, a facade that you can do the same thing with, Feature::active or whatever it is, inside of your controllers or inside of your actions or your jobs or whatever it is. And so you can kind of bifurcate your logic and decide what you want to do based on who has the feature and who doesn't.
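A minimal sketch of the gating idea Jake describes, written in Python as a language-neutral model rather than Pennant's actual PHP API. The `FeatureFlags` class, the `new-dashboard` name, and `render_dashboard` are all hypothetical; the point is only that the new code path ships to production but is reached solely by users the flag is on for.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Toy registry mapping a feature name to the user ids it is enabled for."""

    enabled_for: dict = field(default_factory=dict)

    def enable(self, feature: str, user_id: int) -> None:
        self.enabled_for.setdefault(feature, set()).add(user_id)

    def active(self, feature: str, user_id: int) -> bool:
        return user_id in self.enabled_for.get(feature, set())


def render_dashboard(flags: FeatureFlags, user_id: int) -> str:
    # Both branches are deployed; the flag decides which one a user sees.
    if flags.active("new-dashboard", user_id):
        return "new dashboard"
    return "old dashboard"


flags = FeatureFlags()
flags.enable("new-dashboard", user_id=1)  # e.g. a single beta tester
```

Here user 1 gets the new dashboard while everyone else keeps the old one, so an unfinished feature can sit in production without being exposed.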

Jake:

That's the great part about it. And for me, it has helped me ship more often and with greater confidence than I ever did before. Because even if a feature isn't completely 100% bulletproof tested, I know it's exactly perfect, I can ship it and I can iterate on it after getting it into production. As long as I know it's not gonna, like, bork everything. I don't even have to, like, push it to everybody.

Jake:

I can say it's the end of the week. It's Thursday. I'm gonna push it out. We'll maybe do some tweaks next week, and we'll push it out again. And and there you go.

Jake:

But, man, it's insanely powerful. How long have you guys been using that sort of development flow for, Michael? It's it's been relatively new to me and I've been loving it.

Michael:

Yeah. We we've been using it sort of haphazardly, I guess, for the last Mhmm. 5 or 6 months, I reckon. Like and when I say haphazardly, we've we've kind of, like, gated the feature in the back end but forgotten to hide it on the front end.

Jake:

And so there's been, like, people that

Michael:

have clicked a link and then gotten a, you know, a 403 or whatever because the feature's not actually active, but the link to that feature was visible in the menu bar and things like that. So

Jake:

Mhmm.

Michael:

And this is, I guess, part of the problem with having a separate front end: you don't have the Blade directive. So when you're building the back end functionality, you have to remember to, like, expose that through Inertia or wherever else in your API and make sure that the front end is actually built in such a way that it considers the feature flags. Whereas when you're using, you know, Livewire, if you're using Blade, it's more evident because, you know, you are more likely to be building the front end than if you have a separate back and front end team working on it. So, yeah, it's been nice. But when Pennant came out, which was what Laracon US Laracon.

Jake:

Yeah. Yeah. Right around no. No. You know what?

Jake:

Actually, I think it was talked about in Laracon India. Was it earlier than that? I think they were talking about it then, but it wasn't released then. I think it, like, sort of released with Laravel 10. I don't know.

Jake:

I don't remember. It's a good question. I'm not sure.

Michael:

So yeah. When when it came out, you know, we we started playing around with it. And I I remember using it for the Laracon AU website last year where I wanted to have a feature, like, to use the feature flags as, like, a time bomb kind of thing that, like, show this thing after this date. But the way that the feature flags were resolved by Pennant at the time was it would resolve it once, and then it would associate it with the user. Right?

Michael:

And because the Laracon AU website is more of a, like, a visitor, a guest user kind of thing, there's no user to associate it with. So, for example, we wanted to show the schedule after a certain date, but when it resolved that feature, it was before that date, and because it had already resolved it, it doesn't resolve it again. Correct. And so, you know, we worked around that, somehow.

Michael:

I don't know.

Jake:

Basically, it's like you can't use it at that point. Yeah. It's like Yeah.

Michael:

Yep. So I think

Jake:

I think use it in that way necessarily. Yeah.

Michael:

Right. I think the way that we worked around it in the end was to use, like, the the explicit array storage so that every

Jake:

request Okay. Every request would do it. Right.

Michael:

There you go. So with with this feature that that Tim shipped this week, he messaged me. And he's like, I finally did it.

Jake:

Oh, he messaged oh, okay. Nice.

Michael:

Was to introduce this before hook, similar to the way Laravel policies work, where you can

Jake:

Yeah.

Michael:

Provide an escape hatch, I suppose, to do a test that, like, if you have an admin user logged in, they by default have permission to do everything in the application. So this before hook in Pennant works in a similar way where you could put these kind of things in there and then get it to trigger based on an environment variable or get it to trigger based on, you know, a time switch or something like that. And that way, it would always check that before then resolving it from whatever the normal case would be, you know, to use a lottery or to resolve it from the database or whatever else. So

Jake:

Yeah.

Michael:

That's gonna be really useful in a lot of situations like that where you want to, you know, either globally enable or disable a feature without impacting the already resolved values in the database, because it'll skip doing those checks.

Jake:

Which is huge. So let me kind of go back just a quick step to talk about this. So each feature, if you're using class-based resolution, which is what we do, when that feature is first encountered in your code base, previous to this, it would attempt to resolve that feature. And so the resolve method is called on that feature, and it gets passed a scope. Now the default scope that the feature gets is the logged in user.

Jake:

That's what passes in to that scope by default. Now you can change that if you want to. If you're using the feature facade, you can pass your own scope. So the scope could be, hey, I don't want the user. I want the user's team.

Jake:

And then the resolve function now gets the team, or you can pass a string. So you could say scope is, and what we've done sometimes is we say scope is global, just a string called global. And then what it does is, when it encounters that flag, it will pass that scope in and then it stores the result of that resolve method, either truthy or falsy, true or false, inside of your preferred storage driver. So in our case, we use the database. And so when it's going to resolve that and sees that that particular scope has already been resolved, it will not resolve it again, which is why that global scope is really nice.

Jake:

You just resolve it once and then it's resolved for everybody. However, what Michael said is true, which is once it's resolved once, it's never resolved again. So if you wanted to do something where you said, okay, now I want to enable it for everybody, what you'd have to do is wipe out all the values for everybody in the database, which is annoying because you have to do that in production somehow. You may have to go kill all those database records, and/or you have to, you know, modify the code, whatever.
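The resolve-once behaviour Jake walks through can be modelled in a few lines of Python. This is a sketch under assumptions, not Pennant's implementation: `ResolveOnceStore` and its `active` method are hypothetical names, and a dict stands in for the database driver.

```python
from typing import Callable, Hashable


class ResolveOnceStore:
    """Stores the first resolved value per (feature, scope) and reuses it."""

    def __init__(self) -> None:
        self._stored = {}
        self.resolve_calls = 0

    def active(
        self,
        feature: str,
        scope: Hashable,
        resolver: Callable[[Hashable], bool],
    ) -> bool:
        key = (feature, scope)
        if key not in self._stored:
            # First encounter for this scope: run the resolver and persist it.
            self.resolve_calls += 1
            self._stored[key] = bool(resolver(scope))
        return self._stored[key]


store = ResolveOnceStore()
first = store.active("new-api", "global", lambda scope: False)
# The resolver now says True, but the stored value for "global" still wins.
second = store.active("new-api", "global", lambda scope: True)
```

Changing the resolver logic after the first check has no effect on scopes that were already resolved, which is exactly why enabling a feature for everybody used to mean clearing stored rows.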

Jake:

So with this check now, and I think there's one thing to point out here, Michael, I don't know if this is strictly this, but what he said is the before hook is always performing in-memory checks before resolving a feature's stored value. They're always in memory. I'm not sure what exactly he means by that.

Michael:

Like, you would always do it as opposed to, you know, skipping the check if there is a value in your storage driver. Yeah.

Jake:

Yep. Yep. That makes that makes sense. Yep. And so in the case that you want to always resolve this particular thing, then it will do that.

Jake:

That before check will always do it, and you don't have to return a value. Right? You can just not return a value. So instead of, yep. You can just return null or not return anything.

Jake:

Just return void. In the case that it returns void, it just won't do anything. It'll just fall through to the resolve method. But if you do return a truthy or a falsy value from that, it will, you know, either enable or disable that feature for that thing. So, the time bomb thing worked like you said there.
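A rough Python model of the before-hook semantics discussed here, using the date-based case as the example. Everything in it is an assumption for illustration: the `ScheduleFeature` class, the rollout date, the scope strings, and the `is_active` helper are hypothetical and do not mirror Pennant's PHP signatures.

```python
from datetime import date
from typing import Optional


class ScheduleFeature:
    """Hypothetical feature: beta testers first, everyone from a set date."""

    rollout_date = date(2024, 8, 1)  # made-up date for the example

    def before(self, scope: str, today: date) -> Optional[bool]:
        # Runs on every check. Returning None means "no opinion, fall through".
        if today >= self.rollout_date:
            return True
        return None

    def resolve(self, scope: str) -> bool:
        return scope == "beta-tester"


def is_active(feature: ScheduleFeature, scope: str, stored: dict, today: date) -> bool:
    override = feature.before(scope, today)
    if override is not None:
        return bool(override)  # stored values are neither read nor changed
    if scope not in stored:
        stored[scope] = feature.resolve(scope)  # resolve once, then reuse
    return stored[scope]


feature = ScheduleFeature()
stored = {}
guest_early = is_active(feature, "guest", stored, date(2024, 7, 1))
guest_late = is_active(feature, "guest", stored, date(2024, 8, 2))
```

The guest is stored as `False` on the first check, yet sees the feature once the date passes, because the hook answers before the stored value is consulted.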

Jake:

And then the other thing that's interesting that we talked about too is, like, the way he calls that is not a time bomb, but, like, basically a rollout schedule. Right? Yeah. Schedule a feature rollout to be on, like, this date for everybody. So we have a beta group that's gonna get it, but then, 2 weeks from now, everybody should get it unless we need to specify otherwise, which I think is also really great.

Jake:

So Yeah. Really cool addition there and much needed addition. I think that's, that's that's pretty awesome.

Michael:

So also, there's the ability to, whether you do this in the database or if you do it in an environment variable, you know, being able to globally toggle a feature. You know, you might have rolled something out, and there's, you know, some issue. Like, maybe it's connecting with a third party, and the third party has got some downtime or whatever. Rather than persistently hitting that and then degrading the experience of your users, you can just turn it off using, you know, either a database flag or an environment variable flag and say, you know, this thing is now off. So that's where I would

Jake:

That's a great point. The before. Yeah.

Michael:

Yeah. In the before, I'd be like, no. This this feature, yes, it's a feature in the application, but we're globally turning it off for whatever reason. You know? Abuse.

Jake:

Love that. Yeah. That's a great idea. And I know that makes it really easy to manage. Yes.

Jake:

Because then you can resolve it basically for a scope or globally. Right? You could do both. You don't have to choose 1 or the other. You can you can do both.

Jake:

Nice. Yeah. Yeah. That's a really powerful feature. I'm liking

Michael:

this a lot. And that's a good shortcut, you know, so that you don't have to then go and purge all of the already resolved records. Yes. Which means it doesn't impact on the the status of, you know, individual users might have something enabled. You don't have to remove it all and have to re resolve it.

Jake:

That solves a huge problem.

Michael:

Right now, we're turning this off. And that way, you know, once you flick that global switch again, it will go back to whatever it was before. So if you had access before, you will continue to have access once that before hook, you know, whatever that is, is is restored to that original state as well. So Yep. Yeah.
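The global switch Michael describes can be sketched as follows, again in Python as a conceptual model and not Pennant's API. `ThirdPartySync`, `globally_disabled`, and the user names are hypothetical; the boolean stands in for an environment variable or database flag.

```python
from typing import Optional


class ThirdPartySync:
    """Hypothetical feature with a global off switch checked before stored values."""

    def __init__(self) -> None:
        self.globally_disabled = False  # stand-in for an env var or DB flag

    def before(self, scope: str) -> Optional[bool]:
        return False if self.globally_disabled else None


def is_active(feature: ThirdPartySync, scope: str, stored: dict) -> bool:
    override = feature.before(scope)
    if override is not None:
        return override
    return stored.get(scope, False)


feature = ThirdPartySync()
stored = {"alice": True, "bob": False}  # values resolved earlier, per user

before_outage = is_active(feature, "alice", stored)
feature.globally_disabled = True   # third party is down: switch everything off
during_outage = is_active(feature, "alice", stored)
feature.globally_disabled = False  # outage over: flick the switch back
after_outage = is_active(feature, "alice", stored)
```

Nothing is purged while the switch is off, so each user returns to whatever they had before once it is flipped back.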

Michael:

I'm very very excited for that feature.

Jake:

Yeah. We have had ones where we've we've turned it on just for a specific group, like, manually. We went into the database and edited the value to just return true, and it was, you know, it was particular to those those users. And maybe that's not a great way to do it, but, like, we we needed to do it that way for a period of time. But then there was a situation where I needed to turn it off, and it was like, dang it.

Jake:

Well, I have to either just modify that code and re resolve it for everybody or just always, you know I don't know. It was it was goofy, so I didn't really have a good way to do that. But this is exactly that way.

Michael:

Depending on where you are. If you're not you know, if you're in an organization that has documented policies around like, I know as part of ISO and as part of SOC 2, like, there's the whole someone has to approve the code. There's gotta be, you know, all of these steps that you have to go through, which can delay, you know, just versus just having a switch that you can just turn off. Yeah. And that can then you know, I think we spoke about this previously, where we can then put the feature flags in the hands of the business rather than the Correct.

Jake:

And the makers.

Michael:

Like, oh, well, you know, there's a compliance issue here or there's some other business reason that we wanna disable this feature, so they can just go into some interface and just turn it off.

Jake:

I love that. Alright. Yeah. I really like that.

Michael:

Big, big fan of this addition.

Jake:

Huge. The other thing that maybe we could talk about with Pennant for just a minute here is that these features, if it's not returned from the before hook, typically, you are invoking some sort of storage mechanism that you're using to resolve these things out of. Right? So whether it's the database or whether it's, you know, Redis or some other deal, you're having to query or go grab something to figure out if it's enabled or not. So just like any other thing, if you're not eager loading those values, you end up making a query for each one of them.

Michael:

Mhmm.

Jake:

So that's one of the things you're gonna want to probably do: if you have a page with a lot of different feature flags on it, it's a great idea to resolve those ahead of time, because if you don't, you go look at your Laravel Debugbar, you're going to see, you know, one query per feature. And so that can rack up the number of queries you've got pretty quickly. One thing that we did to help solve this problem: there's this one page where we're doing a lot of rapid application development on it, and this is one of the only places that we are using the feature flagging Pennant stuff. And there's other applications, but in this application, that's where we're using it the most. And so that features directory, the Laravel app features directory, what we'll do is we actually have a little command that scans through that list of classes that's in there and then says, resolve this for all users ahead of time.

Jake:

Like every single user in the database or so yeah. Every single user that we have in the database, resolve it for all of them. Now we've only got hundreds of users. Right? So, like, in this particular case, it's all internal users, and so we can resolve it for everybody ahead of time.

Jake:

And then what we do is we say, okay. Now, get feature double colon loaded. And what I think that does is it says basically grab the distinct features that are in the database, and give you the list of those. So what we do is we say bump that feature loaded list up against that list of classes that we just grabbed out of the file system. And any of the features that no longer exist, purge them from the database.

Jake:

So get rid of them. They're old. They're stale. And then we cache that list of features that I just that I just mentioned, and we eager load those features every time that that big page is loaded. So we basically, ahead of time, when we're doing our our pipeline, our continuous integration deployment pipeline, we're in advance loading all of them, caching all of them, and then asking them to be eager loaded on that page, which works really well.
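The deploy-time cleanup Jake describes boils down to a set difference between the feature classes on disk and the feature names in storage. A small Python sketch of that comparison, with made-up feature names and a list of tuples standing in for the database table:

```python
def find_stale(stored_names, defined_names):
    """Return stored feature names that no longer have a definition."""
    return sorted(set(stored_names) - set(defined_names))


def purge(stored_rows, stale_names):
    """Drop every stored row whose feature name is stale."""
    stale = set(stale_names)
    return [row for row in stored_rows if row[0] not in stale]


# Hypothetical data: class names found on disk vs. rows in the flag store.
defined = ["BetaSearch", "NewBilling"]
stored_rows = [
    ("BetaSearch", "user:1", True),
    ("NewBilling", "user:1", False),
    ("OldWizard", "user:1", True),  # class was deleted from the code base
]

stale = find_stale({name for name, _, _ in stored_rows}, defined)
remaining = purge(stored_rows, stale)
```

The surviving names are then the list worth caching and eager loading on the heavy page.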

Jake:

I was going to talk to Tim about that and see if there was, like, some way to to do that. If you tagged a feature with some tag or something like that and just could say eager load all the features with this tag, that would be really handy. An attribute. In act oh, even better. See?

Jake:

There we go. I think because having to eager load them manually, you kind of have to look through all of the things that are on that page and you just kind of have to know, oh, these are the ones that you're using. But it'd be nice to just be able to tag them and then say eager load all these. I'm sure I could roll something like that myself, you know, just throw it on there. And then Yeah.

Jake:

I mean, just think of this.

Michael:

Yeah. There's precedent in the framework now for using attributes for that kind of behavior, the the model observers and and things like that. So, yeah, a some some attribute that is well named by the Laravel team, no doubt, that that allows you to do that might be might be an approach to to go.

Jake:

And since I'm already doing this so, like, since I'm already in my CI doing this thing where I'm when I'm deploying, I'm looping through every feature and sort of precaching it. You know what I mean? I could just say an h 1. Like, I could have, like, a global list of loaded features, and then I could have here's the features with this tag and here's the features with this tag. And so if I said, you know, feature load for scope user, whatever it is, I can't remember exactly what it is, but then you list all the features.

Jake:

loadMissing is what it is. Feature::loadMissing, and then you pass in the list of features. It'll eager load them for you. So just one query instead of 15 or 20 or whatever it is, you know?
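To make the query-count difference concrete, here is a small Python sketch using an in-memory SQLite table. It is not how Pennant's database driver is written; the table layout, the `CountingDb` wrapper, and the feature names `a`, `b`, `c` are assumptions chosen only to count queries.

```python
import sqlite3


class CountingDb:
    """In-memory table of stored flag values that counts SELECT calls."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        self.conn.execute(
            "CREATE TABLE features (name TEXT, scope TEXT, value INTEGER)"
        )
        self.conn.executemany(
            "INSERT INTO features VALUES (?, ?, ?)",
            [("a", "user:1", 1), ("b", "user:1", 0), ("c", "user:1", 1)],
        )

    def select(self, sql, params):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_one_by_one(db, names, scope):
    # One query per feature: what happens when each flag is checked lazily.
    result = {}
    for name in names:
        rows = db.select(
            "SELECT value FROM features WHERE name = ? AND scope = ?",
            (name, scope),
        )
        result[name] = bool(rows[0][0])
    return result


def load_batch(db, names, scope):
    # One query for the whole list: the eager-loading approach.
    placeholders = ", ".join("?" for _ in names)
    rows = db.select(
        f"SELECT name, value FROM features WHERE scope = ? AND name IN ({placeholders})",
        (scope, *names),
    )
    return {name: bool(value) for name, value in rows}


names = ["a", "b", "c"]
lazy_db, eager_db = CountingDb(), CountingDb()
lazy = load_one_by_one(lazy_db, names, "user:1")
eager = load_batch(eager_db, names, "user:1")
```

Both calls return the same values, but the lazy path issues one SELECT per feature while the batch path issues one in total.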

Michael:

Yeah.

Jake:

So interesting. Okay. Some homework for me. That could be that could be cool. But I really liked using it, and I feel like it has, done a lot of good for me and really excited about this new 1.

Jake:

So pretty cool.

Michael:

Someone will no doubt write some kind of Filament plugin or Nova plugin or something, you know, to manage features in the application that you can just drop into your app and then off you go, which would be Yeah. Really handy as well.

Jake:

Okay. Go ahead.

Michael:

On the subject of eager loading, if we're ready to move on from Pennant

Jake:

I do have one more thing to talk about with features, but we can come back to it because it's not necessarily specific to features. So go ahead. Eager loading.

Michael:

So we we have this, like, 1 endpoint in our application that is responsible for calculating the state of an application.

Jake:

So there are multiple

Michael:

steps there are multiple steps in in the application.

Jake:

Right? Yes. Multiple steps. Yes.

Michael:

Each each step has a status, and the status could be, you know, complete or pristine, which means it hasn't been accessed yet.

Jake:

I like that. Oh, that is such a good name. I've called I've used fresh before. I hate fresh. Pristine is better.

Jake:

Way better.

Michael:

Yes. So each each step can exist in 1 of these 3 statuses. But sometimes you might move between a a wizarded flow. So the the steps are different or there's different requirements depending on what you're in. So anytime you open an application, we hit this endpoint that goes through and calculates the state that the application is in, where you're up to, how much you've completed, whatever.

Michael:

As it turns out, for 2 years, right, this endpoint has existed. It has. This is a persistent bug that has been there for 2 years, but it presented sporadically. And it was not like a per application thing. It was just like a time of day thing.

Michael:

If, you know, the right number of people happened to open an application at the same time, like, weird things would happen. And instead of taking, like, a second or a few hundred milliseconds, it would take 30 seconds or 60 seconds or it would time out. And because of the sporadic presentation of this bug, it's been very difficult to track down. Because usually by the time you get a report of it, and you look at it, the issue's gone away.

Jake:

It's fixed. Yep. It's fixed. It's gone away.

Michael:

But it's always crept up, you know, every now and then, and to the point where no one was reporting it anymore, because it wasn't happening anymore. But yesterday so we've been doing this merging of tenants over the past few weeks.

Jake:

Yeah. Mhmm.

Michael:

And so we've done 6 out of 7 so far. So we've basically Almost there. Considerably grown the size of this single tenant over the last few weeks. And yesterday, fortunately, when I was not at work, there were all these reports of, like, all of these timeouts, and this endpoint that, you know, is typically fine, but sporadically not, was getting to the point where every time it was opened, it would cause an error. And I got a message at, like, 8 o'clock last night, 9 o'clock last night from my boss, who's like, I have cracked it.

Michael:

Finally. He's like, please review this PR. And what it boiled down to was we had changed a bunch of stuff from doing some explicit queries to using eager loads or lazy eager load. So you can do, like, model arrow load. Right?

Michael:

Mhmm.

Jake:

Yes. Right.

Michael:

So we were saying, you know, instead of querying the database again, we'll, like, just say, go and get me these records. And it was some, you know, there was some nesting, so we were getting, like, a status dot step dot state or whatever it was. The problem was as we had merged the tenants together, we had gone from, like, a few hundred of these types of records in a tenant to tens of thousands of records in this tenant. And because this thing was, like, doing model arrow load whatever, it was loading not every single one related to the application.

Jake:

Oh, gosh.

Michael:

Every single record that existed in the database of that record type. Now when you've only got hundreds of records, sure, that's going to present sporadically, isn't it? Because, you know, sometimes you load a couple of hundred records, and no one will be doing anything, and it'll be fine. And sometimes you'll have 3 or 4 people do it, and it's gonna go through this every single time. So the solution to this problem was to, like, explicitly add, like, whereHas on the

Jake:

Oh, okay.

Michael:

The lazy load, to say, like, load->statuses->whereHas application ID equals the application that was being loaded.

Jake:

You would think that relationship would just figure that out. Like, you would think it would just

Michael:

do that. But it was not, like, the parent relationship. It was not the application that we were loading from. It was from, like, the application

Jake:

A nested child sort of thing.

Michael:

The the wizard. Yeah. So it it had lost that linkage.

Jake:

Got it. Yeah.
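
As an illustration of the scoping problem described above (not code from the show, and in Python with SQLite rather than Laravel), here is a minimal sketch of the difference between loading every record of a type and loading only those tied to the application being opened. The `statuses` table, its columns, and all function names are hypothetical.

```python
import sqlite3


def build_demo_db(applications: int = 50, statuses_per_app: int = 20) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical statuses table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statuses ("
        "id INTEGER PRIMARY KEY, "
        "application_id INTEGER NOT NULL, "
        "state TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_statuses_application ON statuses (application_id)")
    rows = [
        (app_id, "pending")
        for app_id in range(1, applications + 1)
        for _ in range(statuses_per_app)
    ]
    conn.executemany("INSERT INTO statuses (application_id, state) VALUES (?, ?)", rows)
    return conn


def load_statuses_unscoped(conn: sqlite3.Connection) -> list:
    # The buggy shape: every status in the table comes back,
    # so the cost grows with the whole tenant, not with one application.
    return conn.execute("SELECT id, application_id, state FROM statuses").fetchall()


def load_statuses_scoped(conn: sqlite3.Connection, application_id: int) -> list:
    # The fixed shape: constrain the load to the application being opened.
    return conn.execute(
        "SELECT id, application_id, state FROM statuses WHERE application_id = ?",
        (application_id,),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print("unscoped rows:", len(load_statuses_unscoped(demo)))
    print("scoped rows:", len(load_statuses_scoped(demo, 7)))
```

With only a few hundred rows the unscoped version is slow but survivable, which fits the sporadic timeouts; once the table holds tens of thousands of rows, every open pays for all of them.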

Michael:

So, yeah, we were eager loading the records we thought we needed. It's just because they were not scoped to the application that was being requested, it was loading, you know, a few weeks ago, a couple of hundred records, and then it was a thousand records, and then it was 5,000 records until yesterday. You know, you have 7 or 8 people open tens of thousands of records at once, and it's, like, churning through each of those things to try and calculate the state. And it was like, so effectively, it was recalculating the state for every single application every time one application

Jake:

was open. That's brutal.

Michael:

And, you know, we got to the point where this was happening so much that, like, the IOPS were through the roof on

Jake:

on the

Michael:

RDS instance, which then caused the CPU credits to go bye bye.

Jake:

Through the roof as well.

Michael:

Yep. Yeah. So, basically, we solved this 2-year-old bug. Well, I say we. My boss solved this 2-year-old bug within, you know, 12 hours out of necessity because it took, you know, the database down, which by and large is very overprovisioned for our needs.

Michael:

It's just this one endpoint.

Jake:

You had to you had to get all those records in there in order to be able to reproduce it consistently enough to find out what the problem was. This is really

Michael:

kinda what it came down to. The fact that, yeah, if it was not for the fact that we merged the tenants together, this issue might have

Jake:

been You probably would have never been. Yeah.

Michael:

Probably would have been. For 2, 3, 5 years, whatever. But because we combined, you know, thousands and thousands of applications over, you know, several years. And it's like, and I looked at it, I'm like, oh, oh, no. I wrote that eager load 2 years

Jake:

ago now. Yeah. So there you go. It's easy to do. It really is easy to do.

Michael:

Production database back to 0.

Jake:

0. Back to 0. Well, the thing is, it actually was crashing it all the time. It's just, you know, I had a similar situation today where it was like we had a worker, a background sort of Lambda thing that was just running jobs and crashed our main application server, like, our actual, like, system of record, just literally pegged the CPU, hung. Everybody had to get out.

Jake:

We're like, okay. What was that about? And so

Michael:

Blame Redis.

Jake:

Go in and restart it and, like, restart the application server. Like, what's going on? Okay. Alright. So I was like, pause our Lambda there, and it was like, okay. So everything came back up.

Jake:

We're like, okay. Let's let's start it. And then we started it and watching the CPU, like, pegged, like, dang it. Okay. Pause it again.

Jake:

Went back down to 60. Like, okay. Start it 1 more time. Grow, grow, grow, pegged. Like, okay, well, we know what it is.

Jake:

That's, like, you know, so I just had to reduce the number of workers and then it kind of settled down a little bit. But yeah. I mean, sometimes it's just like you don't know until, you know, I've never had that many jobs in there before. It was like 10,000 jobs backed up and it's like you just never know. It's never been pushed that hard.

Jake:

And so Yeah. Whatever. I guess so I had to limit the number of workers.
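
To make the "limit the number of workers" idea concrete, here is a rough sketch of draining a backlog with a hard cap on concurrency. It is an assumption-heavy stand-in: Python threads play the role of the Lambda workers, the job body is a short sleep, and `run_backlog` is a hypothetical name, not anything from the setup described here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_backlog(job_count: int, max_workers: int) -> tuple:
    """Process job_count jobs with at most max_workers running at once.

    Returns (jobs_completed, peak_concurrency_observed).
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def job(n: int) -> int:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.001)  # stand-in for the real work hitting the shared system
        with lock:
            in_flight -= 1
        return n

    # The pool size is the throttle: the backlog can be any length,
    # but the downstream system never sees more than max_workers jobs at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(job, range(job_count)))
    return len(results), peak


if __name__ == "__main__":
    completed, peak = run_backlog(job_count=200, max_workers=4)
    print(f"completed={completed} peak_concurrency={peak}")
```

The trade-off is the one described above: a smaller cap drains the backlog more slowly, but keeps the shared system from being pegged.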

Michael:

Yeah. You can plan for, like, some end state, but a lot of the time, you're so busy focused on, like, what can we deliver now to, you know Yes. Make the business get there. That it's like, sometimes, these issues at scale don't actually present until you're at scale. And Agreed.

Jake:

The and

Michael:

until you And until you get there

Jake:

you won't know. And you're just if you always were trying to prepare for that and always trying to, like, mitigate those risks before they became risks, you'd never ship. That's the problem. So it's like Yeah. You kinda just gotta get it out there.

Jake:

That's why feature flags are important. And then secondly, like, you know, Blackfire or whatever it is. Like, those monitoring things too. To be like, where are the slow queries? Oh, okay.

Jake:

Here's the slow queries. I need to go fix those up.

Michael:

Yeah. And I mean, at the end of the day, it comes down to how will you respond to those issues and what's your, you know, workflow like in terms of addressing it when it happens. Yes. And we've got, yes, you know, I've pooh-poohed Datadog in the past because it is like a very heavy, very complex piece of software that, you know, does a lot of stuff.

Michael:

But if not for, like, the traces in there and, like, the query monitoring, all of that stuff, we wouldn't have been able to kind of narrow down the exact query, which then helped us narrow down the exact line of code, which then allowed us to figure out, you know, oh, we were not scoping that query. So instead of taking, you know, 10, 15 seconds for this query to execute in isolation, you know, when no one else was using the system, it's now, like, consistently less than 1 second. So

Jake:

That's awesome. So yeah. It feels good. It's Feels good. Yeah.

Michael:

It's it's good. And, like, I wasn't involved in the fix. So that's nice. Like, believe it or not, like yes. I introduced it.

Michael:

And if I was at work, I probably would have gotten to the bottom of it. But it's nice that we've grown as a business and as a team to the point where, like, anyone can look at any of these things and just deal with it. So that's, you know, from a team and business growth perspective, it's really good.

Jake:

That's awesome. Yeah. It's nice to have somebody else on the team other than you that can fix the problem. You know? Mhmm.

Jake:

Yeah.

Michael:

My boss, shout out to Sam. He has changed his profile picture in Slack to Fireman Sam, because he just seems to be putting out fires for everyone lately. So

Jake:

Fireman Sam. I love it. That's great. That's funny. Alright, dude.

Jake:

Let's wrap this one up. What do we got? 157? Is that what it was?

Michael:

158.

Jake:

158. Thanks, everybody, for tuning in, hanging out with us. Find the show notes for this episode at northmeetssouth.audio/158. Hit us up on Twitter at @michaeldyrynda, @jakebennett, and @northsouthaudio. And as always, if you like the show, rate it up in your podcatcher of choice.

Jake:

5 stars would be incredible, amazing, and awesome. Alright, folks. See you in 2 weeks. Peace.

Creators and Guests

Jake Bennett
Host
Christ follower, web dev designer @wilbergroup and @laravelphp fanboi. Co-host of @northsouthaudio and @laravelnews with @michaeldyrynda
Michael Dyrynda
Host
Dad. @laravelphp Artisan. @LaraconAU organiser. Co-host of @northsouthaudio, @laravelnews, @ripplesfm. Opinions are mine.