Applications on Autopilot

The Future of Containers in the Enterprise

February 10, 2016

New York


Okay. Hi folks. My name is Tim Gross. I am a product manager at Joyent, that's my Twitter handle because I make bad decisions in life. >> [LAUGH] >> So I was originally gonna ask at the beginning of this which portion of people had seen the demo that we did in the other session? Obviously, nobody ended up seeing that demo so I'm gonna get to doing that again and this time be on the right WiFi so that it all works nicely.

The idea here was to kinda do this as a bit of a tutorial, so there are two repos listed here: one is for the second half and the other is for that MySQL demo that we did earlier. And the slides will be in the workshop repo by the end of the day, they're not actually there now.

So if you wanted to pull down those two repos and you have your laptop with you, particularly towards the second half will be more of the follow along side. Given conference WiFi you might want to try to do docker compose build to get your components now if you wanted to follow along.

If you don't, then that's cool too. I will have this slide and these repos will come back up later, but it's github/TGross/triton-mysql and workshop-autopilot. The workshop-autopilot repo is built kind of as a tutorial so the master branch is actually incomplete and we'll be actually walking through how we might complete that later, and there's a branch container-summit-NYC2016 that has the full complete code.

So I was talking at the earlier thing about, sorry before I continue, who was at the earlier session? Okay and then who wasn't at the earlier session? Okay so most of you have seen this so I'll kinda go through it quickly, but so this is a pretty typical application architecture, right. We have MySQL with asynchronous replication, and then clients have to figure out where to go: they have to read from the replica and then they are going to make writes to the primary and they'll of course read from the primary whenever they read their writes.

But this immediately opens up questions about topology: how does the primary tell the replicas where to start, how does the replica know where to find the primary, and how does the client know where to find either the primary or the replica and which is which. And then there are the ongoing operational concerns, like how do we do backups? How do we do failover? And when we do fail over, how do we tell the client that we failed over and what do we do? So we have all, I assume everyone in this room knows this is no longer the way to do things, right, there are options around configuration management which are pretty good, but the major issue there in this scenario is that we don't have a way to watch for changes in the topology.

So you still need a human being in the loop to say oh we've changed something now go and do it. And then there's the database as a service which places the configuration largely outside your control, which means that as you start to scale up and you need kind of better performance out of that, now you're gonna start spending a lot of money.

So the idea here is to push the responsibilities for self operation into the application that means start up shut down scaling discovery and recovery. And that frees you from the time that you're spending on things that aren't important to your business. And so at Joyent we built a tool called Containerbuddy, and we're gonna dive into that quite a bit in this session.

It acts as PID1 on the container, and then it performs a number of behaviors in parallel to the running of the application. So, let's look at how that looks at Triton and again if you missed this earlier, this is the repo that we're working with. So what we have here is Containerbuddy, is an open source application that we built, it's written in Go.

It runs as PID1 inside the container, so it essentially acts like a supervisor but only in as much as that it forks and waits for other applications and it reaps children. So it doesn't do restarts. It attaches to the standard out and standard in of the main application that you're intending to run then it pushes that back out to the Docker daemon or to whatever your Docker engine is.

In this case we're running MySQL, and we're gonna be running the Percona server, and then the health check, onstart and onchange handlers will be forking out to a separate application which we've written for this case, which is just a couple of hundred lines of Python and we're calling that Triton MySQL and it's going to be doing things like bootstrapping replication, running backups via Percona xtrabackup, and doing health checks with the MySQL client.

So let me, I want to actually run this. So this time it's going to work. So I wanna point out here that, so there's nothing up the sleeve here right? This is just docker compose running a docker image that we previously built, there's no scheduler here, we don't have Kubernetes or Mesos in play, we just have Docker Compose.

I'm going to bring up a small. [BLANK_AUDIO] Should actually run that. So we've got a small monitoring application here that's watching for what's going on in this. And as containers come up we are going to see them show up in the top section here that's just running Docker PS this middle section here is watching what's going on in Consul and then below that we are tailing the Docker Logs coming from the different containers.

When a primary comes up, sorry when a mysql comes up it's going to check in with Consul and try to figure out where the primary is in this case because it is our first instance Consul is going to tell it there is no primary and so it will tell Consul, I'm the primary and we'll do that by making a lock in Consul.

And I will get into that in a second. That prevents another instance from coming up at the same time and fighting over who gets to be primary. When a replica comes up, it's going to, well let's get to what happens in a replica in a second. So at this point we've seen that our lock session has been, our lock has been made and it's making a backup and writing that where that back up is happening into Consul.
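The election just described can be sketched in a few lines of Python. This is a minimal illustration and not the code from the triton-mysql repo: `FakeKV` is a hypothetical in-memory stand-in for Consul's lock store, and the key name `mysql-primary` and the instance names are assumptions.

```python
import threading


class FakeKV:
    """Hypothetical in-memory stand-in for a Consul-style lock store."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def acquire(self, key, owner):
        """Atomically claim `key` for `owner`; True only if we got it."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = owner
            return True

    def release(self, key, owner):
        """Drop the lock, but only if `owner` actually holds it."""
        with self._mutex:
            if self._data.get(key) == owner:
                del self._data[key]

    def holder(self, key):
        """Return the current owner of `key`, or None if it is free."""
        with self._mutex:
            return self._data.get(key)


def choose_role(kv, instance_name, lock_key="mysql-primary"):
    """Return "primary" if this instance wins the lock, else "replica"."""
    return "primary" if kv.acquire(lock_key, instance_name) else "replica"
```

The first instance to call `choose_role` becomes the primary; every later caller is a replica until the lock is released, which is the property that keeps two instances from fighting over the role.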

So what's happening there is that during its onstart process the primary is all being mediated by Triton MySQL, that Python code is going to bootstrap the database and then it's gonna push the first snapshot to our object store, in this case we're using Triton's Manta. And then it's writing, as we saw here, where the last backup and which bin log was into Consul, so when a replica comes up it's going to pull that same information from Consul and it's going to then know to get that backup from Manta, and then it's gonna use Global Transaction Identifiers to sync up replication.

So let's actually do that now. So we should start to see these coming up pretty quickly. In the same way that the replicas are going to be using Consul to figure out where to find the primary, any of our clients will use Consul to find the primary too, so we can write that logic into our application code to check in with Consul or whatever our discovery service is going to be, and then use that as the IPs that we're gonna be making our queries against. [BLANK_AUDIO] Hopefully this is coming up.

So we see that the containers are up, and we should start seeing in a few seconds here that we have one replica that's now passing, and we should see the other one coming up, and we're gonna see that move from critical to passing in just a moment. Right, so this is just checking in with Consul, so now each of these replicas is sending a heart beat to Consul saying it's healthy because it's running its health check.

So let's look a little bit into what that actually looks like. I wish that would fit on the screen. Does that work? Yeah, okay. So let's dive a little bit into like what's actually running in the container here. So these are our two processes. So you've got a Containerbuddy, and we've got Consul.

That's kind of our two containers, so I wanna point out that. So everything on these diagrams that you see, everything above the Containerbuddy line is in that container, so that's gonna be one of our MySQL containers. We haven't created a separate container image for the primary and the replica.

That's really important to note, right. We have exactly one image, and we're gonna use it for all the primaries. So the first thing that happens is, okay, this is the dockerfile for that image that we're using. So we're starting using the Percona base, we're using that because it's been well tested with Percona xtrabackup.

So, what we have here is we're starting with what they provided. We're making sure that we have Python and making sure that we have Percona xtrabackup, we're doing just a couple of dependency management things, adding our Containerbuddy binary. If you look in the repo you're actually gonna see that section is a little bit different just because it doesn't fit on the screen here but we're getting in the repo, we are actually getting it from the tagged release on github, and then we're running a command, and that command is containerbuddy, but then after the containerbuddy we're saying, okay, run MySQL with the MySQL argument.

So the way that Containerbuddy operates is, it takes whatever its arguments are as the executable, and its arguments to run. So let's see what that looks like. So the first thing that it's gonna do before it starts that is run the onstart handler. And in the case of this application we are using our Python binary, sorry our Python code.

And during its start is when we define the behaviors. To ask Consul for where primary is, do the initialization of the database if that's what we need to do or to bootstrap our application if it is not the primary. So how does Containerbuddy know how to do that? So there's a configuration file that we're passing in to Containerbuddy and it looks something like this.

So a Containerbuddy configuration file starts with where to find Consul, an onstart behavior so this is going to be what it's going to do at start of execution, a block of services to advertise, so we're advertising the MySQL service and how to healthcheck it. So, this is really an important feature of Containerbuddy where all the healthcheck code for an application is packaged inside the container.

That means that we can push TTL heartbeats to Consul without having to tell Consul how to healthcheck the application. Consul is really great at doing things like an HTTP healthcheck, which is great if all the services that were ever written were in HTTP. But we're obviously running MySQL here and it doesn't speak HTTP, and so we need to be able to have some kind of other healthcheck.

So the only way to do that from Consul would be to package a MySQL client with our Consul, which isn't really a great option in terms of deployments. And then we have backends, and so the backend that we're looking for is the name of a service called MySQL primary, we're gonna poll that every ten seconds and then execute an onchange handler when we've seen that there's a change in whatever that MySQL primary is.
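For readers without the slide, here is a rough sketch of the shape of such a configuration, built as a Python dictionary and dumped to JSON. The key names (`consul`, `onStart`, `services`, `health`, `poll`, `ttl`, `backends`, `onChange`) are recalled from the Containerbuddy README and should be checked against the repo. The script path, port, service poll interval and TTL are assumptions made for illustration; only the ten-second backend poll comes from the talk.

```python
import json

# Key names are recalled from the Containerbuddy README; the script path,
# port, service poll interval and TTL are illustrative assumptions.
config = {
    "consul": "consul:8500",
    "onStart": "python /bin/triton-mysql.py on_start",
    "services": [
        {
            "name": "mysql",
            "port": 3306,
            "health": "python /bin/triton-mysql.py health",
            "poll": 5,
            "ttl": 25,
        }
    ],
    "backends": [
        {
            "name": "mysql-primary",
            "poll": 10,  # the talk says this backend is polled every ten seconds
            "onChange": "python /bin/triton-mysql.py on_change",
        }
    ],
}

rendered = json.dumps(config, indent=2)
print(rendered)
```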

So that's what the configuration file looks like. I'm gonna step through each of these kind of a lifecycle. [BLANK_AUDIO] So once the onstart handler is completed, then it's going to start our mysqld. So if you go back here in our command this is where this command block comes in, so now it's gonna run mysqld and it's gonna run with these parameters and we're also using a configuration file of course with some more details in it.

[BLANK_AUDIO] The main Containerbuddy thread is now blocking. So this is a Go program and so those threads are all green threads and they are called goroutines, but if you don't know Go, that's the main thread, that's fine. So it's gonna attach to standard out and standard error of the app, and what that means is that anything the app emits to standard out and standard error is going to be reflected in the docker logs.

If you type docker logs, you'll get that out. And that's actually what's happening for our application here. So this is just running docker logs against this container. In fact let's just so that everybody knows this isn't a trick, like we're just gonna do like docker logs, MySQL 1, so that's gonna be 8 million, yeah, right so that's all that this thing is doing it's just doing so that we can see it all in one screen.

Everything that the main application is sending out is gonna be re-transmitted to the docker logs, which is good if you're using docker log driver like the log stash drivers and things like that. It's also gonna return the exit code of the application. So when I said that it's not a supervisor, the main thing I mean by that is that it doesn't try to restart the application.
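The "not a supervisor" behavior is easy to picture in code. The following is a minimal Python sketch of the idea, not Containerbuddy's Go implementation, and `run_and_propagate` is a hypothetical name: run the wrapped command once, let its output pass straight through, and hand back its exit code without restarting anything.

```python
import subprocess
import sys


def run_and_propagate(argv):
    """Run the wrapped application once and hand back its exit code.

    There is deliberately no restart loop: if the child dies, the caller
    exits with the same code and the container goes with it. The child
    inherits stdout and stderr, so its output shows up unchanged.
    """
    completed = subprocess.run(argv)
    return completed.returncode


# Example: a child that exits with status 3 yields 3 here as well.
code = run_and_propagate([sys.executable, "-c", "raise SystemExit(3)"])
```

A wrapper script would finish with `sys.exit(code)` so that tools like `docker ps` report the application's own status.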

If the application dies, so does the container. So if the application dies, Containerbuddy will capture that exit code and repeat it out so that when you look at your, so if I do Docker PS here. [BLANK_AUDIO] I don't have any exited containers, but if we had an exited container we'd see instead of up five minutes we'd see exit and then whatever the exit code was, so we wanna make sure that we haven't changed the other tooling that we wanna use so whatever tooling we're actually using to run the containers, we can still use, we haven't actually altered that so docker compose is great but if you're using a scheduler this can work perfectly well with that. [BLANK_AUDIO] So the next thing that starts happening after the application runs is concurrently to the main thread of Containerbuddy, we're running healthchecks so at a polling interval it's going to execute a health check in this case it's just doing a select one against MySQL and then if that succeeds it's going to send a TTL heartbeat to Consul. So in this case we'll say the instance ID (whatever that instance ID is) at whatever my IP address is is going to be healthy for the next ten seconds and so if Consul gets asked about that it will say okay yes this is one of the items that's healthy for that service.
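One polling step of that health check loop might look like the sketch below. It is a simplified illustration under stated assumptions: `poll_once` is a hypothetical helper, the check commands are trivial stand-ins for something like a `SELECT 1` through the MySQL client, and the heartbeat is an injected callable rather than a real request to Consul.

```python
import subprocess
import sys


def poll_once(health_cmd, send_heartbeat):
    """Run the health check; send a TTL heartbeat only if it exits 0."""
    healthy = subprocess.run(health_cmd).returncode == 0
    if healthy:
        send_heartbeat()
    return healthy


# Hypothetical stand-ins: one check that passes and one that fails.
heartbeats = []
ok_check = [sys.executable, "-c", "raise SystemExit(0)"]
bad_check = [sys.executable, "-c", "raise SystemExit(1)"]

poll_once(ok_check, lambda: heartbeats.append("pass"))
poll_once(bad_check, lambda: heartbeats.append("pass"))
```

Only the passing check produces a heartbeat. When checks keep failing, no heartbeat is sent, the TTL lapses, and the discovery service stops listing the instance as healthy.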

This model is why a sane networking environment is important, so what you have on Triton, what you can get with some configurations of Docker Swarm, what mantl is giving you, where a container has its own IP address is vital, because if a container doesn't have its own IP address, meaning that the IP address inside the container is the same one that you'll use to reach the container, then this is going to be a nonsense thing to tell Consul, right, so we won't be able to report this stuff, so that kind of networking model is very important to make this work.

[BLANK_AUDIO] Okay yeah so and then in the case of our triton-mysql healthcheck we are also doing a check to see if it's time to make a snapshot, so the primary when it's running will periodically snapshot back to Manta and the reason is so that we are running our backups, so it will do that whenever the timer that we have given it has passed or whenever it's rotated its bin log, so that way when new replicas come up after that bin log rotation, we can actually get them to sync up.
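The snapshot rule is just an "either condition" test. A small Python sketch of it follows; the function and parameter names are hypothetical and are not taken from the triton-mysql code.

```python
def should_snapshot(now, last_snapshot_at, interval_seconds,
                    current_binlog, last_snapshot_binlog):
    """Snapshot when the timer has expired or the bin log has rotated."""
    timer_expired = (now - last_snapshot_at) >= interval_seconds
    binlog_rotated = current_binlog != last_snapshot_binlog
    return timer_expired or binlog_rotated
```

Checking the bin log name as well as the clock is what lets a replica that starts after a rotation find a snapshot it can sync from.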

[BLANK_AUDIO] So just in case it isn't clear, this is what the inside of the container looks like. This is our process tree, so we have Containerbuddy, mysqld and the Python application, and the Python application might be forking out to things like Percona xtrabackup, but we haven't made any changes to mysqld itself, right, so this isn't like a special fork of MySQL that does Containerbuddy things.

We were able to take a completely legacy application and do this. All right so the next thing to worry about is onchange handlers. Let's actually demonstrate that before we get too far. So the kind of the hard thing to figure out right, or kind of the fun trick, if you have this sort of environment where you have an autopiloting environment is, let's kill the primary and we see what happens when we fail over, all right, so let's do docker stop my_mysql_1.

So immediately we see that the primary has been removed from Consul, and the reason for that is because we're capturing exit signals that reach the container in Containerbuddy, and then deregistering them from Consul when we get that. So when we tell the container to kill itself, Containerbuddy is saying, oh, let's go and make sure that we've told Consul that.

Now obviously if you've had an abnormal exit where you don't receive that signal, you won't be able to tell it but the TTL will eventually pass and then you'll be good. So that's been removed, but we have to wait for the lock to disappear. We've now seen the lock disappear. And until that lock has disappeared, neither of the replicas could take the role of the primary.

Because we don't wanna have a transient network failure cause flapping within the replica and they're all trying to get primary at once. Once that happens we're gonna see an—and actually, I wanna point out here that we do see that this MySQL, the last thing it told us is that it was deregistering before it died.

So let's actually close that up, and so what we see here is that one of the instances has obtained the lock, one of them is now the, one of them has now marked itself as the primary and it looks like mysql_3 became the primary, so mysql_2 if you scroll back up a little bit. Let's do a [BLANK_AUDIO] So we're gonna see here it was told that by its onchange handler to change its master, so it automatically failed over to the new master and if we go, so let's go back to our, so what that looks like is it's periodically polling, it's periodically polling Consul to ask whether there's an upstream change.

If it doesn't change then nothing happens. If it does change, so in this case, primary failed, it's gonna run that onchange handler. The onchange handler is gonna say, stop replication, and then all the replicas will all tell Consul, I'm the primary, and so we're using Consul's atomic lock store, which is called sessions, to make sure that we only have one that is allowed to be the primary.
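The poll-and-compare step behind the onchange handler can be sketched as follows. This is an illustrative Python version with hypothetical names: `fetch_addresses` stands in for the query to the discovery service and `on_change` for the handler command.

```python
def check_backend(previous, fetch_addresses, on_change):
    """Poll once; run `on_change` only when the set of addresses differs."""
    current = sorted(fetch_addresses())
    if current != previous:
        on_change(current)
    return current


# Three polls: the first sees a new primary, the second sees no change,
# and the third sees that the primary has moved to another address.
calls = []
state = check_backend([], lambda: ["10.0.0.1"], calls.append)
state = check_backend(state, lambda: ["10.0.0.1"], calls.append)
state = check_backend(state, lambda: ["10.0.0.2"], calls.append)
```

The handler fires twice here, once for each observed change, and never for the poll where nothing moved.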

So whoever loses that is gonna have to poll until the new primary comes up. [BLANK_AUDIO] And this is what I was talking about when we told that container to stop: when that container receives sigterm, it's gonna run a pre-stop operation, which in this case wasn't much of anything.

Then it will pass sigterm to the main application and we'll run a post-stop, in this case that was where we send the signal to Consul to say remove me from the list of primaries. [BLANK_AUDIO] So before we move on from that, does anybody have any questions about that? Yeah. Sure >> [INAUDIBLE] >> Yeah.
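The stop sequence described above (pre-stop, forward the signal, post-stop) can be sketched in Python like this. It is an illustration only: `graceful_stop` is a hypothetical name, the hooks are plain callables, and the timeout with a forced kill is an assumption added for safety rather than something stated in the talk.

```python
import subprocess
import sys


def graceful_stop(child, pre_stop, post_stop, timeout=5):
    """Run pre-stop, forward SIGTERM to the app, wait, then run post-stop."""
    pre_stop()
    child.terminate()  # sends SIGTERM on POSIX
    try:
        child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        child.kill()  # assumed last resort if the app ignores SIGTERM
        child.wait()
    post_stop()  # e.g. deregister from the discovery service
    return child.returncode


# A long-sleeping child stands in for the main application.
events = []
app = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
graceful_stop(app, lambda: events.append("pre"), lambda: events.append("post"))
```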

>> [INAUDIBLE_AUDIO] >> I'm a little deaf so you're gonna have to be really loud sorry. Where was it? I went right past it didn't I? [BLANK_AUDIO] This or the this one? Sure. >> [INAUDIBLE] >> Yeah, so the name of the service will be what it reports to Consul so it'll say, so in this case we're saying for service MySQL report this IP in this instance as one of the available nodes.

Let me actually. [BLANK_AUDIO] I'm gonna get that wrong, not bad. [BLANK_AUDIO] Actually I'm gonna bring up that Consul since it might be a little easier to kinda… [BLANK_AUDIO] Yeah, that's totally what I used, just wanted to… [BLANK_AUDIO] Okay so, and we only see one now but we had two earlier because we've stopped it but the mySQL replica is reporting its service name as mySQL and the primary is reporting its service name as mySQL primary.

What I didn't get into too much, so you see there our configuration here. This is fixed. This is a fixed configuration and we're not advertising a MySQL primary service as part of that onstart that we have there, there's a little trick in that Python MySQL code that rewrites this and HUPs so that it sends the right signal.

>> [INAUDIBLE] >> Right, the discovery is automatic so it's pushing these heartbeats into Consul, and then the other containers are querying Consul for that data. So all we had to do is tell it where to get Consul from and everything else bootstraps. Does that answer your question? I wanna make sure I… Okay I can't see anybody in the back, there's a light like right in my…Good, sure.

>> [INAUDIBLE] >> Sure so what we do is whenever the primary's bin log rotates, it writes a snapshot and the last bin log into the object store and then writes a key into Consul that says where that is so you'll have the latest snapshot plus whatever the bin log since then is, and so that's the only thing that you need.

There will be a gap at the end of that bin log and it has to catch up using the global transaction identifiers; by making sure that the last bin log is in there, you can do that. And that's because we're using a very specific replication setup here, which is GTIDs. Okay, I wanna move on to the next section here.

[BLANK_AUDIO] So, this is if you wanna try to follow along, although I recognize that this is a really big room and it might not work out so well. This might be the way to do it. If you clone this repo, change into the directory and then check it out, so that branch is the completed, all-the-code-is-done branch.

If you stay on the master you have the kinda partially done thing. I'm gonna probably jump because I'm a little shorter on time than I thought so. Sorry. [BLANK_AUDIO] Good. Okay cool. I'm sure the organizers are gonna have my slides somewhere, I just need to figure out where. Okay, so let's look at a very simple microservices application and look at how we might try to apply this pattern to it.

So, in this case we have nginx and we have two services. We have a customer service and a sales service. And so nginx is gonna be operating as a reverse proxy. When we hit a URL that starts with customers, it will be passed to the customer service, and if we hit sales that will go to the sales service.

Well, then that does not a microservices make, because that's just a multi tier app, right? That's not what we generally consider to be, and this is kind of the trick, that as far as I can tell, this is the only thing that turns a multi tier app into microservices, other than the size of the service.

So, what we're gonna do is we're gonna have a very bad data model. And our customers application is gonna have to get some data from the sales application in order to complete its job. And the sales application is gonna have to get some data from the customers application in order to do its job.

So we have an internal thing that it's gonna need to do. So applying that same pattern, we're going to use Consul for service discovery, and each of the applications will send TTL heartbeats to Consul. Before I move on from that, what that allows us to do is that allows us to scale up the customers application separately from the sales application, and all nodes will be able to see all other nodes.

Right? So when we scale up extra nodes in sales, the customers app will be able to round robin between them without having to go through some kind of external proxy. It's not gonna have to go back through nginx, it's not gonna have to go back through an internal load balancer, it can just talk directly.
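Client-side round robin over the discovered nodes needs very little code. Here is a tiny Python sketch; the addresses and port are made up for illustration, and the workshop apps themselves are written in Node.

```python
import itertools


def round_robin(hosts):
    """Return an iterator that hands out hosts in rotation, forever."""
    return itertools.cycle(list(hosts))


# Two hypothetical sales nodes; successive requests alternate between them.
picker = round_robin(["10.0.0.5:3000", "10.0.0.6:3000"])
first_three = [next(picker) for _ in range(3)]
```

Because each caller rotates through the list it got from the discovery service, there is no shared load balancer in the request path.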

So we've eliminated network hops, which is awesome, and we've allowed our application to scale in a way that—each piece can scale independently with no bottlenecks. So this is what the directory tree, if you go to the repo this is what it looks like. So you've got the customers application, which just has a dockerfile, a Containerbuddy configuration file, very similar to the one that we showed, and the application itself.

So it's customers.js. These are two node applications just cuz why not Node, and it's got the package.json there. We have a docker compose file which is the only thing we're gonna be using to orchestrate this. We have got nginx which is gonna have a docker file, Containerbuddy, it's what it's actually gonna serve and the configuration file, and I will get to what that ctmpl file is in a second and then the sales application basically very similar to the customers application.

So this is what the inside of the customers dockerfile looks like, this is a—this is the complete version. If you go to the sales version you will see that it's not complete, we just have that, so we will touch that in a second. So we are gonna start with the Alpine Linux which is the really small containers which hopefully you can pull down on the conference WiFi, I don't know.

We are just gonna install Node and curl on that, we're gonna add our application files to it, and we're gonna pull down the latest Containerbuddy release, you'll see if you look in the repo that it's the release candidate for 1.0, 1.0 should be out in the next couple of weeks, and then we're copying our application and configuration.

You will see that we passed an environment variable into it, Containerbuddy and then a file path and that will be file inside the container that it's gonna use for configuration. You could also pass that as a flag to Containerbuddy or as an environment variable in your docker compose but that's easy.

We're exposing the port that we wanna use we're gonna use 4,000 cuz this is a non-public facing thing and we're gonna use, and we're gonna say, okay, run Containerbuddy and have it run Node and the customers application. [BLANK_AUDIO] This is what our Containerbuddy configuration is gonna look like inside that container, so again we've got Consul at the top.

Our service that we are advertising is just gonna be the customers service, we say where to advertise it. For our healthcheck, because we don't have MySQL we're not gonna do a select one, we haven't written any code for it, we can just use curl, right? If we know that we can curl this endpoint then it's healthy, like that's a totally suitable healthcheck for this application.

Again because it's completely user-defined and it ships with the container, the same development team that is developing the application will be able to define what the health check is and be able to define, this is what we're using to decide this application is healthy, which is a really powerful model.

And then its backends are sales. So it needs to know where the sales are, and when the sales application changes it's gonna run pkill, and that's actually a typo there. But it's gonna send a sighup to the Node application. Now most Node applications don't actually watch for this. Like if you're running express out of the box, they don't do this.

And lots and lots of applications fail to do this. Fortunately it's very simple to do, and I think I have…yeah, cool. So this is in the customer.js file. If you look all the way at the bottom, you'll see there's a variable—this is the simplest way you can do this and it totally works.

So we just have a list of upstream hosts. We have some code that says, how do I get the upstreams from Consul? And then we have a signal handler in our application so when this process receives sighup, it's going to run get upstreams and fill that upstream hosts value. So whenever the onchange handler fires, it's gonna fire sighup into the Node application, sighup will then fire get upstreams and that will cause that list of upstream hosts to be used for the next request.
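The talk's version of this lives in customers.js. The same pattern in Python, as a rough sketch with a hypothetical hard-coded lookup in place of the Consul query, looks like this:

```python
import signal

# Shared list that request handlers read from when picking a host.
upstream_hosts = []


def get_upstreams():
    """Hypothetical lookup; a real one would query the discovery service."""
    return ["10.0.0.5:3000", "10.0.0.6:3000"]


def on_sighup(signum, frame):
    """Refresh the shared host list in place so later requests see it."""
    upstream_hosts[:] = get_upstreams()


# SIGHUP only exists on POSIX platforms, hence the guard.
if hasattr(signal, "SIGHUP"):
    signal.signal(signal.SIGHUP, on_sighup)
```

With the handler installed, something like `pkill -HUP` against the process triggers a refresh, and the next request uses the new list.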

Now, how your application captures that, and things like that, is part of your application logic, in this case we're really just doing it in the crudest possible fashion so okay. So that's kind of like how one of the Node applications is set up. So let's look at a little bit of code here, so again here we've got, is the blue? Okay.

So we have our customers application and that one is done. [BLANK_AUDIO]. So we have the dockerfile, the Containerbuddy file, customers.js and package.json. Actually I'm gonna do a…real quick here… So we did that for customers, but if we look at sales let's actually dive in to what we would need to do to change that. So we have a docker file, we have the package.json and we have our application.

So if we look at our application real quick, yes and I'm one of those people. [BLANK_AUDIO] I told you I make poor life choices. So emacs is probably one of them. So, here is our application we have that code for getting the upstreams and we see it's just making a call to our Consul catalog.

And when it does it's gonna push those addresses on to that upstream host thing. I should point out, I've never developed Node professionally. I did a lot of Python and Go stuff, so if this is horrible, don't throw too many sharp objects. This is our root route, this is the thing that when we make calls to this application this is what it will do, it's going to get some data from, sorry, it's gonna find out what the upstreams are, and then it's going to make a call to one of those hosts, and you can see here.

It's gonna make an HTTP get to one of those hosts and then it's gonna do this callback with that data, which is gonna make a response that is like a json object with that data merged with its data. So this is a really, really crude application. And then we've got that signal handler that we had before, right, so this application is done from the JavaScript perspective so we'll say we're happy with that.
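The upstream lookup walked through above boils down to turning a catalog response into a list of addresses. A Python sketch of that parsing step follows; the sample payload is made up but shaped like a Consul `/v1/catalog/service/<name>` response, and `parse_upstreams` is a hypothetical helper, not the function in the workshop repo.

```python
import json

# Made-up payload shaped like a Consul catalog response for one service.
sample = json.dumps([
    {"Node": "node1", "Address": "192.168.1.10",
     "ServiceAddress": "10.0.0.5", "ServicePort": 3000},
    {"Node": "node2", "Address": "192.168.1.11",
     "ServiceAddress": "", "ServicePort": 3000},
])


def parse_upstreams(body):
    """Turn a catalog response body into a list of "host:port" strings."""
    hosts = []
    for entry in json.loads(body):
        # Consul leaves ServiceAddress empty when the service uses the
        # node's own address, so fall back to Address in that case.
        address = entry.get("ServiceAddress") or entry["Address"]
        hosts.append("{}:{}".format(address, entry["ServicePort"]))
    return hosts
```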

[BLANK_AUDIO] All right. So let's look at the dockerfile, so the dockerfile right now has, we've already got it using Alpine Linux, we are already doing our Node.js installation, we're copying our Express.js dependency and doing that installation, and we are adding the application itself in there.

So this is all set to go except for the fact that we don't have Containerbuddy in it, right? So we need to actually get Containerbuddy running in there. So I'm gonna just steal it from customers. So you can see if I just kind of flip between these two, you're gonna see that there are not a lot of changes that we've made here, so really all we've done is we've pulled, I wish the screen was a little bit bigger.

So does that help at all. So what we are doing here we are pulling the zipped dependency—I'm sorry the zipped release down from github where it's been stored, and then we are adding the file and the environment to the application. So let's add this to the container. [BLANK_AUDIO] And we are gonna add our configuration and our—and then we just need to change our command so that we're now calling Containerbuddy.

Need a comma there, otherwise it's going to yell at us. Okay. So we haven't made much change to this application, and now it is suddenly able to be a self-operating application, which is pretty powerful. Okay, and then we need to make sure that we have a Containerbuddy configuration in there, so again I'm just gonna steal that from the customers application.

[BLANK_AUDIO] All right so, it's very similar to what we had here except we're gonna replace this with sales, because we're gonna be advertising the sales service, and we wanna watch the customers service for the backend. So we need to do that and as it turns out because I was trying to be contrary we're using different ports here.

Again in a sane network environment where you don't have to do the port mapping, you don't need to do this. I'm doing this on docker locally for this demo, so we're gonna do that. All right so that's in pretty good shape. So we have it so that the sales application can get the data from customers, and the customers can get the information from sales.

What we don't have though is a way to tell nginx that something has changed so that nginx can properly route between them. So let's take a look at what we have in nginx. [BLANK_AUDIO] So right now if we look at our nginx configuration, a lot of times we break these configurations down into a lot of different files, we just packed it all into one cuz that's simpler for a demo. This is one worker process which is not like a production ready nginx, let's be honest. Okay, so we have upstream blocks for our two applications, customers and sales.

We know that this isn't going to work. If we tried to deploy like this we would have IP addresses and we wouldn't know whether those were good IP addresses or not. We'd have to have some kind of configuration management agent running inside the nginx container, or not run nginx in a container at all so that we can put stuff on the underlying host.

All of these are kind of bad options, particularly because they require a lot of manual intervention. So we will get to how to fix that in a sec. But then we're gonna have these two location blocks. And we won't have to change these because we're gonna use those upstream blocks. So one of these blocks will say that for requests for customers we are gonna pass it through to customers.

We are gonna rewrite it so you remove the customers thing from the front, just cuz that's how I like to do that. And the same with sales. So as long as we could solve this problem where these things won't work, this is a working nginx configuration. We need a way to change the nginx configuration on the fly.

So rather than making you all wait for me to type this out badly, I'm just gonna go. [BLANK_AUDIO] Oh, you're killing me. [BLANK_AUDIO] So what I have here is an nginx Consul template file. [BLANK_AUDIO] So this is a pretty cool tool that we have. Let me show you where this is gonna get you.

So we're gonna get to how this works in a second. But let's look at the Containerbuddy configuration first. So for nginx what we are doing is we have one service that we are advertising, so that's nginx. And that will advertise to Consul and that will be great because we can use that to route requests to the application, and we have our two backends.

So we have sales and we have customers. Our onchange for that is to run an application called Consul template. This is from HashiCorp. And what it does is it takes a template, and once it's run it grabs configuration data from Consul, and the data that we are putting there via Containerbuddy will work for that.

It's gonna grab that data, it's going to render this template file, and let me kind of highlight this weird little syntax they have here. So it's gonna render this template file out to this file. Right, so that's the nginx configuration file in its proper spot, and then it's gonna execute this next thing.

So nginx -s reload, for those of you who've never run nginx, does a graceful reload of its configuration, so that it doesn't drop any requests but it reloads the configuration. So that's going to allow us, when we have changes in the IPs for sales and when we have changes to the IPs for customers, to rewrite the nginx configuration on the fly without any downtime.
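As a rough illustration of the "run a hook when the backend list changes" idea described here, the sketch below keeps the last set of backends it saw and calls a hook only when a new poll result differs. It is not Containerbuddy's actual code: the `BackendWatcher` class, the `observe` method, and the example addresses are all hypothetical, and the sketch assumes something else hands it the current backend list on each poll. In the demo, the hook would be the template render followed by the graceful nginx reload.

```python
from typing import Callable, Iterable, Optional, Tuple

Backend = Tuple[str, int]  # (ip address, port)


class BackendWatcher:
    """Remembers the last known set of backends and fires a hook when it changes.

    Hypothetical helper for illustration only; not part of Containerbuddy.
    """

    def __init__(self, on_change: Callable[[], None]) -> None:
        self._on_change = on_change
        self._last: Optional[frozenset] = None  # None means "nothing seen yet"

    def observe(self, backends: Iterable[Backend]) -> bool:
        """Record one poll result; run the hook and return True if it differs."""
        current = frozenset(backends)  # order of instances does not matter
        changed = current != self._last
        self._last = current
        if changed:
            # The very first observation also counts as a change, so the
            # configuration gets rendered once at startup.
            self._on_change()
        return changed


if __name__ == "__main__":
    watcher = BackendWatcher(lambda: print("re-render config and reload nginx"))
    watcher.observe([("10.0.0.1", 4000)])                      # hook fires
    watcher.observe([("10.0.0.1", 4000)])                      # nothing changed
    watcher.observe([("10.0.0.1", 4000), ("10.0.0.2", 4000)])  # hook fires
```

Comparing sets rather than lists means a reordered but otherwise identical answer from the catalog does not trigger a needless reload.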

You'll note that I'm also doing that at onstart, and the reason for that is so that when we first come up we have some values in there and not just the template file. So the template syntax isn't too bad. It is a Go template, which, if you've never written Go, their templating language is a little, it's like, oh, we want to be Handlebars, but not really. So there's a little bit that says, kind of, if there is a service in Consul called customers, we're gonna write an upstream block for customers, and then this little bit says for each item in that service customers we're gonna lay down a line: server, then the IP address and then the port.

And then we're gonna do the same thing for sales. So nginx really doesn't like it if you give it an empty upstream block. It will crash, or it just won't reload its configuration, but in this case it won't start either if we don't have any data. So in order to make sure that we don't have to deploy things in a particular order, we're saying only write the location block if the service customers exists. So that's kinda interesting: if you scare away all your customers you'll start getting 404s just on nginx from that, which is actually probably the behavior that you want.

[BLANK_AUDIO] So let's see how that looks. So then our Dockerfile is, again, using Alpine Linux. We're installing nginx and some of the tooling that we need just to get the install going, like curl, and we're installing Consul template, which is how we're gonna do that nginx configuration rewrite.

We're installing Containerbuddy, we're pulling that from github, we're adding our Containerbuddy configuration file, we're adding the static content that we're gonna be serving, I'll show you that in a sec, and we're gonna add our virtual host configuration and our Containerbuddy template right? So there's not many changes to make here to turn a pretty straightforward nginx configuration into a Containerbuddy-ready configuration.
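The template logic described above (an upstream block per service, one server line per instance, and nothing at all for a service that has no instances) can be sketched in a few lines. This is only an approximation of what the template renders, written in Python rather than in the Go template syntax used in the demo. The function names, the example addresses, and the exact rewrite rule are assumptions for illustration; the real template in the workshop repo may differ.

```python
from typing import Dict, List, Tuple

Instance = Tuple[str, int]  # (ip address, port)
Catalog = Dict[str, List[Instance]]  # service name -> known instances


def render_upstreams(catalog: Catalog) -> str:
    """One upstream block per service that has at least one instance."""
    blocks = []
    for name, instances in sorted(catalog.items()):
        if not instances:
            continue  # skip it: nginx does not accept an empty upstream block
        servers = "".join(f"    server {ip}:{port};\n" for ip, port in instances)
        blocks.append(f"upstream {name} {{\n{servers}}}\n")
    return "\n".join(blocks)


def render_locations(catalog: Catalog) -> str:
    """One location block per service that exists, each proxying to its upstream."""
    blocks = []
    for name, instances in sorted(catalog.items()):
        if not instances:
            continue  # no block written, so requests for it are not proxied
        blocks.append(
            f"location ^~ /{name} {{\n"
            f"    rewrite ^/{name}(/.*)$ $1 break;  # strip the prefix (assumed rule)\n"
            f"    proxy_pass http://{name};\n"
            f"}}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    catalog = {"customers": [("10.0.0.1", 4000), ("10.0.0.2", 4000)], "sales": []}
    print(render_upstreams(catalog))  # goes inside the http block
    print(render_locations(catalog))  # goes inside the server block
```

Because both functions skip services with no instances, the services can come up in any order and nginx still gets a configuration it can load.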

So. [BLANK_AUDIO] This demo app is pretty minimal, right? There's two tables, customers at the top and sales at the bottom, and we've got a piece of JavaScript that populates that data by filling the two tables from customers and sales. All right, so it's just gonna make a request back to nginx and nginx will route that to the appropriate service, the service will make the second request that it needs to make to the other microservice, knit the data back together, and then send it up as a JSON response. Okay, so let's actually do that.

[BLANK_AUDIO] I'm feeling paranoid, so I'm just gonna run this locally. [BLANK_AUDIO] Given my previous demo. [BLANK_AUDIO] This is local, why are you bothering? Okay. So we have a Docker Compose file, which actually I haven't looked at yet, so let's look at that real quick. So our Docker Compose file is pretty small. Each of the services has a section. In this case I'm using a build of that Dockerfile; in a CI/CD environment of course you're actually gonna have an image name here, and it will be your tagged image that's been put out by your CI/CD system.

I'm using a link here because bootstrapping that kind of relationship to Consul isn't something that I wanna do on my local machine, but obviously in a real production environment you're going to have an HA Consul setup and you'll need a way to get that initial name to it. On Triton there is a thing coming out soon called Triton Container Name Service that will help you out with that.

These are really small containers, again just for the demo here. So we have customers and sales, and we've got our Consul image, which, I think the previous speaker pointed out, Consul loves to have a lot of open ports, so we are going to do that. And then we have our nginx container, and on the nginx container we are exposing port 80 so that we can read it.

Just to show that there are a couple of different ways of getting the Containerbuddy configuration going, or the command going… so on the sales and customers apps we put the command into the Dockerfile. In this case we are going to put it into the Docker Compose file, and I don't know that there is really a best practice around that yet.

I think that has a lot to do with how you are trying to deploy your applications, but in this case this works out fine. So we are going to run Containerbuddy with this configuration file that's in the container and run this service. So we are going to run nginx with daemon off, and what that does is make sure nginx doesn't double fork, and so all standard out and standard error will go to the Docker logs.

[BLANK_AUDIO] So we are going to run that, it's building this locally, and we should see very quickly… all right, so our containers are up, and we should be able to [BLANK_AUDIO] so here's our application. Right, so what we have is the top table here is the customers service. It's marrying some data from sales with its customer data, and on the right-hand side I've got a little thing there that says what container it got the sales data from, just so that we can see that it's making that cross call.

And then down at the bottom, we've got the sales application that's doing the same thing with the customer data. So let's actually scale this up. [BLANK_AUDIO] Let's get three of those. Should be pretty quick. [BLANK_AUDIO] Once Consul picks that up, let's actually check in with Consul here so, that's the wrong Consul.

[BLANK_AUDIO] So this is our Consul that's serving this group, and so we see the different microservice applications, and we see that we now have three nodes of that customers service come up. Any more questions about the way that this is set up? So the big caveat is you have to have sane networking, right? So if you are going to do scaled-up things you have to have sane networking where you don't have address translation going on.

If you have that, like if you are not running an overlay network, you are not going to have good results from it. [BLANK_AUDIO] Were there any questions on that now? [BLANK_AUDIO] So in this case, what we're seeing here is it's detecting that we've sent… because the application only sees its IP and port, and it doesn't know what that network translation has been, what we see here is that the applications are telling Consul, hey, I'm at port 4000, which means that when they are trying to make the cross request they are making the request to port 4000, which they can't reach. >> [INAUDIBLE] >> When there… >> [INAUDIBLE] Right.

Did I conk out? >> What? >> Can you still hear me? >> I can hear you because I'm sitting like in the second row. >> I lost audio here. Hello. Okay better sorry. So you are saying like when one of these instances fails it will— >> That's right. >> Yeah. So if I were to do, real quick. [BLANK_AUDIO] >> [COUGH] [BLANK_AUDIO] >> What is this, what did we call this? [BLANK_AUDIO] So if I kill a node, which is gonna be impolite, it's going to.

In a few seconds here it will come up with… so now one of those is failing, so the other nodes will stop sending it requests and nginx will get the… >> Is that TTL set by the service itself? Or? >> That's set up by the Containerbuddy configuration for each of the services, so as part of that… [BLANK_AUDIO] So, when we say the poll and the TTL, right, that's how often to do the health check, and how long to tell Consul that health check is gonna live. >> [INAUDIBLE] >> Sure, that was kind of a multi-part question. So really what it comes down to is, why use this over a model like Kubernetes, is kind of, I think, the first question that you're… Okay, so I think part of the reason is, has anybody actually successfully stood up a Kubernetes cluster here? One person, okay, cool.

It doesn't happen very often in production, and the reason is because it adds a lot of complexity to the overhead of what you're trying to build. One of the things we talk about a lot is… [BLANK_AUDIO] One of the things we've been talking about is the works-on-my-machine problem. Like, you're not gonna run Kubernetes on your laptop, with the whole pile of service discovery things that you have to do to do that. Whereas with the thing where all the logic for how the application is supposed to behave is contained within the container, you make the team responsible for that application responsible for its operation as well, and so it's not just a change from a technical standpoint.

It's a change in the way that we should be treating the application in terms of the development processes. I think that's where the improvement comes in. With a lot of the models around schedulers, the complexity is very high and you basically need to have a separate team who is responsible for managing that thing, which you don't have in this case.

Your second question was really just about the service catalog itself, and I think that kind of depends on your scope. For a small set of applications like this, you could totally use your Consul thing. For other things you're gonna want to send some kind of metrics out to whatever it is that you're gonna use to manage that. This is kind of agnostic to that, I think.

[BLANK_AUDIO] >> Staying on the question of service catalogs, does Containerbuddy support anything other than Consul? >> Yeah, so today it supports Consul and etcd. The architecture for it is kind of plugin based, I wouldn't say plugin, that's probably not the right word, but it's pretty modular, and so we designed it specifically to allow for additions like that. I think etcd support was only added a couple of weeks ago. And so I should point out, Containerbuddy got started at Joyent, I wrote a lot of the original code, but more than half the code is Justen, sorry, I'm gonna call out Justen cuz he is in the room.

Justen Walker is a member of the community, not a Joyent employee, and has contributed probably more code to it than I have at this point, which I think is awesome just in terms of, like, making sure it's something somebody actually wants and not just some kind of a crazy Joyent idea.

So there's a couple of different places where it's interesting to hook in behavior like that. There's the discovery service, and there's some open issues in our GitHub repo about monitoring and how we're gonna start looking at that, and so that's kind of an open design discussion now.

So actually, I'm giving you a sneak preview here, because I'm actually building these out on Triton, which isn't quite available on the public cloud yet in case some of you don't know. But it will be soon, so this is kind of cool. I didn't mean to preview that but I'm accidentally doing it, so yeah, this is now building those containers out on Triton, and we'll be starting them up and so we'll be seeing this come up.

Casey, go ahead. >> [INAUDIBLE] >> Sure, actually, good point. So Justen is not using this on Triton, he's actually using it with Mesos, so Mantl? >> [INAUDIBLE] >> Sure, sure. So that's a good point, right, it is intended to be scheduler agnostic. In theory you could even use this as your init system for a very small VM. The model is about pushing the control down to the application.

The underlying substrate should be pretty much neutral to that. So this is gonna take a little bit here. >> [INAUDIBLE] >> Oh, okay, sorry. >> So the Node.js code that you put in each of the services to get all of the nodes of the other service? >> Mm-hm. >> It looked an awful lot like reinventing DNS and load balancing, if you get what I mean by that. Like, you could solve the problem with DNS. I saw comments in there just saying pretty much, we're just gonna round robin for now, and this will get better later.

>> Sure. Sure. I mean… >> Do you have any thoughts on how to make that a little better, cleaner, or at least less like reinventing the wheel? >> Lots of applications already have that stuff built in. I mean, look at nginx, it already has the ability to do load balancing kind of as part of it. So there are people, I don't know any personally, but I know there are people who run Lua code in nginx as their application server, right?

So the problem that you have with that is you have… anyway, so this is an easy example where it's… unfortunately Node doesn't, well, I don't know whether it is unfortunate or not, there's not a lot of libraries around doing this stuff already in Node, I would say. But if you're running Java and you have to connect to a database, you've got some kinda pool handling system for that already, right? So all this is doing is saying, find where that hook is in your application and make that something that can be updated live, instead of just pulling from a static configuration file.
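To make the "find the hook and make it updatable live" point concrete, here is a minimal sketch of an in-process upstream pool with round-robin selection. It is not the Node.js code from the demo services: the `UpstreamPool` class, its method names, and the addresses are hypothetical, and it assumes some onchange-style callback calls `update` whenever the service catalog reports a new list of backends.

```python
import threading
from typing import List, Optional, Tuple

Upstream = Tuple[str, int]  # (ip address, port)


class UpstreamPool:
    """Round-robin pool whose member list can be swapped while requests are in flight.

    Hypothetical helper for illustration; not taken from the workshop code.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._upstreams: List[Upstream] = []
        self._next = 0

    def update(self, upstreams: List[Upstream]) -> None:
        """Replace the whole list, e.g. from a change notification."""
        with self._lock:
            self._upstreams = list(upstreams)  # copy so callers cannot mutate it
            self._next = 0

    def pick(self) -> Optional[Upstream]:
        """Return the next upstream in rotation, or None if none are known."""
        with self._lock:
            if not self._upstreams:
                return None
            chosen = self._upstreams[self._next % len(self._upstreams)]
            self._next += 1
            return chosen


if __name__ == "__main__":
    pool = UpstreamPool()
    pool.update([("10.0.0.1", 3000), ("10.0.0.2", 3000)])
    print(pool.pick(), pool.pick(), pool.pick())
    pool.update([("10.0.0.3", 3000)])  # a backend moved; no restart needed
    print(pool.pick())
```

Round robin is the simplest policy that works here. A connection pool in a Java or Django application would put the same kind of `update` hook in front of whatever selection logic it already has.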

So the logic behind how that works is gonna depend on the application and frankly whatever the protocol is that you're trying to speak. If you want to maintain long persistent connections or whatever on your backend, that's gonna be a part of the application. There was kind of a partial other question in there, which is, what about DNS? And DNS gets you pretty far, but there's a couple of issues with that, which is a lot of application frameworks fail to properly handle DNS. So at my last gig a really big part of one of our big monolith applications was Django, and if you give Django, like, here is where my database is, and then you move your database, or it goes to a different IP, or you fail over, you have to go and restart your application server.

It doesn't have any of this logic built into it. So you always have that problem, even with things like DNS. So this is a way of saying, let's make the announcement of these things, or advertisement of these services, more active. But how the application responds to that is still the responsibility of the application developer.

I think what this does is it makes that more explicit, instead of something like, oh, that's the operations thing, we can kind of get away with not doing that. You never really were able to get away with it before; this definitely makes it explicit though. I think that's all our time so.

Okay, thank you. >> Thank you so much. Thank you, Tim for another great… >> [APPLAUSE]


Tim Gross: Product Manager, Joyent