Moving Markets: Containers Powering Wall Street

The Future of Containers in the Enterprise

February 10, 2016

New York

So first I'm gonna go down the panel and have everyone briefly introduce themselves: your role at your company and, if you can, for a second or two, what your company does. Not everyone understands the different types of Wall Street firms and what they do. >> All right. Travell Perkins from Fidelity. We have several financial businesses: institutional, brokerage, and retail as well.

We manage 5 trillion in overall investments and I'm the VP of Web Architecture and also lead a team, a digital platforms team that supports applications. >> How long have you been at Fidelity? >> About four years. >> Okay. So lifer. >> Yeah I'm Jake Loveless, I'm the CEO of a company called Lucera we're an infrastructure as a service company for these guys, for Wall Street.

Essentially an AWS but for adults >> [LAUGH] >> See? I told you. >> 100% container native right, so always from company's inception always run kinda believed in that vision and took that vision to an illogical conclusion, I think.>> Thank you. Luigi. >> I'm Luigi Mercone, good to meet you guys.

I'm at Bank of New York Mellon. I've been there about a year now. I run innovation functions within engineering and architecture. The easiest way to describe what we do is the rest of Wall Street are the gamblers and we're the casino. We are the world's largest custodial bank, we hold about 28 trillion dollars worth of assets, one third of the world's wealth.

>> So on either side of a trade, there's got to be someone in the middle who's holding the bank. That's right. >> And you're our bank by the way, just so you know. >> All good. >> Hey everybody, my name is Skand Gupta, I'm an Engineering Manager at Bloomberg. For those of you who don't know what Bloomberg is I guess everybody—>> You can. >> [LAUGH] So we provide a bunch of financial data services and software for mostly financial services so anything from trading picks, to news, data, analytics, whatever you want to do you can do that on Bloomberg platform.

>> So to start we'll go back the other way, can you give us a little of background of what brought you to containers, either the application, or the use case that was something that— >> Sure. I work at somewhat like a startup team within Bloomberg. We deal with essentially communication data and providing services on top of that.

When I joined Bloomberg, that was about five years ago, I was the sixth engineer in the team. >> Mm-hm. >> Today we have 50 people. So the team has grown significantly. >> Innovation, meaning you don't have to do anything real, you get to just play? >> Yeah. Exactly. [LAUGH] So we reached a point, a year and a half ago where we realized that our engineering is not scaling.

We have a whole bunch of applications with different profiles, different load characteristics, that run at different times of the day, and then each sub-team within my group, we were all running our own static clusters. And we realized that it's not gonna scale. Developers were spending more time in ops, at least spending significant amount of their time in ops rather than actually developing code and stuff like that, so we decided we needed something more flexible that allows us to reduce our hardware footprint, as well as increase the efficiency of what we have.

So containers was kind of a hesitant choice towards that but things have— >> Are there ops people in the innovation team, or is it just dev? >> Yeah, yeah. There are ops people as well. >> Okay. So you started using containers specifically for one application center, just testing it out across.

>> Our goal was to not build our platform as a service. We didn't want to be the infrastructure team. Our goal was to start, build something that works for one application that we needed to re-architect and then the with the peripheral vision of building something that's for future, right? We didn't want to do another short term project.

So container was just to stand up an infrastructure for this application that is very critical for our business and then use that infrastructure to migrate other applications to it. So that was the overall goal. >> What stack did you use? >> Mostly back end is C++, for container stack, we are a Mesos shop essentially.

>> Luigi, how about you? The genesis story. >> I think officially as the old guy in the panel I can trot out things like containers doing it in Solaris in the 90s. >> Yeah, look at Cantrill>> [APPLAUSE] >> Both hands up. >> Which is if we were in 2002 people were just saying virtual machines, I was doing that IBM in 1969.

It doesn't stop the evolution. >> [LAUGH] >> It doesn't stop happening. By the way that is new now it's called Go. >> [LAUGH] >> Like from early roots in that point it was for performance and operational efficiency. Had some very early level academia like pieces, things coming out of academia in terms of getting filter driver stuff from the Linux perspective in my lab as far back as 2004.

More recently, implemented cgroups and Linux native containers purely for competitive advantage. Being able to stack an enormous amount of tightly co-located high performance work load on a relatively small amount of hardware for an equity derivatives pricing solution. So that was like pure juice.

That was market value. Fast forward a bit further and you go to—that was at another bank, in my more recent tenure here, my sort of shocked dismay and joy to discover that a place that is otherwise sort of thought of as being sleepy from a technology perspective as BNY Mellon is has had a functional PaaS based upon some pretty nascent but like functional containerization since 2010. It's a core to their strategy depending on how you slice and dice work load whether it's number of cores or number of OS instances you know you can get statistics which you can skew it any way but the fairest representation is let's say about a third of the production workload is already in some containerized form, and we're on a trajectory wanting to have that 90 plus percent in the next couple of years.

>> So in both cases it sounds like, because it's Wall Street, there was just as much driver for operational efficiency continuously. >> Absolutely. >> As much as starting in just early DevOps. >> It's one of those weird things where infrastructure actually led the charge. You're the guy that has to stay up at night with the pager, would you rather worry about 500 boxes or 5000 boxes? It just made sense.

>> What else happened is a production problem right now. He wasn't able to get off of the pager long enough to sit in the panel. >> Why don't you tell us about how an IaaS for wall street is built? >> Me? >> Yeah, it's like me, who else? There isn't anyone else stupid enough to do that.

>> So that, what's the Churchill quote? Blood sweat, toil and tears? >> Yeah, pretty much, and whiskey it's a commonly shorted quote. >> [LAUGH] >> So before Lucera, I ran a high frequency trading group for about ten years and man, like you I get the 10K tattoo, e10K and thumper.

So we ran Solaris in the early days, obviously migrated to Linux. Yeah, we had the same problem. Our problem was we needed to deploy and scale infrastructure for situational reasons. So if you're a trader, if you're running a trading team, and certainly if you're running a trading team of any scale, everyday can be a little different.

You might need to be focused in Asia, you might need to be focused in London, you might need to be focused on interest rates, you might need to be focused on foreign exchange, and that static infrastructure just wasn't working. There was actually true, honest to god alpha in having the ability to reconfigure the infrastructure on demand.

So, we were doing that very early and we did it for a long long time, and then said man, wouldn't it be great if we did this at scale, and kind of did this as a community, and took it to its logical conclusion. >> Let's talk a little bit about underlying market dynamic, like when you have a jump event, whoever can stay in the market and make prices, the one who's taking all the money off the table.

>> Exactly, and so, a lot of times we tend to think of containers, I think a lot of times, we think of containers as Docker and as Mesos, to me those are the much higher level abstractions. When I think of a container I think of an OS instance that I can run without performance penalty, so completely secure and have the ability to re-size it and things like that.

So we don't have a Docker abstraction, our customers don't have a Docker abstraction. But they do have things like say, hey, I've got this foreign exchange matching engine. We've got 12 of these that run on us, and if some government— >> Defaults. >> or unpegs its currency or takes a vacation for a week without saying if they're gonna unpeg their currency.

China. Then those instances need to resize, and you're gonna have these moments and it's a very bursty business and yet you need some horizontal scalability, but you really need to be able to scale within a box because the latency hit of going across multiple boxes is gonna be too acute.

And that's just one of probably a thousand reasons. >> Why does Wall Street need its own IaaS? Why does it need a grown up one? >> I don't know if you've heard, we had this little financial crisis and so when you have low interest rates— >> Was that you? >> No, that was not me.

>> [LAUGH] >> That was not me. Def—I am thinking. >> It's the biceps. >> [CROSSTALK] >> So, basically when interest rates come low, interest rates how you kinda control the flow and lending of money, when those become low, you have low volatility because it doesn't really matter if you put your money in equities or bonds, or art, or— >> They all suck.

>> Beanie babies, yeah they all suck, so the money doesn't move very much and when money doesn't move, the street, kinda the industry doesn't make money, we make our money with volatility, as a function of volatility. So fast forward five years of the nuclear winter of low interest rates, talk about cost consolidation time. In the end everybody knows the only way to get cost savings is to implement—where it's beneficial, and it kinda makes sense—to implement shared services models, either within the bank, right having a platform, or in the front office cuz if you take your network, yours, yours and lay them on top of one another, they're all essentially the same.

>> So the good old days of every Wall Street firm having its own incredibly expensive datacenter. >> Yeah. >> Across the river has gone. >> And everybody has consolidated because of latency right? Everybody up here probably lives in the same half a dozen or so datacenters. >> [CROSSTALK] >> Except for you, you're a problem child. >> No, you're [unclear] I have a connection to you.

>> [LAUGH] >> I mean in terms of the front office stuff, I think those are driving factors. >> Thanks. Travell why don't you tell me about your slightly different perspective. >> Yeah. >> Basically let's go back to just like a year ago, a lot of you know me from the Node side of things and so Fidelity is part of Node Foundation, Node and Angular, driving a lot of innovation and high velocity product to market timelines, and things like that.

The business is excited, we're deploying a lot of applications, we're servicing application teams and we're running into issues like, a lot of people in the infrastructure side of the house and then we have an internal cloud, a lot of them focused on Docker and other solutions OpenStack, open source solutions like that, but for our platform that this full-stack JavaScript platform, I was like, You know, we kinda, we kinda abstract all the infrastructure stuff away, I am not even sure we need this Docker thing, I don't quite buy it, the VMs are fine, we have resource isolation, no problems, okay.

On a production basis I wanna talk to a business stakeholder, I wanna say, your infrastructure is guaranteed for your application. I don't wanna have to explain why some B-rated application brought down a AAA-rated application, thanks but no thanks. >> But then you got, something happened.

But then we got bigger and then it wasn't the one or two shiny gems anymore like Trade Armor, which let you trade by the chart, one of our first Node applications you can do that's on Fidelity right now, you can use it. It went from the shiny gems to how do we support the whole portfolio and then we had requests going into our support groups and it's like what about that app that asked for dev through QA like two weeks ago and keeping up with that, and so then we said well, we had a set of services like with automation, with Jenkins, how many people use Jenkins? [WOOOOH] How many love Jenkins? >> [LAUGH] >> Less hands.

>> That's why you said kind of? >> Jenkins and like whatever you're using for your secret store those types of problems. Jenkins is—basically the problem was, we needed to stop managing application-specific hardware and infrastructure and we needed to start managing clusters that we can deploy these applications on.

We wanna upskill our Ops people, we wanna be able to satisfy the need and so basically all of that existing Node environment is running on business as usual, operation infrastructure at Fidelity. And so like okay so how do we bring operations, how do we automate and operationalize the full stack JavaScript platform so that it's like one command line boom, one command line to actually generate your skeleton app, another command line to actually create a Dev environment another command just to push it and have it— Now you have something that's not only when running on a laptop but you are also running it in a managed environment for Dev all the way through to staging for prod, etc.

And so— >> That's your own private— >> So we have our own private cloud, we have a couple different flavors of clouds both open source and vendor. >> Why? >> We've been around a long time. [LAUGH] >> Okay. >> One of the first to actually have a web presence on the internet. >> Yeah. >> So the usual suspects we do use Xen, we do use OpenStack and then we also use commercial vendors that you know by name.

But there's this migration and right now we've been dealing with this problem right now where we're looking towards the future. But in some ways when you look at Docker, there's definitely some open source solutions right now that make your legacy physical hardware strategy really attractive.

Because you can just slap some open source right on it and it's instantly Dockerized, it's actually the best Dockerized environment. So in the whole richness of the Docker ecosystem, Docker Compose, for example, we have this nice marriage between our Node strategy and our Docker strategy. Basically we standardized on Mustache a long time ago for just any type of template, doesn't matter whether it's for the Angular side or whatever or on the data center side, everything must be mustached because you have to separate the presentation logic from the code.

We do interesting things like we have a metadata store using RethinkDB for just saves everything so, a lot of you guys are like you see consul and there is automatic magic that happens all the time and we're like no. You gotta stop and you gotta actually build the thing, you gotta tag it, you gotta version it, you gotta save it in metadata and you push it from environment to environment, you promote it, it's not that all that automagic type of thing so but yeah, we do a lot of stuff I'll get more into it later.
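The build, tag, version, save, and promote flow described above can be sketched roughly as follows. This is an illustration only, not Fidelity's tooling: the in-memory `MetadataStore`, the `Release` record, the environment order, and the app name in the usage note are all hypothetical stand-ins (the panel mentions a RethinkDB-backed metadata store, which is not modeled here).

```python
from dataclasses import dataclass, field

# Hypothetical environment order; the panel only loosely names dev, QA, staging and prod.
ENVIRONMENTS = ["dev", "qa", "staging", "prod"]


@dataclass
class Release:
    app: str
    version: str
    image_tag: str
    environments: list = field(default_factory=list)


class MetadataStore:
    """In-memory stand-in for a metadata database (an assumption, not a real client)."""

    def __init__(self):
        self._releases = {}

    def save(self, app, version):
        # Record one tag per exact version; refuse to overwrite an existing record.
        key = (app, version)
        if key in self._releases:
            raise ValueError(f"{app} {version} is already recorded")
        release = Release(app, version, image_tag=f"{app}:{version}")
        self._releases[key] = release
        return release

    def promote(self, app, version):
        # Move a recorded release exactly one environment forward, never skipping one.
        release = self._releases[(app, version)]
        next_index = len(release.environments)
        if next_index >= len(ENVIRONMENTS):
            raise ValueError(f"{app} {version} is already in prod")
        release.environments.append(ENVIRONMENTS[next_index])
        return release.environments[-1]
```

Calling `store.save("trade-app", "1.4.0")` and then `store.promote("trade-app", "1.4.0")` repeatedly walks that one recorded artifact through each environment in order, which is the explicit, non-automagic promotion the speaker is contrasting with automatic discovery.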

>>That's great. Thanks. >> [CROSSTALK] >>I think you hit one of the points and I know you— we don't as an industry we tend not to just have two environments, you tend to think like oh it's development and production or maybe it's development UAT and QA and production. It's like 40 layers >> [CROSSTALK] >> You've got low latency zones, you've got clearing zones, you got tier 1 zones, and tier 2 zones, and kind of how do you move >> performance environments >> And I don't know about you guys but every CIO I've worked for in the past five years, has had that moment, has conveyed that moment of like I just came back from the board and they asked me, that Knight Capital thing…could it happen here? You know, silence, pained silence.

And does that usually cause someone to run in the room screaming containers, fix this with containers? Or is that— >> It causes people to run screaming, yes. Run along to fix it, it's up to you the containers are. >> Have you run into any hurdles at Bloomberg, any challenges with maturity of the ecosystem, the tool stack you've chosen, application fit, anything you can share that maybe would be helpful? >> The stack has been quite stable itself, so the stack that we use is Mesos, Marathon and things like that and that has been quite stable.

Where we've run into problems is how much we have bitten up front. >> Right. >> So once we started building this infrastructure, we thought like, hey we can containerize everything. So sure those applications that you can containerize, then there's storage service we figure out how to deal with persistent services on this environment.

Mm-hm. >> And then they said, HDFS, let's containerize that? Bad idea. Really bad idea, especially if you're not an expert on containers or HDFS. So we've wasted a lot of cycles on trying to fix some of the infrastructure issues around persistent services in HDFS especially, and I think we should have, in retrospect I would go back and say that let's take smaller steps at a time, and build something that we know are forced into a container environment and then add some of these critical services on top, so we ended up actually separating our HDFS cluster out of our main source and we'll go back at some point and bring it back in.

But the key thing being like think about how much you wanna chew upfront. >> How much change can you observe at once? >> Exactly, it just takes, otherwise once you get into these issues and bugs and stuff like that, you just end up missing a whole bunch of deadlines, and- >> Got you.

Wall Street has probably the least tolerance for missing deadlines. >> [LAUGH] If we go over this session right now, we'll probably all be fired. >> [LAUGH] >> Yes. Did you learn any lessons? >> You've been there a year? >> At Bank of New York Mellon for a year and at the rodeo for a lot longer than that.

This is reflecting on the journey, since 2010 having a functional PaaS elevated that was, and I say this affectionately, in retrospect, that was a scrappy effort with a bunch of vendor product and a lot of the open source that you'd consider common place now didn't even exist yet, it's functional.

And the lesson learned, it's kinda like I wanted to actually interrupt you and say like this is the question I want you to ask me which is, why does the front office care? And it is purely developer productivity cause like having elevated a functional PaaS that can handle certain classes of workloads means, within the little squirrel cage that the developer's running in, he could run at a certain speed, but it locks up every time it has a touch point on the rest of the ecosystem.

So you need a URL, it locks up, you need privileged access it locks up, because the rest of the ecosystem outside of that is still legacy. It's still we have an amalgam like that, and we've been looking. So those have been well established sticking points. What happens if it takes you six months to get something if it turns into six weeks, you celebrate and the next day you suck, I want it faster and then it turns into a six days, you suck I want it faster.

So we've gotten a bunch of that sort of operational latency out of the way, but it's still nowhere near where the bar has now been set. And looking— >> You move too fast for your own good. >> Well, it's the right class of problem to have. >> Okay. >> It's the right kind of problem to have.

And that makes you stop and ask, well you know should we have privileged access at all? Is there a model that, hope I don't offend anyone with the metaphor but like cattle versus pets. Great we've got a PaaS and a lot of it is automated but you still produce these lovingly cared for Tamagotchi pets of you know the OS instances and app instances that developers maintain. So you gotta a swath the productivity out of it but you still really haven't got to the root cause. So how do you look for something that's transformational, that changes the developer mindset model in terms of what they are enabled with like if it needs a patch shoot it and redeploy it, if it needs an upgrade shoot it and redeploy.

You got a lot of pain around privileged access, don't allow it. [LAUGH] Doctor, it hurts when I do that. >> Playing with stuff with scale gets fun. The secrets problem is a big problem. The other one is how do you do failover and how do you help customers figure out failover.

Even with the deployment abstraction being an operating system, you're still talking about fairly tightly coupled deployments. I mean it's really hard to have a FIX engine which is interacting with exchanges fail over orthogonally to login or something like that. But we were sitting backstage and I was thinking of one of my favorite use cases: a customer who's really paranoid, and he kinda should be because security is an issue. What they do is try to enforce this idea of configuration management, which is like motherhood and apple pie, like everybody should do that.

But getting that through an organisation of that size is very difficult so what they do is they actually deploy on a zone and everyday at five o'clock they roll back to the golden copy. So if development or somebody gets in there makes a configuration change it's like and you haven't gone through the process it's like well worst case scenario, we are gonna get wiped out.
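The nightly roll back to a golden copy can be sketched as below. This is a minimal illustration under stated assumptions, not Lucera's or the customer's mechanism: it treats the golden copy and the live configuration as two plain directories, whereas the panel describes rolling back a whole zone. The function names `snapshot` and `roll_back_to_golden` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def roll_back_to_golden(golden: Path, live: Path) -> list:
    """Restore live from golden and return the relative paths that had drifted."""
    before, wanted = snapshot(live), snapshot(golden)
    # A path has drifted if it was added, removed, or changed relative to golden.
    drifted = sorted(
        name for name in before.keys() | wanted.keys()
        if before.get(name) != wanted.get(name)
    )
    # Wipe unconditionally: changes that skipped the process do not survive.
    shutil.rmtree(live)
    shutil.copytree(golden, live)
    return drifted
```

Run on a schedule (five o'clock every day, in the story above), the returned list doubles as a report of who changed what outside the process, while the unconditional wipe is what actually enforces configuration management.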

So you talk about that Knight story, Knight's like the bogeyman, Knight, for those of you who don't know, is $480 million in an hour. That company essentially detonated in an hour. >> It was under 30 minutes, like someone had made a minor code change and reactivated, accidentally reactivated an algorithm that had been commented out, market opens and they basically exploded, almost half a billion dollars.

It's a little different than the website is like, oh I left the tag out, now the image is off to the right.>> [LAUGH] >> It's a little different, it's like a little different scale of problem there. It's like I have reactivated this code that I'm not even sure what was here and now we are all out of business so it's like it's losing money at a rate that only a machine can.

And of course you've spent hundreds of millions of dollars making them go as fast as they possibly can. That's right so it's like good luck stopping that trend. But no, I think it's just to make people see different ways of doing things. It's a competitive industry, everyday is a different day.

>> We are raising your confidence that the financial system is stable >> [LAUGH] >> There's something that you called out that is actually interesting to me being kind of an infrastructure guy who's climbed this way up the stack until they're like being clawed, but it's, this paradigm gives, creates a different service contract because the rant I went on before like if something is wrong we would shoot it, stealing the page from another major bank on the street that we want to emulate, we want to get towards is your underlying OS instance gets re-imaged every 60 days, you don't have a choice.

And that's just part of the service contract, like if you come and consume of this you're gonna get all these goodies and the folks who after the Morlocks—you know the Elois get to play and the Morlocks down here have to worry about sweating this stuff. That's one of the things that they can get back cuz you can shoot those instances, and it doesn't make a difference.

>> Yeah. >> Yeah. >> Good point. I was gonna ask. >> We kinda rebuild them the way—like OSes and servers really less of an issue. I look at like Ubuntu, for example, as just a runtime dependency almost. Like at this point, everything is pretty much network endpoint, that's the philosophy that we take, everything is like you know homeostatic, it's defined, it's baked into the image and basically warts and all that's part of that release and really it's the job of QA performance testing, I can care less what's in it, all the ports are locked down, there's only one way to get into that operating environment for any particular set of micro services with Docker compose we can orchestrate services together and tie them together.

One thing we do is we do blue-green deployments so the excuse of not, you'll be able to test something and wire it up to, along the path of service dependencies, you have variability with versions and stuff like that, but we can test all that before we actually make something live. I would say for more mature companies or larger companies with multiple products, you know you really have to, if you don't have a platform team to kind of rationalize this stuff, you need to invest in that core engineering and figure out what your platform looks like for you because just saying here's a Docker Swarm cluster or here's a Triton cluster, stick Docker Compose on it and have at it, that's gonna lead to bad things, like you need audit and things like that, and those are types of basic security things, you know split key encryption, rolling keys, those are things that are never going to make it into the core Docker platform or at least not soon, so there's bread and butter stuff that we need to take care of.
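The blue-green idea mentioned here, testing a new version against its dependencies before it takes live traffic, can be sketched as below. This is a toy model only: `BlueGreenRouter` and the `health_check` callback are hypothetical names, and a real setup would switch a load balancer or service endpoint rather than a Python attribute.

```python
class BlueGreenRouter:
    """Tracks two environments and which one currently receives live traffic."""

    def __init__(self, live_version):
        self.versions = {"blue": live_version, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version, health_check):
        # Install the new version on the idle side; live traffic is untouched.
        target = self.idle
        self.versions[target] = version
        # Test the idle side before anything goes live; a failed check changes nothing.
        if not health_check(target, version):
            return False
        self.live = target
        return True

    def roll_back(self):
        # The previous version is still installed on the other side, so just switch back.
        self.live = self.idle
```

The point of keeping two sides is that the cutover and the rollback are both a single switch, so nothing reaches users before the checks pass.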

>> So bread and butter platform engineering— >> So you need to have that separation of concerns because then your business is gonna get frustrated if basically you have every single application team solving the same problem over and over again, and slowing down functional deliverable targets.

>> You won't withstand a regulatory audit. >> And that too. >> We get put out of business and then you all will be out of jobs. I mean it's kind of the nature again, it's like public cloud is so nice in the sense that it's like, oh everything can talk to everything, cherish that moment.

Because these are massive companies, that's not gonna happen. You can't make the assumption that you can talk to every database and you're like oh, I just ran consul, it's like guess what, I got 2,000 VLANs and I got 15 firewalls between most of them, consul ain't getting through those, so what are you gonna do? >> Thank you, I think we're out of time.

Thank you very much all of you for joining us. >> [APPLAUSE] >> We really appreciate it. >> [APPLAUSE]


Skand Gupta: Tech Lead & Engineering Manager, Bloomberg

Travell Perkins: VP of Web Architecture, Fidelity

Luigi Mercone: Managing Director - Technology Solutions, BNY Mellon

Jake Loveless: CEO, Lucera


Dave Bartoletti: Principal Analyst, Forrester