Juniper Networks Contrail Cross-Cluster Virtual Networking Demo
Virtual Demo: Contrail Cross-Cluster Networking
If you are interested in learning more about Contrail Networking’s cross-cluster virtual networking, then this demo by Juniper’s Michael Henkel is for you.
You’ll learn
How Contrail and Kubernetes resources interact together
How Contrail adds value in terms of load balancing
How operators interact with Contrail as a system
Transcript
0:09 My name is Michael Henkel. I'm working in the Pathfinder team in the CTO organization, and I'm really happy to have the
0:16 opportunity to show you some of the capabilities of Contrail. As I mentioned just before my statement,
0:22 one of the things I want to demonstrate today is how Contrail and Kubernetes resources
0:29 interact together, and how we can provide some added value in terms of
0:37 functionality around load balancing. Also,
0:43 one purpose of the demonstration is to show you how
0:48 operators interact with Contrail as a system, and how natively that works by just
0:55 following the standard ways in which people operate a Kubernetes cluster. What I'm going to demonstrate
1:02 today is load balancing, basically.
1:07 So I have three clusters set up, cluster one, two, and three, and they are
1:12 represented here in K9s windows. We have the first cluster, the second cluster down here, and the third
1:18 cluster. What we want to achieve is that on cluster one and cluster two we
1:24 create an application as part of a StatefulSet, which is going to be front-ended by an ingress controller on
1:29 both clusters, and cluster three acts as some kind of client cluster which communicates with the applications on
1:36 cluster one and cluster two. The goal is to load balance that traffic, so when cluster three tries to access the
1:42 application, half of the traffic goes to cluster one, the other half of the traffic goes to cluster two, and that
1:50 is achieved without using any kind of external load balancer. That's going to demonstrate some of the
1:56 multi-cluster capabilities of Contrail, using a concept we call control plane federation, where we exchange
2:02 routes dynamically between all clusters. It solves a common problem:
2:08 if you have multiple clusters and they all provide an ingress, how can I make this ingress accessible
2:15 in a load-balanced fashion to the outside world? The resources and features we will
2:20 work with in Contrail are virtual networks, subnets, route targets, and BGP routers. You
2:27 don't need to memorize all of that; again, the purpose here is to show you how we
2:32 interact with those resources. Finally, what we will end up with is ECMP load balancing
2:40 from the source in cluster three towards the destinations in cluster one and cluster two.
2:46 The way I've laid out this demonstration is that we will first create some standard
2:53 Kubernetes resources without any Contrail-specific configuration. Once we have all those resources set up, we will go
2:59 to the Contrail resources and create virtual networks and subnets, and we'll see how the Contrail resources and the Kubernetes
3:06 resources interact with each other. Okay, so let's start with the first piece, which is to create the
3:13 application on cluster one and cluster two. It's a pretty standard StatefulSet; it's going to create two replicas,
3:20 and the application is pretty simple: it just reports back the name of the pod and the name of the cluster,
3:28 which is injected via an environment variable down here. That application is front-ended by a standard service, so we
3:35 are going to create this one on the first cluster.
3:41 We can see here the first replica coming up. Doing this on the second cluster as well,
3:51 so this one is coming up down here. It's going to take a couple of seconds to pull the container images,
4:00 and at the same time you can see a service has been created, which is called
4:06 my-service and is simply listening on port 8080.
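The manifests themselves aren't readable in the recording, but a minimal sketch of that kind of StatefulSet and service, assuming hypothetical names (hello-world, my-service) and a generic echo container, could look like this:

```sh
# Sketch only: names, image, and the CLUSTER_NAME value are assumptions.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hello-world
spec:
  serviceName: my-service
  replicas: 2
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
      - name: app
        image: hashicorp/http-echo        # any echo-style container works
        args: ["-listen=:8080", "-text=$(POD_NAME) on $(CLUSTER_NAME)"]
        env:
        - name: POD_NAME                  # pod name, reported in the reply
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_NAME              # cluster name, injected per cluster
          value: cluster1
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: hello-world
  ports:
  - port: 8080
    targetPort: 8080
EOF
```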
4:16 So the first pod is running, and the second pod is being created right now. In the meanwhile, what we can do is
4:22 install the ingress controller on cluster one and cluster two.
4:28 Again, a standard installation, just taken from the manual online, nothing
4:34 specific over here, so we are just installing it off the internet on the first cluster.
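The transcript doesn't name the controller or the manifest URL; assuming a stock ingress-nginx install straight from its documentation, the step would look roughly like this (the release tag is a placeholder):

```sh
# Stock ingress-nginx install off the internet; pick a current tag from the docs.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.2/deploy/static/provider/cloud/deploy.yaml
```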
4:42 When we go to all namespaces, we can see that the ingress controllers and the admission
4:48 controllers are being created on the first cluster, and now we are going to do the same
4:53 thing on the second cluster. And again, if there's any question in
4:58 between, don't be shy, interrupt me at any time. What we then will see
5:04 in the services is that we created the standard service for the ingress controller; again, I just took the
5:11 standard manifest for the ingress controller. And once we have our ingress pods running
5:18 (double-checking here, the ingress controller is coming up), we can create an Ingress. What that
5:23 Ingress does is basically connect the
5:30 service you created for the application with the ingress controller, so basically it makes all requests for this host
5:38 be forwarded to the application front-ended by the service over here.
5:44 So let's do that for the first cluster,
5:49 and let's do that also for the second cluster.
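A sketch of what that Ingress could look like; the wildcard host is inferred from the URL used later in the demo, and the class name assumes ingress-nginx:

```sh
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress                  # hypothetical name
spec:
  ingressClassName: nginx           # assumption: ingress-nginx
  rules:
  - host: "*.appsdemo.com"          # wildcard host inferred from the demo URL
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service        # the application's front-end service
            port:
              number: 8080
EOF
```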
5:55 Up to this point everything was standard Kubernetes; there is no
6:00 Contrail resource involved here, besides the fact that the overlay network in
6:06 which the applications are running is provided by Contrail itself.
6:15 Again, everything is stock Kubernetes: we installed an application with a service, we installed the ingress controller, we
6:22 created an Ingress, and we end up with this kind of deployment over here.
6:28 What we are now going to do is create an additional service network. We have
6:34 our default service network, which is part of every default installation, just like we have pod networks, again
6:40 part of every default installation. When you install Kubernetes and you add the Contrail CNI,
6:45 you will end up with a default service network and a default pod network. We are going to add
6:52 a second service network; this is going to be the service network through which
6:57 we make the ingress available to the outside. You could do the same thing with the default service network, but for now
7:04 we want to keep the default service network internal to the cluster, and we want to use a separate service
7:09 network we can expose to the outside. So what we will do here, and this is the first time we are actually creating a Contrail
7:16 resource: in order to create a virtual network we need to do two things. We need to create
7:22 a subnet, and the subnet is pretty much straightforward: it simply contains a CIDR, which provides the subnet
7:29 information. We can put more configuration in here; we can define
7:35 the gateway and a couple more attributes of the subnet, but for our purpose here it's enough to provide
7:42 the CIDR; the gateway will be auto-allocated, starting with the first IP address in the subnet. The second part is to
7:49 create the actual virtual network. They have to be in the same namespace, and
7:55 we reference that subnet in the spec itself, so as you can see here,
8:02 the name and the namespace of that subnet reference match the
8:07 subnet name and namespace. One thing to highlight on the virtual network is that
8:12 we are assigning this attribute here, a route target, and that is very specific to BGP.
8:20 Just for now, take it as an identifier of that virtual network; it's just a
8:26 unique ID which is used within BGP to identify
8:31 whether two virtual networks should be allowed to communicate with each other.
8:37 The format looks a little bit awkward, but that's how route targets are
8:42 defined in BGP. Again, just take it as a unique identifier for that virtual network. So let me create
8:48 that subnet and that virtual network on the first cluster,
8:54 and let me do the same thing on the second cluster.
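A best-effort sketch of those two Contrail resources as described: a Subnet carrying a CIDR, and a VirtualNetwork in the same namespace that references it and carries a route target. The apiVersion and field names are assumptions in the style of Contrail's Kubernetes CRDs, not a verified schema:

```sh
kubectl apply -f - <<'EOF'
apiVersion: core.contrail.juniper.net/v1alpha1   # assumed CRD group/version
kind: Subnet
metadata:
  name: ingress-subnet
  namespace: default
spec:
  cidr: 10.100.100.0/24             # hypothetical CIDR; gateway auto-allocated (.1)
---
apiVersion: core.contrail.juniper.net/v1alpha1
kind: VirtualNetwork
metadata:
  name: ingress-vn
  namespace: default                # must match the Subnet's namespace
spec:
  v4SubnetReference:                # assumed field name for the subnet reference
    kind: Subnet
    name: ingress-subnet
    namespace: default
  routeTargetList:                  # assumed field name
  - target:64512:100                # route target format: "target:<ASN>:<id>"
EOF
```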
9:01 And now we can go into our virtual networks,
9:09 and you can see that we have one virtual network, which is called ingress-vn,
9:15 and the same thing over here: we have it over here, and it's in
9:21 Success state, which means we are ready to use this virtual network. Now what we will do is
9:28 basically move the connection of the ingress controller off
9:33 the default service network onto the new service network we just
9:39 created. Again, that's the network we want to use for external reachability,
9:44 and that is quite interesting, because all we need to do is change
9:50 the spec of the ingress controller itself... sorry, the
9:56 spec of the service for the ingress controller itself. So basically, right now, and we can demonstrate that
10:03 probably over here, when you install the ingress controller
10:09 as a default, you can see that it's of type NodePort. That's the service which is
10:16 installed by default. What we are going to do is change that type to LoadBalancer, and
10:23 usually when you change something to type LoadBalancer there is an expectation that you get an IP
10:28 address from an external load balancer, which could be your cloud load balancer in GCP or
10:35 AWS or Azure. In our case, Contrail is going to provide that
10:40 IP address, so we do not need an external load balancer to do that. So we are just going to patch the
10:47 service with these four lines of configuration
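The four lines themselves aren't legible in the recording; the essential change is the service type, so a minimal sketch of the patch, assuming the stock ingress-nginx service name and namespace, might be:

```sh
# Switch the ingress controller's service from NodePort to LoadBalancer;
# Contrail then acts as the load balancer provider and assigns the external IP.
kubectl -n ingress-nginx patch service ingress-nginx-controller \
  --type merge -p '{"spec": {"type": "LoadBalancer"}}'
```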
10:52 on the first one, and now you can see right away the type changed from NodePort to
11:00 LoadBalancer, and we have an external IP address. Doing the same thing on the
11:06 second cluster. Now if we check
11:12 the services here, what you will see is that we have the same IP address on both clusters, which is .2,
11:20 and this is a technique called anycast IP. So we are going to advertise the same
11:26 IP address to all the clients, which means the client, in our case a
11:32 Contrail cluster which has BGP running, is going to equally distribute, or almost
11:38 equally distribute, the traffic across both clusters hosting that application,
11:43 and that's because we are advertising the same IP address. We also could advertise different IP addresses, but I
11:50 just want to demonstrate that through BGP we support anycast, where
11:56 BGP is smart enough to detect: hey, behind that single IP address I have
12:01 different destinations. Now, what we have not done yet is create the control plane federation.
12:08 What we have in place right now is three clusters, and all three clusters have a local BGP router
12:15 configured, and we can see that if we go into the BGP routers over here...
12:21 oh sorry, that one already has it configured; let's go to this one.
12:27 So cluster three now has BGP... oh, I pre-configured them already, okay, so
12:34 we can basically skip that step. That's okay, it saves us some time. So what you can see here on our first
12:42 cluster: we have our local cluster, that's the representation of our local BGP
12:48 node, and we have a representation of the third cluster as a BGP neighbor, and
12:53 we have a representation of the second cluster as a BGP peer.
12:59 Basically, all we need to provide is the IP addresses, and that's going to establish a
13:04 BGP connection between all three clusters. So basically what we have in place is a fully federated control plane using
13:11 BGP, where each cluster has the other clusters configured as BGP
13:18 peers. Important to mention here: those BGP routers you can see are
13:23 config entries; we are not running multiple BGP processes. There is a single BGP process
13:30 per cluster, and these entries here are just a representation of the BGP peers.
13:38 Likewise, you can not only add Contrail clusters as BGP peers, you could add any BGP
13:45 device as a peer here. So you can use routers from Cisco, routers from Juniper,
13:51 or BIRD or FRR or whatever kind of BGP software you want to use;
13:56 it's a fully compliant BGP implementation itself. So because I have all the BGP
14:03 peers already set up, we can basically skip that step. I'm just going to give you some introduction to how the
14:09 manifest looks. As with every Kubernetes resource, it has a name and a
14:15 namespace, and the important pieces here are the IP addresses, which are used to establish
14:21 the peering, and also the autonomous system number, which again goes back to how BGP works. You
14:28 can assign different autonomous system numbers; in our case we are using the same AS number for all
14:35 BGP routers, and that basically means we are
14:41 running internal BGP, versus external BGP if you would use different autonomous
14:46 system numbers. Again, that's very heavy on the networking side; it's not really important for this demonstration over here.
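A best-effort sketch of such a BGP router manifest as described (a name, a namespace, the peer's IP address, and the autonomous system number); the apiVersion, kind, and field names are assumptions, not a verified schema:

```sh
kubectl apply -f - <<'EOF'
apiVersion: core.contrail.juniper.net/v1alpha1   # assumed CRD group/version
kind: BGPRouter
metadata:
  name: cluster2                    # hypothetical name for the peer entry
  namespace: contrail               # assumed namespace
spec:
  bgpRouterParameters:              # assumed field names
    address: 10.0.2.10              # hypothetical control-node IP of the peer
    autonomousSystem: 64512         # same ASN everywhere means iBGP
EOF
```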
14:53 You mentioned that you're only running one BGP process; I assume you
14:58 mean that there's some sort of failover there, right? Like, there's... oh yes, absolutely.
15:05 There's just not one process per BGP router entry, right?
15:12 Absolutely. Okay, we can run any number of BGP controllers in the cluster, and they
15:17 are always fully synced, so they are not running in any kind of active-passive mode;
15:24 in fact, all BGP instances are always active-active. But,
15:30 as you said, they are not on a per-peer basis; all BGP
15:35 instances connect to all BGP peers as configured over here,
15:42 and also each compute node connects to multiple control nodes, and the control nodes are active-active
15:51 and they are peered via BGP. Even if one control node goes down, the other control node picks it up.
15:58 Hey Michael, not to go back too far, but I keep getting caught up on this. Can you just explain again real quick: in
16:04 the manifest, the route target list, is that like a BGP prefix, or like an ACL, or is that actually establishing neighbors,
16:11 or what is that actually doing there? So the route target itself tells BGP
16:17 that virtual networks belong together in BGP, so it makes BGP exchange routes
16:25 between... So, like an ASN? It works like an ASN or something like that? Kubernetes has the best analogy here: a route
16:31 target is like a label that we apply to a prefix, and then the routing instances themselves,
16:37 like the virtual networks, will match on a set of labels (a set of route targets in this case) to decide which
16:44 routes to pull in from all of the routes that have been advertised. So we attach the same
16:50 route target label to two virtual networks, and then BGP knows to blend them together into one virtual network.
16:56 That helps a lot, because I don't know why I was so caught up on that. Thank you. Yeah, that actually
17:02 worked really well: just using labels, like application labels and things like that, I can map everything together, and
17:07 it knows to exchange routes. And then there are some very interesting quirks on route targets, because
17:14 you cannot simply define a route target; you can also define the direction. So you can basically say,
17:21 you can differentiate between import and export. You can say one virtual network only imports a certain
17:27 route target and exports a different route target, and we will see that later on in Prasad's
17:33 demonstration. You can create interesting concepts like hub and spoke, where you basically say you have one hub
17:39 virtual network which is allowed to talk with all the spokes, but the spokes cannot talk to each other, by using
17:45 asymmetric route target export. So it's a little bit more powerful than just a label itself.
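To make the hub-and-spoke idea concrete, here is a hedged sketch reusing the assumed CRD shape from earlier; the asymmetric import/export field names are assumptions:

```sh
# A spoke exports its own target and imports only the hub's target,
# so spokes can reach the hub but not each other.
kubectl apply -f - <<'EOF'
apiVersion: core.contrail.juniper.net/v1alpha1
kind: VirtualNetwork
metadata:
  name: spoke-vn
  namespace: default
spec:
  exportRouteTargetList:            # assumed field name
  - target:64512:200                # spoke target, imported by the hub only
  importRouteTargetList:            # assumed field name
  - target:64512:100                # hub target
EOF
```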
17:52 Yep, no, the good stuff, thank you. Okay, so we have the BGP peering up and
17:57 running, so routes can be exchanged. But as we've just been on the topic of route
18:03 targets, just to remind everybody what we wanted to do: we want to
18:08 allow communication from the pod network in cluster three to the service network in
18:14 clusters one and two. Now, in order to do that, we need to
18:19 set up the route targets, because right now the pod network in cluster three hopefully does not have the route target
18:26 configured. So let me quickly check that. We go to the default pod network, and
18:33 we can see here (it's a little bit overloaded with information) that there's no route target
18:39 on that default pod network in cluster three. That means right now BGP
18:45 is not establishing a communication path between this pod network and those two
18:51 service networks in order to exchange routing information. What we can do is simply add
18:58 the same route target to this pod network over here, and that again is very simple:
19:04 we can just patch the existing default pod network.
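A sketch of that patch, reusing the assumed schema from the VirtualNetwork sketch above; the resource name and namespace of the default pod network are assumptions:

```sh
# Add the same route target to the default pod network so BGP ties it
# to the two ingress service networks (merge patch replaces the list).
kubectl -n kube-system patch virtualnetwork default-podnetwork \
  --type merge -p '{"spec": {"routeTargetList": ["target:64512:100"]}}'
```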
19:11 And if we go back here and check it again, you will see now
19:18 we added the same route target to the pod network,
19:23 and now BGP understands that there is a relationship between those three networks, and BGP
19:31 allows the exchange of routing information, which means we have a path from this pod network to the service
19:38 network in the first cluster and to the service network in the second cluster. Now, because we're using an Ingress on
19:44 cluster one and cluster two, it's not enough to work with IP addresses, because the ingress is basically checking the
19:51 FQDN you are using when you make the HTTP request. So what we need to do next in
19:58 the third cluster is inject some DNS resolution for this IP address,
20:04 and remember, that's the IP address we are advertising from cluster one and cluster two, the same IP address being
20:10 advertised through BGP. So by creating a separate namespace and creating a service in cluster three,
20:18 we are simply generating a synthetic DNS entry, which basically
20:24 resolves a certain URL, or a certain FQDN, to this IP address we
20:31 are exporting from the two clusters. So let's check that service here,
20:38 and we have the service, and that service basically says that myapp.apps
20:44 resolves to 192.168.
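One way to build such a synthetic entry (a guess at the mechanism, since the manifest isn't legible): a selector-less headless Service named myapp in a namespace apps, plus a manual Endpoints object pointing at the anycast IP, makes myapp.apps resolve to that IP in-cluster. All names and the IP below are assumptions:

```sh
kubectl create namespace apps
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: apps
spec:
  clusterIP: None                   # headless: DNS returns the endpoint IPs
  ports:
  - port: 80
---
apiVersion: v1
kind: Endpoints
metadata:
  name: myapp                       # must match the Service name
  namespace: apps
subsets:
- addresses:
  - ip: 192.168.100.2               # hypothetical anycast external IP (the ".2")
  ports:
  - port: 80
EOF
```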
20:50 And the last step of the demonstration is basically to create a pod on the third cluster which is going to send
20:57 some HTTP requests to cluster one and cluster two, and what we expect is that we see
21:05 responses coming back from the pods on the two clusters, by
21:11 accessing this URL over here. And this URL, again, is front-ended by the
21:18 ingress on both clusters. So let's try that.
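A sketch of that client as a throwaway pod; the image and loop are assumptions, and how the demo's FQDN maps onto the synthetic DNS entry isn't fully visible in the recording:

```sh
# Repeatedly curl the anycast-backed URL from cluster three.
kubectl run client --rm -it --image=curlimages/curl --restart=Never -- \
  sh -c 'while true; do curl -s http://myapp.appsdemo.com/; sleep 1; done'
```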
21:26 [inaudible]
21:32 And just a few words on the URL we are using here:
21:38 that one is defined in the Ingress. If you remember, or we can quickly go back to the Ingress we
21:43 created, we are basically using a wildcard and saying everything which goes to appsdemo.com is forwarded to my-service,
21:51 and that's what we have on both clusters. So the URL we are accessing is exactly
21:57 that: it's myapp.appsdemo.com, and
22:03 now let's try it out. Okay, so you can see that, and I'm going to
22:09 stop it here so we can take a little bit more detailed look into it: the
22:14 pod hello-world-1 on cluster two is replying to the first request, the pod
22:21 hello-world-1 on the first cluster is replying to the second request,
22:26 hello-world-0 on the first cluster to the third request, and then hello-world-0 on the first cluster to the fourth request. That's why I said
22:33 it's an almost equal distribution. You will see, and that's based on multiple factors, on
22:40 the entropy, on how different your source IP addresses are, and we can
22:45 influence that kind of load balancing based on the tuple we use, consisting
22:51 of source IP address, destination IP address, ports, and protocol. We can
22:57 distribute more equally or less equally based on certain requirements,
23:03 but in general what you can see is that the requests are being served by all
23:09 four pods, two pods on the first cluster, and just to show that back here...
23:16 Hey Michael... so you can see... yes, I know, I totally cut you off there, apologies, but what we're looking at
23:22 here: we're using anycast, but this looks like DNS round-robin. So are we doing DNS round-robin here, where
23:29 we're sending between clusters, or are we leveraging anycast across the clusters to also distribute this?
23:35 And then you also touched on ECMP and things like that, potentially. So, we are just using anycast.
23:43 What we do on the third cluster is resolve apps, sorry, myapp.apps,
23:49 to a single IP address, so the DNS entry for this URL does not
23:55 have multiple IP addresses; there is no DNS round-robin at all. Okay, okay. That is pure
24:02 anycast, because the traffic goes out to this single IP address, and BGP internally
24:08 is distributing this traffic across the two clusters. I have a follow-up question
24:14 to that, Michael. For that particular use case this seems like overkill: if I just wanted to load balance between two
24:21 different applications running on two different clusters, I could use DNS round-robin, and that would be real easy. So what
24:26 would be a more, I guess, real-world application of what you're showing, where you want these sorts
24:32 of virtual networks and subnets that span multiple clusters?
24:38 One use case is if you don't have a DNS federation between the clusters. Usually your CoreDNS only takes care
24:46 of the internal DNS resolution; if you want to have DNS federated across different
24:52 clusters, you need to install additional software. You need external-dns,
24:57 for example, which is just coming to life,
25:02 or you would have to coordinate with an external DNS system.
25:08 But this is not limited to just DNS-based communication, so you can use it to
25:15 exchange routing information between just the plain services. So what we have done, and again you're
25:22 absolutely right, for that kind of setup it would be a little bit overkill; I just want to use this to demonstrate
25:28 some of the capabilities. What you also can do at the same time is
25:34 just natively export the pod network from one cluster to the
25:39 next cluster, so one pod in cluster three can natively talk to another pod
25:45 in cluster one. Or you can just export the default
25:51 service network, and if you don't have DNS resolution then you don't usually go through a DNS
25:58 round-robin, but you can directly rely on IP reachability. So it basically saves you the pain of federating your DNS,
26:05 for which there is no really good automated solution in place today. Yeah, and if I can just add to that: another benefit
26:12 that we get with anycast is, well, the clusters are all
26:17 basically directly connected in this scenario, but with anycast, if there was some distance between the
26:23 clusters, anycast will pick the closest cluster based on BGP.
26:28 But an even cooler benefit is that the advertisements of the external IPs are
26:35 tied to service availability. So as the service comes up, if it's healthy, if it has enough backends, Contrail will
26:41 originate that external IP using BGP. If the cluster fails, if the app fails, the route gets withdrawn, so you get failover
26:47 between all of the members of the anycast group. And you can do
26:53 all of this with DNS, but with DNS you would need to monitor the endpoints and pull out records as a service went down;
26:59 you basically need to wrap a DNS load balancing mechanism in additional resiliency to deal with
27:05 failure detection, whereas with anycast you don't. So you would actually use the two concepts together: I would probably front-end this
27:12 type of setup with round-robin DNS and use anycast to pick the right set of clusters for each of the
27:20 available backends. Yeah, and to add to that, because you're advertising from a DNS
27:26 perspective (and again, I'm going to stop after this), to your point: if I configure my DNS to
27:33 point to one specific IP that is in an anycast configuration across
27:39 my clusters, problem solved at that point, right? Because BGP will pull a bad peer out of
27:45 the routing and be able to distribute appropriately. This is actually pretty cool, this is
27:51 actually pretty cool, so thank you. Well, thank you both for the great questions. It is definitely overkill in
27:56 this case, but it demonstrates the concept. If you go through the Kubernetes
28:02 documentation around the service network, around services, they have one paragraph on why they don't rely on DNS,
28:10 for various reasons, and that's mainly around TTLs: you don't know how the DNS clients
28:16 are being configured, you could get stuck with old DNS records, and there are a lot of problems around just relying on DNS.
28:24 But combining the two, having DNS provide the name resolution to an anycast IP address,
28:29 gives you basically the best of both worlds, because with BGP we know the availability of a certain path, and if
28:36 that path is for whatever reason not available, then we withdraw that path from the routing. And the cool thing here
28:43 is that our BGP availability for the path is influenced by the availability of endpoints,
28:51 so if one pod goes down in a service, we withdraw the route to that
28:56 pod from the BGP process itself. So that's the level of interaction we have between
29:02 Kubernetes itself and networking concepts like BGP.
29:08 So if I, say, put something in AWS, Azure, GCP,
29:14 and advertise: can you do an anycast advertisement across those
29:20 clouds and use something like Cloudflare to front-end my DNS and distribute across those different cloud providers,
29:26 or even on-prem as well? Absolutely. The only requirement we have between the three clusters, or any
29:33 clusters, is IP reachability. Right.
29:38 Yes, exactly. The only problem is if you don't have direct IP reachability: if you have to network address translate, you have a
29:45 problem, because BGP by nature does not support network address translation. Okay, cool, thank you.
29:51 And with that we are basically through. Just to show, the expected output is more or less matching what we have seen here,
29:59 and that more or less concludes my demonstration. Again, it's not necessarily the use case you would use
30:05 it for; it's just here to show you how we interact with the resources
30:11 and some of the capabilities around multi-cluster and control plane federation.
30:16 You can be really creative about what you are going to use this kind of technology for.