Apstra Solution
Rethink how you think about data center operations
With Juniper Apstra it’s possible to automate the entire network lifecycle to simplify design, deployment, and operations, and to provide continuous validation. Learn more about how Apstra can deliver assured experiences for applications and operators in this comprehensive guide from Juniper’s Mehdi Abdelouahab.
You’ll learn
How Apstra can help solve Day 0/1/2 challenges through automation with a unified tool for architects and operators
Three reasons why you should use a Graph model for network operations
How the Intent Time Voyager helps manage the infrastructure as a whole system, thus increasing agility
Who is this for?
Host
Transcript
00:05 [Music]
00:10 Good morning everyone, glad to be here. I'm Mehdi Abdelouahab, a consulting SE for Apstra, the product that was acquired by Juniper a few months ago. We're going to stay in the automation domain, with a focus on the data center: Apstra is a data center network automation solution, and I'm going to elaborate on that.
00:46 At a high level, when we look at data center network operations and the usual challenges we observe in this area, we generally have two personas: the architects and the operators. In some organizations they are mixed, in others they are separate, and they face different daily challenges. I'm not going through all of them, but typically, those of you on the network architecture side probably care more about selecting technologies that give you the freedom to enable and activate services seamlessly: you look at activation delays, at shortening those delays, and at enabling service activation as quickly as possible, from an agility standpoint. Those of you on the architect teams also care about vendor flexibility: you make design or technology choices today, you look for the latest and greatest out there, and you don't want to lock yourself into a given technology or vendor; you want to retain the freedom to benefit from whatever enhancements come later, whether they are hardware-related, ASIC-related, or even protocol-related.
02:17 On the other side, if you are an operator, you are most likely up against things like resource planning and its constraints, the need for knowledge retention in your teams, and the typical processes around infrastructure changes, from tech reviews to approvals and so on, which can be more or less time-consuming and can slow down the agility that is one of the objectives of the architect teams. You also have a major concern, reliability, probably the number-one challenge operators face: the need to deal with changes in a reliable manner, because that is part of your daily job. There are different automation solutions out there in the DC space, and some of them address the architects more than the operators, or the reverse. We strive to come up with a technical solution for both, and hopefully we're going to elaborate on that in this session.
03:42 If you look at Apstra's technical claim, we are in the business of intent-based networking. What does that mean? From a high-level perspective, it starts with the idea that we want to focus on the what and let the system derive the how. We want to raise the declarative level of the user input higher than before, and this matters because the more declarative your user input is, the closer you get to the abstraction that really gives you agility and cloud-like operations. Abstracting details away and reducing the required user input to a minimum is an absolute requirement to meet the agility expectations of the architect teams.
04:47 So the starting point for us is to recognize the network as a distributed system: we don't want to manage boxes individually, we manage systems. The goal is to have the operator express what the outcome of the system should be, and let AOS, the Apstra Operating System, derive the necessary steps to get there. Raising the level of abstraction to something highly declarative, and letting the system generate all the required configurations as well as all the expected validations, is the way to go.
05:32 What is also important is that this user specification is modeled in the back end of AOS as a logical model, completely decoupled from any vendor or device model. We store it in a purely logical way, and that is a major enabler of agility: not only can we then select any given device model here and there, but having a logical definition of the user intent, not hardcoded to any specific device, is also what enables the closed-loop assurance you have in the middle. This means the product goes beyond configuration: it also generates a set of expectations that the system has to meet. Those expectations are derived from the logical intent, because we know the high-level design from the beginning, so we can derive very detailed expectations, then automate the collection of the corresponding telemetry, the analysis of that telemetry, and its comparison against the expectations, to dictate whether the overall architecture is behaving as expected or not. It goes beyond configuration into a form of monitoring, but a monitoring which is contextual to your intent: a tool that automates the monitoring activity with the context in mind.
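The closed loop described here can be sketched in a few lines. This is an illustrative simplification, not Apstra's actual code: the expectation rule (one BGP session per spine on every leaf) and all names are hypothetical.

```python
# Illustrative closed-loop validation sketch (not Apstra's real implementation):
# expectations are derived from the logical intent, then compared to telemetry.

def derive_expectations(intent):
    """Derive per-leaf expectations from a logical intent.
    Hypothetical rule: each leaf should have one BGP session up per spine."""
    spines = [d for d in intent["devices"] if d["role"] == "spine"]
    return {
        d["name"]: {"bgp_sessions_up": len(spines)}
        for d in intent["devices"] if d["role"] == "leaf"
    }

def compare(expectations, telemetry):
    """Return anomalies as (device, metric, expected, actual) tuples."""
    anomalies = []
    for device, expected in expectations.items():
        for metric, value in expected.items():
            actual = telemetry.get(device, {}).get(metric)
            if actual != value:
                anomalies.append((device, metric, value, actual))
    return anomalies

intent = {"devices": [
    {"name": "spine1", "role": "spine"}, {"name": "spine2", "role": "spine"},
    {"name": "leaf1", "role": "leaf"}, {"name": "leaf2", "role": "leaf"},
]}
telemetry = {"leaf1": {"bgp_sessions_up": 2}, "leaf2": {"bgp_sessions_up": 1}}
print(compare(derive_expectations(intent), telemetry))
# leaf2 deviates: 2 sessions expected, 1 observed
```

The point of the sketch is that the monitoring is contextual: the expected values are computed from the intent, not configured by hand.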
07:07 The advantage for you, from a Day 2 perspective, is the ability to deal with changes reliably, because you have in one combined solution your Day 0, Day 1, and Day 2 tooling, which is aware of your intent from the beginning and able to validate it in real time and on a continuous basis.
07:29 To recap what intent-based networking is: even though there is no truly formal definition of IBN (we have people within the company contributing to IETF drafts to bring some normalization there), from our perspective it rests on three pillars. The first is having unified tooling around a single source of truth, and this will be the major focus of the rest of my presentation: explaining how Apstra acts as a single source of truth, and what back-end implementation we selected to create a single source of truth able to cope with the DC challenges.
08:14 The second pillar is the idea that you want to automate the entire lifecycle. You want a single tool that is used by architects to design in a logical fashion, using logical building blocks they can assemble to create predictable, scalable designs, with a design discipline where the system prevents you from designing something that is not by the book. We inserted a lot of expertise even at design level, to prevent architects from shooting themselves in the foot and creating something that will not scale from a DC perspective. Then you go from design to build: the build process, where you instantiate the logical templates created at design time and let the system apply resource assignment, automating the distribution of variables at scale. A typical spine fabric consumes a ton of variables; you don't want to hand-assign every ASN or loopback IP in a big five-stage network, so you want the system to manage those infrastructure variables at scale. Then you move to the operate phase, where you want the ability to define services in minutes, have them deployed in minutes, and operate them right after the configuration push. Automating the entire lifecycle is important, but for that you need a single source of truth able to feed all those steps.
10:02 The last pillar is of course being multivendor: as I said, we completely decoupled the user intent from whatever is selected underneath. For us, hardware selection is something that arrives quite late in the process; you do it almost at the end of your build, right before you hit the commit button.
10:26 One of the major challenges when you build a network automation solution is to have a reliable source of truth. The source of truth is really the way you express your desired outcome, and you need something adapted to the network domain you are tackling, in our case data center fabrics. You need something extensible enough, programmable enough, and queryable in a very efficient way, because the source of truth is something you query all the time: you keep asking it questions. So there are a lot of requirements on the implementation.
11:13 The choice we made in Apstra is the graph database. Graph databases are a type of database that has been around for quite some time and is one of the most powerful options out there. Historically, they were popularized by social networks, so it's something that really comes from the Twitters and Facebooks of this world, and they are now used as a modeling practice in a variety of domains.
11:56 At Apstra we have been strong believers that graph databases are very well suited to modeling an intent. YANG data modeling, for example, is very well suited to device-level modeling, quite low in the stack; but when it comes to system-level or architecture-level modeling, graph databases are very well suited, and I'm going to elaborate on some of the reasons.
12:28 The first reason is that graph databases are very efficient at modeling highly connected data domains. Instead of storing the data in tables, in tabular formats, you store it in nodes, and nodes are interconnected with relationships; relationships are as important as the data in the nodes themselves, so the way data is interconnected is as important as the data itself. A data center leaf-spine network is a highly connected domain: it is a dense mesh between leaves, spines, and super-spines, and it is also quite complex from the perspective of the services running on top, especially if we talk about EVPN, where the various layers you have, from underlay to overlay, are quite complex. You need a way to capture this highly connected domain, and graph databases are very well suited for that.
13:30 The second reason is that a graph database is highly extensible. It is almost schema-less; it has a schema per se, but that schema is very extensible: as requirements evolve, you can add new node types and new relationships and extend in any direction you want, so it scales well when it comes to coping with new network requirements.
13:55 The last reason, which is very important, is that it exposes very sophisticated query languages, allowing you to ask very complex, multi-dimensional questions in a very efficient way, much more so than any SQL query language on a traditional RDBMS. Those three elements are pillars that we rely on, and I'm going to elaborate on why.
14:25 Anytime you use Apstra, whether through the UI or the API, you express an intent: you design a template, you define a virtual network, you express a policy, be it an EVPN policy or a routing policy. That intent is stored in the graph database, as the nodes and relationships you see here, and this graph DB serves as the source of truth. Everything comes out of it, whether configuration rendering, expectation rendering, or the cabling map; everything is an artifact of this graph.
14:57 If you look at the typical UI in the product and design a very small leaf-spine network, with two spines, four leaves, and a couple of servers in every rack, AOS will see your intent in this form, as a set of nodes and relationships.
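To make the nodes-and-relationships idea concrete, here is a toy model of that two-spine, four-leaf intent. The node types, relationship names, and helper functions are all made up for illustration; they are not Apstra's real graph schema or query API.

```python
# A toy graph model of the two-spine / four-leaf intent described above.
# Nodes and relationships are plain Python structures; all names are
# illustrative, not Apstra's actual schema.

nodes = {}
edges = []   # (source_id, relationship, target_id)

def add_node(node_id, node_type, **props):
    nodes[node_id] = {"type": node_type, **props}

def add_edge(src, rel, dst):
    edges.append((src, rel, dst))

for s in ("spine1", "spine2"):
    add_node(s, "system", role="spine")
for l in ("leaf1", "leaf2", "leaf3", "leaf4"):
    add_node(l, "system", role="leaf")
    for s in ("spine1", "spine2"):
        add_edge(l, "link", s)          # physical fabric links

# Overlay a logical object on the same graph: a virtual network
# instantiated on two specific leaves.
add_node("vn10", "virtual_network", vni=10010)
add_edge("vn10", "instantiated_on", "leaf1")
add_edge("vn10", "instantiated_on", "leaf2")

def out(node_id, rel):
    """Follow a relationship type outward from a node."""
    return [dst for src, r, dst in edges if src == node_id and r == rel]

# Which leaves host vn10?
print(out("vn10", "instantiated_on"))   # ['leaf1', 'leaf2']
```

The traversal at the end is the essential operation: physical and logical objects live in one graph, so a single query can cross from a service to the switches that carry it.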
15:19 You still interact with a traditional UI to express your intent, but under the hood we capture it by adding nodes and relationships, overlaying the physical as well as the logical aspects. Now scale that up: we have major telcos using the product with many hundreds of devices per DC, plus data center interconnect and so on, so you are looking at potentially hundreds of thousands of nodes from a physical standpoint, and thousands of VXLAN services. This translates into what is presented on the right: our largest customers have something between 20 and 30 million relationships in the database, and more than a million EVPN routes.
16:22 And that is how AOS sees the network. From there we can do a number of things. Unlike traditional automation solutions, we don't need to store things that are a result of the intent: we don't store device configurations, we don't store cabling maps, we don't store typical artifacts like that. All of those are the result of computing the graph, and we leverage lightweight processes that can be parallelized to run those tasks: render configurations, render telemetry expectations, collect telemetry, and compare that telemetry in real time. We model the bare minimum in the graph database and let the system derive everything that is required.
17:17 Why is this important? Because the tool itself then has a complete understanding of the network. If I take the small leaf-spine network we saw before, its graph representation will look more or less like this: two spines and a couple of leaves. You see the physical representation, and you also see the logical representations, typically the EVPN constructs and services: virtual network nodes, instances of those virtual networks connected to specific leaves, and eventually SVIs if you are enabling VXLAN routing. This is how AOS captures the user intent.
18:08 From here, a number of steps happen. The first is what we call the preconditions check: the system checks the user input for any new service that is expressed. Once you pass those validations, we enable the postconditions check, which derives the expectations and makes sure that whatever configuration or state we have enforced on the switches is actually being met. I'm going to take examples of the preconditions check, which is an important part of the automation, and then of the postconditions check, which falls into the validation category.
18:49 The preconditions check applies anytime you are in a run phase: you have a running infrastructure and you are requesting new services, a new routing policy, a new ACL policy, anything you express in the system. This screen capture shows a new intent being staged, and AOS performing a number of semantic validations: it looks at the user input and makes sure that it is complete and exhaustive, and that it does not conflict with any existing data or policy. The idea is that, because you have a system-level model of the user intent, the tool can make sure that at no point in time does the operator trigger an automation workflow with data that will break at the end.
19:51 If any data is missing, or provided in the wrong format, or provided in the right format but conflicting with a previous policy, or anything else deemed incorrect, you get what are called build errors. A typical example is duplicate IPs: as a human, even using an automation tool, you can create two services in the same tenant and give them the same IPs by mistake, and the tool is here to prevent you from doing that. Preventing it means stopping any commit possibility: no configuration is sent to the switches if there is an error; we simply gray out the commit button and block the equivalent API endpoint.
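A precondition check of this kind can be sketched as follows. This is a hedged illustration of the idea, not Apstra's validation code: the service records, tenant scoping, and error wording are invented for the example.

```python
# Illustrative precondition ("build error") check: reject a staged service
# whose IP duplicates one already staged in the same tenant, so no
# configuration is ever rendered or committed from bad input.

import ipaddress

def precondition_errors(staged_services):
    """Return the list of build errors found in the staged intent."""
    errors, seen = [], {}
    for svc in staged_services:
        try:
            ip = ipaddress.ip_interface(svc["ip"])
        except ValueError:
            errors.append(f"{svc['name']}: '{svc['ip']}' is not a valid IP")
            continue
        key = (svc["tenant"], ip.ip)
        if key in seen:
            errors.append(
                f"{svc['name']}: duplicate IP {ip.ip} in tenant "
                f"{svc['tenant']} (already used by {seen[key]})")
        else:
            seen[key] = svc["name"]
    return errors

staged = [
    {"name": "svc-a", "tenant": "red", "ip": "10.0.0.1/24"},
    {"name": "svc-b", "tenant": "red", "ip": "10.0.0.1/24"},  # duplicate!
]
errs = precondition_errors(staged)
commit_allowed = not errs     # any build error grays out the commit button
print(errs, commit_allowed)
```

The key design point mirrors the talk: validation happens against the staged intent before any rendering, so the commit path is simply blocked while errors exist.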
20:40 That's important for us: making sure there is an in-depth verification of the user input, leveraging all the system validation. That was the preconditions check, which runs whenever you stage a new service or make any adds, moves, and changes to your intent, making sure you are not about to make a change that will break the network; it really de-risks those changes. Then, say you have passed this user input validation and whatever request you have put into the system is deemed feasible: the system lets you commit it whenever you are ready.
21:25 The configuration rendering is then generated and pushed to the devices, incremental configuration is handled, and so on. You also get the expectation rendering: the tool generates expectations and compares them to the actual data in the infrastructure.
21:44 I'll take a typical example of something quite symptomatic of today's challenges: EVPN. When you configure EVPN, one of the more complex protocols today, you have a lot of state, a lot of routes, and for the first time you have more service routes than actual customer routes. EVPN route types 3 and 5, for example, which appear in this example dashboard, are infrastructure routes, and you need them to be present on the switches right after you configure any layer-2 or layer-3 VXLAN service, irrespective of any user traffic or any customer actually consuming those services. Consumption of those EVPN services will trigger additional routes, but these are really infrastructure routes. Here you have something that, in a contextual manner, anytime you request, say, 50 or 2,000 new VXLAN services, automatically derives the routes it expects to see on every switch, in a very detailed way. The counters here relate to a very small infrastructure and show, in this case, the deviation: three of the expected type-3 routes and three of the expected type-5 routes are missing on the switches.
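Deriving expected route counts from intent can be sketched like this. The rule used here is an assumed simplification (one EVPN type-3 route per remote leaf participating in each virtual network); real EVPN accounting depends on the design, and all names are hypothetical.

```python
# Illustrative derivation of expected EVPN type-3 routes from intent.
# Assumed rule: for each virtual network, a leaf should receive one type-3
# route from every *other* leaf where that VN is instantiated.

def expected_type3(vn_membership):
    """vn_membership: {vn_name: set of member leaves}.
    Returns {leaf: expected type-3 route count}."""
    expected = {}
    for vn, leaves in vn_membership.items():
        for leaf in leaves:
            expected[leaf] = expected.get(leaf, 0) + (len(leaves) - 1)
    return expected

def deviations(expected, actual):
    """Leaves whose actual count differs from the derived expectation."""
    return {leaf: (exp, actual.get(leaf, 0))
            for leaf, exp in expected.items()
            if actual.get(leaf, 0) != exp}

vns = {"vn10": {"leaf1", "leaf2", "leaf3"}, "vn20": {"leaf1", "leaf3"}}
exp = expected_type3(vns)        # leaf1: 3, leaf2: 2, leaf3: 3
actual = {"leaf1": 3, "leaf2": 2, "leaf3": 2}
print(deviations(exp, actual))   # leaf3 is missing one expected route
```

Because the expectation is recomputed from the intent, requesting 50 or 2,000 new services updates the expected counters automatically; no one maintains them by hand.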
23:16 If you wanted to do this manually, you would typically type show commands; these are show commands from Junos to look at the BGP EVPN routing table, asking for the type-3 routes or the type-5 routes. What we see here is that, first of all, the command syntax is not obvious, and the data output is not obvious either: this is MP-BGP, so we have to deal with route distinguishers and route targets, and the interpretation of the data is not straightforward. Moreover, if something is missing, it is quite hard, sometimes almost impossible, for a human to notice. You are typically trying to use show commands to troubleshoot something that, at scale, is very often impossible to work through, and by scale I mean the moment you exceed ten racks with a few hundred stretched VLANs, you have reached that scale. You type your commands and it looks good to you, but if something is missing you will rarely be able to see it.
24:34 that automates
24:35 those expectations and those checks and
24:37 the level of checks we do on on on you
24:40 know on day one like when you stand up
24:43 the fabric and and
24:44 and deploy your first services is
24:46 exactly the same as the number of checks
24:49 that we do let's say in six months down
24:52 the road after you know having uh uh
24:54 added um you know 50 racks or 100 racks
24:57 and and i don't know how many uh virtual
24:59 services the uh uh the objective here is
25:03 to have something that makes the same
25:05 checks consistently uh
25:07 in a continuous manner uh with the
25:09 purpose of of having no technical depth
25:12 whatsoever right um you you don't have
25:16 any configuration drift because uh there
25:18 is no incremental configuration that is
25:20 pushed to the switches if there is no
25:23 automated expectation that is generated
25:25 and automatic data collection from from
25:27 the switches to to compare what and to
25:29 automate the comparison of that data to
25:31 the expectations
25:33 so that's for us
25:34 um you know um an important um you know
25:37 aspect of how we want to de-risk day two
25:40 changes and and make change management
25:43 in in in um
25:45 in today's operations um being something
25:47 that is you know not scaring every every
25:49 every teams or every operations teams
25:52 Typically the user would look at the output and try to work out: do I have the right VTEPs for the right VNI? Am I missing something? Which entries am I expecting to see for a specific VTEP? With a show-command-driven approach it is practically impossible to derive these expectations in real time, and you cannot use the configurations to derive them either; that is a very hard problem to solve in software. Configurations have to be a result, the same way validations have to be a result, of a more logical definition of your intent.
26:32 Moving on, to illustrate what this looks like now that you have seen the back-end implementation: this is an example of the BGP peerings on a typical switch, with a number of expectations shown as an expected column versus an actual column, and any deviation marked in red. This is updated in real time as you add racks or services, giving you contextual monitoring and continuous validation. You don't have to worry about which IP address is used on which spine and so on; we leverage the graph model to present you with the information to pinpoint and understand what is wrong in which part of the infrastructure. This example covers BGP and the routing tables; there are others, like the EVPN overlay routes shown before, or this one looking at the underlay routing: do I have the right entries in my routing tables on every switch, yes or no, with an expected-versus-actual comparison? And at no point am I bothered with meaningless alarms: I get only contextual alarms; anything that is red is contextual to an expectation.
27:55 Then you have another type of telemetry validation that is not strictly deterministic but more traffic-oriented: show me the path between any two endpoints in the fabric. I pick one server in one rack and another server in another rack, potentially in different pods, and I want to see how the traffic is flowing, how the fabric is behaving between those two endpoints. Here you see an example of a three-stage topology showing the path between two specific endpoints.
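Enumerating the candidate paths between two endpoints falls out of the same graph model. A minimal sketch, with an assumed two-spine topology where ECMP makes every leaf-spine-leaf combination a potential path:

```python
# Sketch of enumerating candidate forwarding paths between two servers
# in a small three-stage fabric (topology and names are made up).

from collections import defaultdict

links = defaultdict(set)

def connect(a, b):
    links[a].add(b)
    links[b].add(a)

for leaf in ("leaf1", "leaf2"):
    for spine in ("spine1", "spine2"):
        connect(leaf, spine)
connect("server1", "leaf1")
connect("server2", "leaf2")

def paths(src, dst, path=None):
    """Depth-first enumeration of loop-free paths through the graph."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in sorted(links[src]):
        if nxt not in path:
            found += paths(nxt, dst, path)
    return found

for p in paths("server1", "server2"):
    print(" -> ".join(p))
# server1 -> leaf1 -> spine1 -> leaf2 -> server2
# server1 -> leaf1 -> spine2 -> leaf2 -> server2
```

With the topology in the source of truth, "show me the path between these two endpoints" is a graph traversal rather than a manual trace.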
28:24 Again, this is really traffic-oriented: say I also have the right routing entries and the right configurations, I can still ask the system to check for any unbalanced situation. This is irrespective of right or wrong configuration: you may have a correct ECMP configuration on the switches and still have an imbalance, which is symptomatic of elephant flows arising in your network. So you also have dashboards that look for ECMP imbalance, both layer-3 ECMP inside the fabric and layer-2 ECMP between your servers and your leaves. It is a bit harder to demonstrate alarms here, to create an unbalanced situation on demand, but you get the idea of how the gauges will evolve if there is an invalid situation. The important thing is that you can provide your own definition of what an imbalance is.
29:24 The analytics pipeline
29:27 that is behind this dashboard
29:30 gives you the ability to customize the
29:36 definition of an imbalance
29:39 by specifying the amount of standard
29:41 deviation that you tolerate between
29:44 links, as well as the
29:47 observation interval, so that, based
29:50 on your traffic pattern, you are
29:52 alerted only if the imbalance
29:56 matches a given condition, which is
29:58 probably different from one customer
29:59 to another. So all of this is highly
30:01 customizable.
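One plausible way to express a user-defined imbalance condition, assuming per-link utilisation averaged over the observation interval (the threshold, data, and function name are illustrative only):

```python
import statistics

# Hypothetical imbalance check: flag an ECMP group when the spread of
# member-link utilisation exceeds a user-chosen standard-deviation tolerance.

def is_imbalanced(link_util_pct, max_stdev=5.0):
    """link_util_pct: average utilisation (%) per member link over the
    observation interval. True when the spread exceeds the tolerance."""
    return statistics.pstdev(link_util_pct) > max_stdev

balanced = [41.0, 39.5, 40.2, 40.8]   # traffic spread evenly -> no alarm
elephant = [82.0, 12.0, 14.0, 11.0]   # one elephant flow pinned to a link
```

Both the tolerated deviation and the interval the averages are taken over are the knobs the talk describes as customer-specific.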
30:04 Other views leverage the
30:06 data model as well: again, you can
30:08 interrogate the source of truth very
30:10 easily and create dashboards
30:12 like this to say, I want to see the
30:14 north-south versus east-west traffic
30:16 distribution. From design time
30:19 we have modeled what the
30:21 external link is, what
30:22 the fabric link is, and
30:24 what the server-facing link is, and so on,
30:26 so querying the source of truth
30:29 and creating a dashboard out of this
30:31 is extremely easy. More importantly,
30:33 when you create a dashboard like
30:35 that, it is in sync with your intent,
30:37 meaning the moment I add external links,
30:39 for whatever reason, because I need more
30:42 capacity towards
30:44 my WAN or whatever,
30:47 this dashboard is automatically aware of
30:49 the modification. It
30:51 automatically collects data for the
30:53 new interfaces because it's aware of a
30:56 change that impacts the data
30:57 collection. So there is no more
30:59 disconnect between someone managing the
31:01 configuration and someone managing a
31:03 monitoring stack and the need to update
31:05 the two; there is one and only one
31:07 source of truth serving every
31:11 single feature in this product, be it
31:13 configuration or validation.
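A toy illustration of why a role-aware source of truth keeps such a dashboard zero-maintenance; the link records and role names are invented:

```python
# Hypothetical source-of-truth query: links carry a modelled "role",
# so a dashboard can aggregate north-south vs east-west traffic without
# hard-coding interface lists anywhere.

links = [
    {"role": "external", "bps": 4.0e9},  # north-south
    {"role": "fabric",   "bps": 9.0e9},  # east-west, leaf-spine
    {"role": "server",   "bps": 7.5e9},
    {"role": "fabric",   "bps": 8.5e9},
]

def traffic_by_role(links, role):
    return sum(l["bps"] for l in links if l["role"] == role)

# Add an external link later: the "dashboard" query picks it up with
# no change to the monitoring side, because it selects by role.
links.append({"role": "external", "bps": 2.0e9})
north_south = traffic_by_role(links, "external")
```

The query is over intent ("all external links"), not over a frozen interface list, which is what keeps configuration and monitoring from drifting apart.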
31:16 So the data model leverages this in
31:19 real time. Then,
31:22 carrying on: what else can I do
31:24 with the graph modeling of my
31:25 source of truth? Well, what I can do
31:27 is compare
31:29 different versions of the graph,
31:32 in the same way
31:34 as a git diff, if you will,
31:38 if you look for an analogy in
31:41 version control. The
31:44 system can very easily compute
31:47 diffs between an existing
31:50 version of the graph and the
31:52 previous one, and
31:55 what this allows me to do is
31:57 really version-control my network.
32:00 Junos has had this
32:03 amazing rollback feature for years,
32:06 which is certainly
32:09 something that every
32:11 network operator has been amazed by,
32:13 and we just
32:15 use the same concept but make it
32:17 system-wide,
32:17 meaning I can, at any point in time,
32:20 roll back an entire fabric.
32:23 Two or
32:24 three clicks away, I can come
32:27 back to a previous revision, and that can
32:30 change configuration on one switch, 10
32:31 switches, or 100 switches at a time. We
32:35 store the latest
32:37 revisions, like the five latest revisions,
32:39 and we allow you to keep 25 other
32:42 ones
32:44 for as long as you want, forever,
32:46 and roll back to them.
32:48 So imagine the power of the rollback
32:50 on a specific device, made
32:53 system-wide,
32:54 knowing that we manage very large
32:57 data center leaf-spine
32:59 architectures for customers.
33:03 This feature is called Intent
33:04 Time Voyager, which is
33:06 basically what we just described.
33:11 The bottom line is
33:13 that, again, we want to manage
33:16 the infrastructure as a system and not
33:18 as individual boxes, which is, in
33:22 our view, the only way to
33:24 reach this agility, even
33:26 though under the hood you get access to
33:28 the configuration of every switch;
33:30 you manage it as a system.
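The revision-diff idea can be sketched as follows, with the intent reduced to a toy per-switch dictionary rather than Apstra's real graph schema:

```python
# Sketch of system-wide version control: each revision is a snapshot of
# the whole intent (here a toy dict of per-switch config), and a diff or
# rollback operates on the system, not on one box. Data is hypothetical.

revisions = []

def commit(intent):
    revisions.append(dict(intent))  # store an immutable snapshot

def diff(old, new):
    """Return {key: (old_value, new_value)} for every changed entry."""
    return {k: (old.get(k), new.get(k))
            for k in set(old) | set(new) if old.get(k) != new.get(k)}

commit({"leaf1": "vlan 10", "leaf2": "vlan 10"})
commit({"leaf1": "vlan 10,20", "leaf2": "vlan 10", "leaf3": "vlan 20"})

changes = diff(revisions[0], revisions[1])
rolled_back = revisions[0]  # "Time Voyager": restore an entire prior revision
```

Rolling back restores the whole snapshot at once, whether that touches one switch or a hundred.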
33:33 Moving on: this graph
33:35 database that we
33:39 mentioned is also something we can
33:42 use to ingest data from
33:45 external systems, so we have third-party
33:48 integrations,
33:49 and one example of that is the
33:50 integration with the VMware tools,
33:53 so vSphere and NSX and so on.
33:58 We can then create
34:01 additional nodes and relationships in
34:03 this source of truth to model data
34:06 coming from
34:07 another source of truth that we
34:10 want to be aware of
34:12 in order to validate our domain.
34:15 I'm going to explain this
34:17 through an example. We have this
34:19 integration with NSX,
34:24 and what it allows us to do is that,
34:27 through read-only APIs to the NSX
34:30 Manager, we will get the identity of all
34:32 the VMs, all the transport nodes,
34:36 as well as the uplink profiles of every
34:38 VM
34:40 and the micro-segmentation policies,
34:43 and, you know, you name it: a number of
34:46 pieces of information that are outside of our
34:48 domain but that we want to be aware
34:50 of, because it allows us to then
34:52 correlate that with the
34:54 underlay. So we then have the
34:56 ability to know what VMs we
34:59 have running in the fabric, where they
35:01 are located, like which ESXi is hosting
35:04 them, which
35:06 top-of-rack is connected to that ESXi,
35:09 what the user intent from the
35:12 VMware administrator is, and do I have the
35:16 right configuration in the fabric, or
35:18 the right user intent in the fabric, to
35:19 satisfy that?
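A hedged sketch of the overlay-versus-underlay check, assuming invented VLAN sets per top-of-rack switch:

```python
# Hypothetical overlay/underlay correlation: VLANs the VMware side expects
# on each top-of-rack versus VLANs present in the fabric user intent.

overlay_need = {"tor1": {10, 20}, "tor2": {30}}   # from the NSX/vSphere side
fabric_intent = {"tor1": {10}, "tor2": {30}}      # from the network intent

def missing_vlans(need, intent):
    """Per-ToR list of VLANs required by the overlay but absent from intent."""
    return {tor: sorted(vlans - intent.get(tor, set()))
            for tor, vlans in need.items() if vlans - intent.get(tor, set())}

gaps = missing_vlans(overlay_need, fabric_intent)
# A remediation workflow could now offer to add VLAN 20 on tor1.
```

The discrepancy itself becomes the notification: the network admin learns about the missing service before the phone call arrives.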
35:21 So we can bridge the gap
35:23 between the two domains.
35:25 Once we enhance the graph
35:28 database with this information, we have
35:29 automatic validation pipelines that pop
35:31 up and let you know whether there is a
35:33 discrepancy between one domain and
35:35 another. A typical example: I
35:39 have a
35:40 requirement for a VLAN-backed
35:42 interface from VMware, and
35:44 this one is missing from the user intent,
35:46 so the network admin has simply not been
35:48 notified that he has to
35:50 create a new service in AOS. With
35:52 this, he'd know that he'll probably soon get a
35:55 phone call from the
35:56 VMware folks to say, hey, I need this
35:59 VLAN here, or whatever.
36:01 So we empower the network
36:03 administrators with knowledge outside of
36:05 the network domain, and in some cases
36:08 we expose automatic remediation workflows,
36:10 which are
36:11 a way for Apstra to let the user know:
36:15 hey, you have, I don't know,
36:17 a VLAN anomaly here;
36:19 click this button and I'm going to do the
36:21 adds, moves, and changes to your intent
36:23 automatically for you,
36:25 to change the underlay to make it in
36:27 sync with the
36:29 overlay. We do not make any
36:32 modification on the VMware part; there
36:34 is a management tool for that, but we can
36:36 take the report from there and
36:38 automatically bring the underlay
36:40 fabric part in sync with those
36:43 requirements. The goal is to really
36:45 speed up
36:48 the deployments and avoid being a
36:51 bottleneck when it
36:54 comes to underlay/overlay validation.
36:58 And in all those examples, any
37:01 dashboard you have seen
37:04 is under the hood highly customizable.
37:09 This is the anatomy of
37:12 an analytics pipeline in AOS, which is
37:14 behind any user dashboard,
37:16 irrespective of how it looks,
37:19 whether it's a gauge,
37:21 tabular, histograms,
37:23 or whatever.
37:24 An
37:26 analytics pipeline is composed of three
37:28 components, each one of them
37:29 customizable. You have the telemetry
37:31 collectors, with an SDK
37:34 we give customers to
37:37 enrich the data
37:40 that's collected from the switches and
37:43 go beyond the built-in
37:45 telemetry collectors that we have. So
37:47 this is really the ability for you to
37:48 extend what's called the raw data
37:50 collection: for anything that comes out
37:52 of the switch, you define a data structure
37:54 and how you want to stream this data
37:57 into the telemetry framework. This
37:59 part is, as I said, customizable. Then,
38:02 once data comes to the AOS server, you
38:04 can create a user-defined, highly
38:06 customizable pipeline, in blue,
38:10 where you can select various processors.
38:12 A processor performs a specific
38:16 manipulation, correlation, reduction,
38:18 or processing of this data, so the
38:20 selection of those processors and the way
38:22 you chain them together defines an
38:24 operational workflow. There is
38:26 a list of different
38:28 processors, and the use cases are
38:31 really wide, because you can
38:33 combine them in very different ways to,
38:34 I don't know, compare data against
38:37 time series,
38:38 do periodic range checks, and even more
38:41 complex calculations. And then
38:44 you see the red part at the top: this
38:46 pipeline is always in sync
38:48 with your source of truth. You
38:50 define a query to the source of truth
38:52 which defines the scope of application
38:54 of this pipeline,
38:55 and then it's a zero-maintenance
38:58 feature: any time
39:01 you add, move, or change anything, from a
39:03 physical or logical standpoint, in the fabric,
39:05 you don't have to notify, change, or
39:07 maintain this analytics pipeline;
39:09 it is automatically notified,
39:12 increasing or reducing the data set
39:15 to collect new data subject
39:17 to analysis. Here again, the same is true
39:19 for configuration rendering
39:22 and validation.
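The collector-to-processor flow can be caricatured like this; the two processors are invented stand-ins for the real processor catalogue:

```python
# Toy analytics pipeline: raw samples flow through chained processors,
# and each stage's output remains inspectable. The processor names and
# thresholds are invented for illustration.

def average(samples):
    """Reduce raw samples to a single value (a 'reduction' processor)."""
    return sum(samples) / len(samples)

def range_check(value, low, high):
    """Turn a value into an anomaly flag (a 'range' processor)."""
    return not (low <= value <= high)

stages = {}
stages["raw"] = [48.0, 51.0, 95.0, 50.0]                  # from the collector
stages["avg"] = average(stages["raw"])                    # first processor
stages["anomaly"] = range_check(stages["avg"], 0, 60)     # second processor
```

Chaining different processors over the same raw feed is what turns one telemetry stream into many operational workflows.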
39:23 And then the last point, and I think
39:26 I will probably be
39:28 almost on time:
39:30 the last point is that
39:35 we automatically create APIs
39:40 for any pipeline that you
39:43 create. Of course we have APIs for
39:45 all the configuration part; that's
39:46 certainly
39:49 mandatory. But for any user
39:51 pipeline, any analytics pipeline that
39:53 you create for any use case,
39:55 you have an API endpoint that allows you
39:59 to get
40:00 to any stage, meaning to any processor,
40:04 which is available
40:07 for third-party systems
40:09 to retrieve the data: being raw, on
40:11 the far left, or processed in the
40:13 middle,
40:14 or fully analyzed, which is
40:17 closer to the far right,
40:19 where you can extract more insight out
40:21 of it.
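From a consumer's point of view, per-stage access might look roughly like this; the URL shape and pipeline contents are invented, not the documented endpoint format:

```python
# Hypothetical per-stage endpoints: a third-party system can pull the data
# at any processor of a pipeline -- raw, processed, or fully analyzed.

pipeline = {
    "raw": [10, 12, 300],   # far left: raw samples
    "processed": 107.3,     # middle: reduced/averaged
    "analyzed": "anomaly",  # far right: the insight
}

def get_stage(url):
    """Serve the data held at one stage, addressed by an invented URL,
    e.g. '/api/pipelines/iface-traffic/stages/processed'."""
    stage = url.rstrip("/").split("/")[-1]
    return pipeline[stage]

value = get_stage("/api/pipelines/iface-traffic/stages/analyzed")
```

The point is only that every intermediate stage is addressable, so external systems choose how much processing they want done for them.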
40:22 So that's one method to obtain the data,
40:24 and we also have the ability to
40:27 stream this data to external data lakes
40:32 with Google protocol buffers.
40:35 All data being collected by the
40:37 Apstra server is aggregated, normalized,
40:40 and so on, and enriched with metadata as
40:43 well, so that you have a consistent
40:44 experience, and then we stream the
40:46 data out from the Apstra server
40:50 via Google protocol buffers to telemetry
40:52 collectors. We typically have
40:54 plugins for Telegraf, because we like
40:56 this modular tool, and customers
40:58 can then
41:00 select whatever time-series database they
41:01 prefer to write the data to. So you can
41:05 basically use APIs
41:07 to program the Apstra server to collect
41:09 or parse new data in the specific formats
41:12 you are interested in, and you then have
41:14 the Apstra server
41:16 stream the data to the
41:20 external data-lake stack that
41:23 you have out there. These are
41:25 just examples of the very common TICK
41:27 stack or ELK stack that we
41:30 interface with.
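A dependency-free sketch of the enrich-then-stream step; the real transport is Google protocol buffers toward collectors such as Telegraf, and every field name here is hypothetical, so JSON stands in only to keep the sketch self-contained:

```python
import json

# Sketch: enrich a telemetry sample with metadata server-side, then push
# it to an external collector. JSON replaces protobuf purely for brevity.

def enrich(sample, metadata):
    """Attach device/role context so the record is useful downstream."""
    return {**sample, **metadata}

sink = []  # stand-in for the external data lake / Telegraf input

def stream(record):
    sink.append(json.dumps(record))  # stand-in for the protobuf push

sample = {"interface": "et-0/0/1", "rx_bps": 4.2e9}
stream(enrich(sample, {"device": "leaf1", "role": "fabric"}))
```

Because the metadata comes from the same source of truth as the configuration, the records arrive in the data lake already contextualized.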
41:33 So, to recap:
41:36 we have
41:38 a solution that allows you to
41:40 configure and operate
41:43 with very powerful,
41:49 customizable analytics
41:51 engines: powerful in the sense that
41:53 they are automatically available for
41:55 you,
41:56 and of course highly
41:58 extensible to fit any customization.
42:02 This allows us to address
42:03 Day 0 and Day 1, but also Day 2, which is
42:05 the major challenge in network
42:07 operations; it's a more difficult
42:09 problem to solve.
42:10 And of course it allows you
42:13 to enforce any compliance policy
42:17 that you want on top of
42:20 the built-in validations that we
42:22 have.
42:24 If you are interested, we have
42:28 different resources.
42:30 There's a YouTube channel
42:32 where we explain, in
42:35 five to ten minutes, different parts of
42:37 the product, from the design to the
42:39 validations and so on, so I encourage
42:42 you
42:43 to have a look if you want to get
42:46 a glimpse of how the product
42:49 operates. If you want to
42:51 spend more time, we have Apstra Academy,
42:53 which is a self-service training
42:57 tool. It's
43:00 a full training
43:02 that you can basically do at
43:04 your own pace, but it's
43:06 three days' worth of
43:09 instructor-led training, with the
43:13 various modules broken down with some
43:15 form of deep-dive explanation there.
43:18 And you have the virtual labs
43:20 that you can use to stand up a virtual
43:23 environment. We will basically
43:26 create small topologies, generally two
43:28 spines by four leafs or something, but really enough for
43:31 you to
43:33 at least appreciate the major
43:35 features. We
43:37 generally stand this up with virtual
43:39 switches, like virtual QFXs or other
43:41 vendors' virtual switches, but the user experience
43:43 from the Apstra perspective is
43:45 exactly the same
43:46 whether it's a physical switch or a
43:48 virtual counterpart.
43:54 [Music]