AIOps for Networks: The Impact of AI on the Future of IT Networking Operations
How to seize this opportunity to strengthen network operations.
Artificial Intelligence for IT networking operations, or AIOps, is finally here. But how will AIOps impact the daily lives of network engineers and IT networking practitioners? Find answers in this roundtable discussion featuring experts from four different organizations. Tip: Watch the demo of Juniper's Marvis Virtual Network Assistant.
2:56 Will AI take away my job?
8:00 What is AI? How is machine learning different from traditional programming?
12:25 What are forms of intelligence?
16:00 Understanding and assessing AI systems
18:22 Understanding data, human priors, and representations
21:30 Training AI
27:00 Applying learnings from computer vision to networking
28:00 A grand theory of networking
30:45 Breaking with traditional networking approaches
32:00 Modeling the network
33:00 Demo of Juniper's Marvis Virtual Network Assistant
34:10 Other tangible examples of AIOps
37:25 Starting the journey of AIOps
41:45 Building a data sciences practice
55:55 Recommendations for resources
1:02:00 Where do we go next with AIOps?
You’ll learn
What AI is, and how machine learning differs from traditional programming
How to apply lessons from computer vision to networking
Resource recommendations, and what’s on the near horizon
Who is this for?
Host
Guest speakers
Transcript
0:00 hello good morning good evening good afternoon everybody and thank you for joining the ai ops in networking panel
0:08 discussion um my name is yusi and i'm truly honored uh to be joined by these
0:15 distinguished guests uh today so today we're gonna
0:20 talk about ai uh machine learning and its impact uh especially to to our
0:27 networking uh the network engineers of us the network managers
0:32 so without further ado let's do a round of introductions of our of our panelists
0:38 ed horley why don't you start oh hi everyone
0:44 uh ed horley i'm the ceo co-founder of hexabuild and uh we really focus on on
0:49 ipv6 as a transition but obviously automation is is key network automation is key for what we're doing because we
0:56 think that's the direction cloud networking iot is the direction that things are going and that's obviously an area of
1:02 impact for ipv6 and things around ai machine learning seem to be cropping
1:08 up more and more around around those subject areas so it's super important for us to sort of have a good understanding of what's happening there
1:14 so that's that's me been doing i.t and operations for 20 plus years
1:19 thank you for joining ed uh then uh so that we have enough confusion
1:26 for the host of the panel we have another ed uh ed henry uh how are you man thank you for joining
1:32 i'm doing good man thanks for having me uh ed henry i work in the office of the cto at dell
1:38 technologies and i work on anything and everything machine learning related that can span
1:43 the gamut from robotics and computer vision through to infrastructure automation like this conversation will
1:48 be about i'm happy to be here excited to talk a little bit about where this kind of
1:53 technology is headed thanks for having me man of course thank you so much for coming
1:59 and uh marcel hild uh guten abend wie geht's (good evening how's it going) yeah guten abend i'm good
2:07 i'm i'm fine so it's 6 p.m here out of germany and i'm also
2:13 working in the office of the cto this time red hat so one layer up the stack and we try to make sure that ai as a
2:20 workload works great on our platforms but also i'm looking specifically into
2:27 how can we use all the metrics and logs being produced by all those machines to
2:32 push forward to that vision of the self-healing cluster somehow and i've been doing that
2:37 for the last three years and i'm happy to share my challenges concerns and what lies ahead
2:46 sounds good thank you so much marcel for joining as well uh so speaking of challenges uh let's just
2:54 start with the elephant in the room so uh is is this gonna be happening like um am
3:01 i gonna be losing my job is my job going to drastically change as a network engineer as an as an i.t ops person
3:08 how's this all going to pan out what what do you guys think
3:15 yes why don't you of course of course you will lose your job at
3:21 least i mean your job will be different than it was before it's it's pretty much like um people were riding
3:29 horse cabs a century ago but we still have people riding cabs or
3:37 driving cabs and the same will be true for operations you won't be the person um
3:44 i don't know um like doing your job as you did it before but
3:50 you will be uh assisted by some some tooling somehow so i think it will change a little bit but you certainly
3:56 will not lose your job i think we will need more humans guiding those
4:02 in the end we we need to train them somehow
4:08 absolutely and and ed uh ed horley what are some of the examples like what
4:15 kind of jobs are are on the line first uh and especially if we focus on the on
4:20 the i.t side of things or or outside of or outside of it what do you think
4:26 i think i think it's i don't think there's going to be jobs on the line i think the the definition of what your role is as a job is going to change to
4:33 marcel's point it's it's going to be that you're going to focus on the things that are actually of high impact within
4:40 the business because we can take a set of of chores and duties that maybe were very repetitive
4:45 maybe a little grinding a little a little um you know something something that could
4:51 be stamped out in a more automated way and and we're going to shift those sets of workloads off onto computer systems
4:58 as opposed to something that we have to run and maintain and build you know a set of workload and flow and yeah this
5:03 is a great great slide from from jason edelman talking about that our behaviors
5:09 really haven't changed for you know for well over a decade maybe even two decades in terms of how we we do
5:14 specifically in networking how we do operations and i think we're starting to see the evolution of of investments by
5:21 the major you know network players to really change how we think about building deploying and operating
5:27 networks and and to this point maybe we can have a quick conversation about about how automation's really changing
5:33 that and then how ai is actually changing that portion so i i someone you know said going from telnet to ssh isn't
5:39 a big evolution and maybe in security but that's about it so i think i think the point's well taken
5:45 there very good uh ed henry is there like
5:50 some other like fears or misconceptions around ai besides this uh you know i'm
5:55 gonna lose my job tomorrow any any other that come to mind um
6:01 nothing specific but i guess in the well there's a lot but in the realm of like losing your job people should
6:08 probably understand that machine learning is really just kind of a more robust way to automate
6:14 and robust being a function of situations generalization across situations in which somebody needs to
6:20 make a decision right and the reason why this technology is becoming pretty popular is because
6:26 humans exist on the perceptual plane right like we we see the world we hear the world we interact with the world
6:32 through our perceptual processes and we interact with the systems that we create through those perceptual processes so
6:38 you see a lot of advances in the perceptual space of machine learning uh headed toward being applied to the
6:45 way that we perceive our systems like we see today right um so i wouldn't necessarily be a i do
6:52 like the idea of you know what marcel and ed had outlined in that your job will be different
6:57 but it's just yet another layer of automation and it's kind of needed it's you know the
7:03 amount of connections uh with respect to devices that exist uh on the internet today is is a function of
7:10 of the amount that humans can support today right so in order for us to be able to kind of grow that number of connected devices we need to think about
7:17 how we do this thing called management a little bit differently yeah and our our jobs have been evolving
7:24 like obviously for centuries anyway with tractors and you know then computers and and what
7:30 have you so so this isn't that much of a dramatic shift uh at best and and
7:36 very likely is the shift from kind of like for network operators for example like i think from firefighter more to an
7:43 architect type of a role and like you said from repetition into like more more interesting tasks
7:50 then if we if we quickly uh then move from the big elephant in the room into
7:56 like a little bit defining what is ai and machine learning and and
8:01 you know how is machine learning different from a traditional computer uh program how is the machine learning
8:08 program different anybody want to take a quick stab at uh kind of clearing that out
8:14 yeah i can jump on that one if you'd like initially and then everybody else can throw something out there as well um so
8:21 i tend to not be categorical in thinking with respect to what is artificial intelligence
8:28 i don't think we know what intelligence is yet sufficiently enough to even wrap a definition around it but we do know
8:34 what you know learning roughly learning looks like so from a machine learning perspective it's this idea of being able to generalize
8:41 right so given an input into the system can i make a decision effectively in
8:46 understanding what that input is or is not or what it should or should not be or have i have i seen this thing before
8:53 or not so if we were to liken this to say something in the computer vision realm uh given all of these pixels that i see
9:00 in this particular image what is in this image right if i were to imagine writing an if-then
9:05 statement for exactly right if i were to imagine writing an if-then statement for all of the images of dogs and muffins or
9:13 chihuahuas and muffins like in this example that would be infinitely long given the
9:18 input dimensionality of the particular image right more traditional programming is inverse
9:24 to that where i would write all of those if then statements i would develop an algorithm to be able to solve this particular problem this concept of
9:30 generalization and machine learning can kind of be interpreted as finding that algorithm as opposed to encoding it by
9:37 hand
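Ed Henry's contrast between hand-written if-then rules and "finding the algorithm" from data can be sketched in a few lines. This is purely illustrative: the link-health features, thresholds, labels, and the nearest-centroid "learner" are all invented for the example, not anything from the panel or a vendor product.

```python
def rule_based(latency_ms, loss_pct):
    # Traditional programming: the human encodes the decision directly.
    return "bad" if latency_ms > 100 or loss_pct > 1.0 else "good"

def train_centroids(samples):
    # "Learning": derive per-class centroids from labeled examples,
    # instead of writing the if/then boundary by hand.
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    # Classify by nearest centroid (squared Euclidean distance).
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical (latency_ms, loss_pct) measurements with labels.
training = [([20, 0.1], "good"), ([30, 0.2], "good"),
            ([150, 2.5], "bad"), ([200, 3.0], "bad")]
centroids = train_centroids(training)
print(predict(centroids, [180, 2.0]))  # -> bad
print(rule_based(180, 2.0))            # -> bad
```

Both paths give the same answer here, but only the rule was written by a human; the centroids fell out of the data, which is the generalization idea in miniature.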
9:42 data is becoming also part of the program so before you would have
9:48 your fixed set of input data then you would write your program which is bound to that input
9:54 data and then you would have run it and obviously you would have ci tooling and gating mechanisms to make
10:00 sure that your program is still valid depending on the data that you designed your program for
10:07 now with machine learning since the program itself is so much based on the
10:12 data the data itself is also becoming part of the program so you will have to rethink how you would put
10:19 a program into production because if your data changes you would ultimately also have to adjust
10:25 how your model or how your deployed machine learning thing
10:31 is is is acting so things like data drift are becoming more and more
10:36 important than in the previous world where you have had input machine output
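Marcel's data-drift point can be made concrete with a minimal check: compare live telemetry against the statistics of the data the model was trained on. The metric values and the 3-sigma threshold below are invented for illustration; real drift detectors (population-stability indexes, KS tests, etc.) are more involved.

```python
import statistics

def drift_score(baseline, window):
    # How many baseline standard deviations has the live mean shifted?
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline) or 1.0  # guard against zero variance
    return abs(statistics.mean(window) - mu) / sigma

baseline = [10, 11, 9, 10, 12, 10, 11]   # what the model was trained on
steady   = [10, 11, 10, 9, 11]           # production looks the same
shifted  = [25, 27, 26, 24, 28]          # the world changed under the model

print(drift_score(baseline, steady) < 3)   # True -- no retraining signal
print(drift_score(baseline, shifted) > 3)  # True -- data drifted, model suspect
```

The point is exactly the one made above: once the data is part of the program, a gate like this belongs in the deployment pipeline alongside the usual CI checks.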
10:44 and that also makes uh qa for example uh a bit more challenging right right you
10:50 don't have a linear right answer anymore every time that the right answer will always stay the same am i right
10:59 yeah i think i think i think that assessment's good i think the other part at least for me when i when people talk
11:04 about ai in general terms i just try and correlate that to let's try and mimic something around
11:09 human behavior right and that's a wide breadth of of topic within itself but i think the attempt to
11:17 try and make computers mimic any sort of of of related human behavior
11:22 and humans are pretty adaptable looking at different data sets and building correlations between them we're good at looking at you know
11:29 taking different modalities of data right we can take audio data and visual data and put correlations together between them which
11:35 is a very unusual thing to do and so teaching computers how to do that and mimic that sets of
11:41 behaviors i think it's sort of what as a general bucket i can sort of i sort of consider ai and then machine learning is
11:47 what the sets of datas and algorithms you can you can place in there and i don't know if that's accurate or not but that's sort of how i look at the world
11:54 i'm interested to hear what ed henry has to say about whether i'm accurate on that or not but but that's sort of how i
11:59 see the general journal bucket i think that that's that's actually also the powerful thing right so not not just
12:07 that we humans are really adaptable and prone
12:22 so i'll jump in it looks like maybe marcel's having some connectivity issues um when i mentioned earlier i wouldn't
12:28 necessarily be categorical about this idea of intelligence um what i mean is we tend to use ourselves as a ruler kind
12:34 of arrogantly but we have forms of intelligence that exist all around us right like in animals and even in
12:40 patterns that groups of animals seem to exhibit like flocking patterns in birds and things along those lines right so i
12:46 agree um but from the perspective of like creating a larger general intelligence i i'm not convinced we
12:53 understand enough about what that even means yet to head down that path so at least we can
13:01 we can agree on the like uh ai is something probably adaptive and autonomous at
13:07 least sorry sorry go ahead well and and that's why i put it in the
13:12 category of sort of behavior maybe human behavior isn't the right thing to ed's point that there's animal species
13:17 there's you know bacteria there's other things that have a sets of behaviors that definitely are
13:23 you know could be argued have some sort of look like intelligence
13:28 that's a general bucket to put it in but that's why i say it's a set of behaviors that we're able to describe and i
13:34 think that's the important thing is that that's to me is the general bucket and then you start getting into as a slide
13:39 showing you know you know if you if you use that then you can start breaking down what you sort of think about around the data science the deep learning the
13:46 machine learning the the other areas that sort of fit into that category but i think they're all there their own goal
13:51 is to try and mimic our sets of behaviors right our sets of of outputs about how we interpret things
13:57 we're trying to rationalize that with computers that can do something similar to what we're up to it doesn't mean
14:02 they're right all the time and it doesn't mean we're right all the time either right but i think but i
14:08 think it's important that that's what we're attempting to emulate within the within the environment so it's that's
14:13 what we're trying to gain insight around so i think if if you take that viewpoint then you sort of get to what the goal is
14:18 which is the goal is we're seeing a set of behaviors and as humans we normally do x maybe it makes more sense to do y
14:25 here's a here's an example of how that's a better output or you know we normally do this and in
14:30 itops this is a really good example security breaches and how do you detect anomalies within your network
14:36 and to ed henry's earlier point when you have so many devices it becomes so large it's very simple when you have a discrete small number that you can
14:42 measure on two hands right like i got 10 devices it's pretty easy for me to look through the logs i probably know all the ip addresses for all those device sets
14:49 and then you start putting a lot of extra zeros after that for the total number of devices it becomes a
14:54 much bigger just data ingestion problems so that's the big data side and then what's interesting out of that data
15:00 becomes a much bigger problem and so that's that's where computers can really help us to get better insights and
15:06 bubble that information up now are you having a human still make decisions around that
15:13 well sure that probably makes the most sense but you want a fast way to bubble that data up to them and that's my
15:19 specific example around networking i mean there's obviously tons of other examples in other areas but i think those are some of the things that are
15:25 very much changing in the landscape for folks in terms of how they operate today yeah and i'll agree with ed um
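The "bubble the interesting data up to a human" idea Ed Horley describes can be sketched as a tiny log-triage pass: instead of a person reading every line from thousands of devices, compute a per-device error share and surface only the outliers. Device names, severities, and the threshold here are all hypothetical.

```python
from collections import Counter

# Hypothetical log records as (device, severity) pairs. At 10 devices you
# can eyeball the logs; at 10,000 you need the machine to shortlist them.
logs = ([("sw1", "info")] * 50 + [("sw1", "error")] * 2
        + [("sw2", "info")] * 40 + [("sw2", "error")] * 30)

def noisy_devices(logs, error_ratio=0.2):
    totals, errors = Counter(), Counter()
    for device, severity in logs:
        totals[device] += 1
        if severity == "error":
            errors[device] += 1
    # Surface only devices whose error share exceeds the threshold,
    # so the human reviews a short list instead of the raw firehose.
    return sorted(d for d in totals if errors[d] / totals[d] > error_ratio)

print(noisy_devices(logs))  # -> ['sw2']
```

The human still makes the final call, as the panel says; the automation just decides what is worth a human's attention.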
15:33 i hope you can hear me again sorry to interrupt you oh you're back yeah i think the whole european internet
15:40 went down for a bit but many people doing homeschooling here
15:49 i hope it's homeschooling so sorry ed uh ed horley you were
15:56 about to say uh yeah no i i think i think this is the point ed henry's original point around
16:03 around sort of that behavior side it's it's what's the interesting sets of data that in that indicate a set of behaviors
16:09 that are important for for folks and how do you bubble that up when the data sets get bigger and bigger
16:14 and bigger and i think the challenges for us as operators is traditionally we've run relatively discrete smaller
16:21 you know networks and systems and as you start looking into what the cloud operators are doing when you start looking at what larger fortune
16:28 enterprises are doing it it's not sustainable to try and hire that many people to run that big of an
16:34 organization and get enough value out of it but you still have risks that are associated with it so you want to you
16:40 want to be able to prioritize and and provide the greatest value for you know risk mitigation and that's
16:47 things around security that's things around misconfiguration that's things around you know just adaptability of
16:53 being able to solve business problems and making sure your application teams can deliver what they need to and those those operation principles at scale
17:01 become very hard problems to solve and this is really the area where i think you're starting to see a lot of value being added to the
17:07 ecosystem about saying how do we you know at henry's earlier point how do we make sure that
17:13 automation is delivering what we expect that we're getting the outcomes that we expect and that we're getting the consistency and the supportability and
17:20 all the other things that are important for line-of-business owners right and no one cares if your business can't run
17:26 because you can't get your app to run no one cares about how well your system is actually behaving at any given point in
17:31 time because it doesn't solve your customer problem and so understanding those systems and making sure that stuff
17:36 is taken care of on the underlay is really important and that's where i think this first set of ai ops is really
17:42 making a difference for people because that means that a small investment in a team that's well
17:48 well built and understands how to use these tools is going to outperform a much larger team who does not have the
17:54 insights and availability of these tooling sets to do the same work at least that's that's what it appears
18:00 to be in terms of what we're seeing from the cloud operators versus like the traditional folks building infrastructure
18:06 sure and and when you align you know a lot of what ed just said back to you know what is machine learning and we can even talk
18:12 about from the like it and ai op space or whatever moniker acronym you want to assign to it
18:17 when we head down the path of applying machine learning a lot of the time for the first thing people reach for
18:22 are things off the shelf out of the perceptual realm right so we look at like you had a slide up a minute ago on
18:29 deep neural networks right like the the advances that have been made there are interesting in the perceptual space and
18:34 they're starting to be uh applied as kind of like a tool in other areas of science but the important thing to
18:40 capture there in understanding how or why is this working and why now
18:45 is that we're capturing human priors in the data sets that we're using to train these models so to ed's point and what
18:51 does capturing a human prior mean right so in the realm of like it ops it's understanding
18:56 how you have this human that's been running this giant system of devices that has integrated all of this
19:02 information over many different time steps right so inside of our heads we all have maybe a rough diagram
19:09 of what this particular piece of infrastructure looks like how it's interconnected all of the relationships that exist between the different
19:14 components of this system right decomposing that into a representation
19:20 that works for machine learning is a tremendously hard problem and the reason why i say that is because we have no way
19:26 for me to kind of download that internal representation that you have inside of your head so what do we do we have to talk to
19:33 humans in order to understand how they're even building this internal representation of their head and then understand how they're using the outputs
19:39 of that system to build that internal representation right whether it be configuration information or whether it
19:45 be operational statistics or whether it be you know the fact that somebody in an executive board room said that that's
19:51 the way i want it to look and i'll go do it the reality becomes when we apply machine learning to these
19:58 problems it's incredibly important that we figure out what it means to capture those human priors
20:03 in a way that is sufficiently representative of enough of the problem we're trying to solve
20:09 and you can do that in a lot of different ways deep neural networks is one way but one of the other things that i think the academic side of the world
20:15 and the realm of machine learning and artificial intelligence generally has started to realize is deep learning still doesn't solve a lot of the
20:21 fundamental problems that exist and what i mean by that is like you guys had mentioned earlier the world changes around us right you have this
20:28 distributional shift that happens constantly you and i get on an airplane six hours later we're in another part of the country the world looks entirely
20:34 different right or at least in jesse and marcel's side where you're in a whole different country right
20:40 but the world looks entirely different that's a distribution shift on the pixels or at least the inputs photons
20:45 that your eyes are receiving right that out of distribution generalization is an unsolved problem so
20:51 this is the argument that i make a lot of the time on will your job go away as an operator i don't think so because we constantly have to query you we
20:57 constantly have to ask you what should we do in this particular situation now can we capture a sufficient representation enough to run
21:03 infrastructure maybe i don't know but in the world of machine learning things are empirically driven i need to be able to
21:09 go and prove that empirically to determine whether or not my model performs better than you as a human
21:14 right so i'll stop ranting now but anyway i'll open the floor for anybody else who wants to jump in
21:21 yeah that's that's really really true and when it comes to like the model and how how the ml has
21:28 been trained uh it all comes down to like how widely it has been trained right like as as in this picture for
21:34 example um you know this guy puts his uh wi-fi access points in the freezer and
21:40 in the dumpster to train the extreme situation so let's talk about that uh training and and data uh for for a while
21:47 so so um how important is is like an unbiased data set in training and what
21:54 what kind of extremes do you need to account for and on the other hand uh like what kind of extremes do you need
22:00 to avoid with the data what kind of caveats i guess i'm asking are there with training and the type and amount of data
22:08 there there is we know that you know the better the data is uh the better the ai is and if there's you know crappy data
22:14 then the ai also is very flawed what are your thoughts on that
22:20 yeah so i think that that's that's true especially with the bias data so if you have some
22:27 data that is just full of lies then you can only train a model that is producing
22:35 lies but i mean that's super super important when it comes to
22:41 decisions where actual human life is impacted such as models that
22:50 may guide policing
22:56 or um decisions in jurisdiction right um so that that's super super critical and we
23:01 and there are a lot of research going on with regards to detecting bias in in models or when it
23:08 comes to face detection or something like this so we all saw these um things where where the model is um
23:16 really bad at detecting black faces because it's trained on primarily um white faces
23:22 but when it comes to i.t data and i think that's what more of the folks here are uh concerned with whether your
23:30 machine is being rebooted because the model thinks it's a bad machine there's nothing really there's not such a real
23:36 impact right so that's that's completely okay i think here it's more towards do
23:42 we have to train model every time on every data center
23:47 and learn from scratch again and over and over and over again because i that that's that's what's currently happening
23:54 in most of the tools that are applied to i.t problems so i can buy something off
24:00 the shelf that gives me um linear regression or um anomaly detection or
24:06 event correlation but i always have to train it on my data set so i i haven't
24:11 heard of a product i mean there are probably some products there but um like a
24:17 kubernetes cluster that ships with pre-trained models on the um on the on the outages that are usually
24:24 seen with that cluster if i step even one step back and say a database let's
24:29 let's imagine postgresql which ships with models that detect misconfiguration or
24:35 detect outages before they actually happen because we
24:40 trained some baseline models on data that has been running from the
24:47 community or from from from other folks so i think that's that's also an important aspect when it comes to data
24:54 because ultimately what you're saying is true a model can only be so good as the data as it's been trained on
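A toy illustration of why the data matters as much as the model, echoing the bias discussion above: on an imbalanced data set, a degenerate "model" that always predicts the majority class looks 95% accurate while never catching a single failure. The labels and proportions are invented for the example.

```python
from collections import Counter

def fit_majority(labels):
    # A degenerate "model" that just memorizes the most common label.
    return Counter(labels).most_common(1)[0][0]

# Imbalanced training data: 95% "healthy" machines, 5% "failing".
train = ["healthy"] * 95 + ["failing"] * 5
model = fit_majority(train)

# 95% accuracy on similarly distributed test data -- yet this model
# will never flag a failing machine, which is the only event we care about.
test = ["healthy"] * 95 + ["failing"] * 5
accuracy = sum(model == y for y in test) / len(test)
print(model, accuracy)  # -> healthy 0.95
```

This is why headline accuracy on biased or skewed data says little about whether a model is useful, whether the domain is face detection or machine reboots.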
25:02 so uh then those that have uh access to most
25:08 of the data for example in networking will have a huge advantage it will not only be their existing software and the
25:15 domain experts there but also you know data will play
25:20 like obviously a bigger and bigger role uh in companies having a competitive advantage
25:26 and and then like what are the efforts in democratizing uh all of this data today these kind of events are like uh
25:34 these kind of initiatives are happening but they seem to be pretty small compared to you know these closed uh
25:40 corporations that's just harvesting tons of data
25:46 so i'll jump in a little bit um from the machine learning side of the world uh and again in the research side
25:53 of everything i mean arguably a lot of the uh advances that
25:58 were made were kind of powered by the data sets that were were collected so when you look at
26:04 things like the imagenet datasets a couple tens of millions of images that are labeled right
26:11 that data set was key to kind of unlocking the i guess
26:17 old approaches for some value of old in the world of machine learning and
26:22 scaling them to be superhuman for some value of superhuman and the perspective of performance right
26:29 when it comes to networking and infrastructure in general those data sets frankly they don't exist
26:35 they don't exist in the open science space they don't exist in the open source space so i think that's why we see
26:42 a lot of kind of just firing from the hip in the world of machine learning and seeing
26:47 what works i think what will happen over the next probably five to ten years i don't know i'm probably going to get shot for
26:54 throwing an actual number out there but over the next n amount of years right um we will probably start to realize that
27:00 in the world of networking we have to do a lot of things much similar to other
27:05 fields like computer vision or natural language processing or speech synthesis or or the perceptual realms generally if
27:12 there are problems that we are trying to solve figuring out how to collect a data set that represents that problem is
27:18 imperative for this whole process to kind of take off and work right um so i think the other thing that's
27:24 kind of lacking there especially in the world of networks is there's no like kind of grand theory of networking and
27:30 what i mean by that is in the world of say computer vision we have a rough approximation of how like your retina
27:36 works and we've baked that into things like convnets right convolutional neural networks um the idea of these
27:41 convolutional kernels the filters that exist inside of the layers of the network are a rough approximation to how
27:47 your retina works and actually proceeding back into the visual cortices of your brain but the idea is we don't necessarily have a
27:54 great grand theory of networking yet i think there are a lot of people who
27:59 are headed down the path of what does this look like if we were to take like graph theory and apply it to infrastructure generally
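The convolutional-kernel idea Ed Henry references (a rough analogy to retinal filters) is easiest to see in one dimension: each output is a weighted sum of a small window slid across the input. The edge-detecting kernel below is a standard textbook example, not tied to any networking product.

```python
def conv1d(signal, kernel):
    # Slide the kernel across the signal; each output is a weighted
    # sum of a local window -- the core operation in a convnet layer.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference kernel responds only where neighboring values change,
# loosely analogous to the local filters described for the retina.
signal = [0, 0, 0, 5, 5, 5]
print(conv1d(signal, [-1, 1]))  # -> [0, 0, 5, 0, 0]
```

The "grand theory" gap the panel describes is precisely that networking has no agreed-upon local structure like this to bake into a model's architecture.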
28:05 but again a lot of the progress that we all hope for will come from open source and open science i think that that's
28:12 really kind of the only way forward and when it comes to larger organizations you may see blog posts and research
28:18 papers published and generally ideas that are thrown out there that
28:24 have some sort of state-of-the-art performance on a given problem you'll notice the one thing they don't publish is their data set
28:29 and i think we even talked about this in our pre-meet up right the idea is that data set actually
28:34 captures the representation of the problem you're trying to solve and it's your edge right it's how you are able to
28:39 differentiate with respect to what potential problem or or methodology in
28:44 the world of machine learning can apply to this problem right yeah i think i think there's some big
28:51 changes or evolution change that's happened in the last you know five years or so that's changed the
28:56 industry in terms of capabilities to be able to do this before you had to write on-premise software that did to marcel's
29:03 point did all the data aggregation collected all the data and then tried to provide some insights based off of your limited data set of how you operate your
29:10 network and hopefully your network was big enough something interesting was actually happening that provided some multiplier effect to
29:16 sort of reduce workload or gate or provide the the ops team some better insights about what's going on versus
29:22 what they already knew the change today is that with cloud first principles around networking
29:28 technologies and being able to aggregate that in cloud and having cloud operations happen and streaming and providing that common
29:34 data set across thousands and thousands of customers you're able to gain a set of data
29:39 insight and to ed henry's point yes it's it's that specific vendor's capability to
29:45 provide insight but they're getting a much bigger aggregate of data and suddenly they're able to provide better insights for those discrete customers
29:52 they're pushing that insight into their capabilities of their products to provide that insight but the community
29:58 overall is is able to then take advantage of those insights and capabilities that they're building into the product based off of a larger data
30:04 set now do we all need to open up all data sets in order to make push things forward
30:10 there's you know the altruistic side of saying like yes that should be where the industry tries to strive for
30:16 but as we've seen with the ietf and as we've seen you know the sort of the the collaboration side of networking we're
30:22 sort of vacillating between all opened all closed all open and and there's got to be a happy medium to allow companies
30:29 and entities to exist and have a product to build that and also the data sets that provide greater insight i think the
30:35 funny part is that we've codified everything at least today in standard sort of config management tools so to marcel's
30:42 point earlier right the way that we define things and the way we codify the human learning is just to say like we don't understand
30:48 what's going on with that compute unit just blow it away and restamp it because that's the easy thing to do um that may
30:54 or may not necessarily be the answer that's appropriate for that particular device but it certainly is a way to
31:00 solve the problem for how we deal with things today maybe with better insight later on we'll understand how to tweak systems to to to
31:07 make that better but that's the quick and easy way that we codified that right you run your ansible playbook
31:14 stamp things back down again and say get rid of that system uh it's basically turn it on and off again
31:20 yeah well not quite but you know have you tried turning it off and on again yeah but
31:25 i think i think what it is is we see a set of behaviors that we don't like it's just go replace the thing so it doesn't
31:31 do the bad behavior anymore as opposed to understanding where the behavior sets come from and i think that's the next
31:36 innovations that we're trying to get to is understanding why maybe the network is doing what it's doing as opposed to
31:42 saying like we just don't want to see that sets of behavior so let's just go fix those sets of behaviors
31:47 all right so there is a connection yeah go ahead it's uh no no go ahead uh mr
31:54 henry oh i was just gonna say and that that brings me to you know the representation of the network right
32:01 like we have protocols that exist like syslog we have protocols that exist like snmp we have protocols that exist that
32:09 allow us to query the current operational and configuration state of infrastructure generally right
32:14 but i'm not convinced from the machine learning side of the world that that's the right representation and what i mean by that is like that's the right input
32:20 to said machine learning model i think what we're going to learn over the next couple of years as well is that we can probably start to
32:27 rethink how we present that information because when you look at things like syslog and snmp and all these other
32:32 protocols and even if you head down the path in networking with things like netconf and yang it's a representation that's
32:38 meaningful and useful as input to the perceptual system of a human right
32:43 yeah where in reality we could probably come up with different representations that are more meaningful for some value of meaningful for capturing information
32:49 with respect to the underlying system that we're measuring i.e networks or infrastructure
32:55 absolutely absolutely that's actually a really good point and while you were talking i
33:01 apologize for missing the faces but i was showing some of the things that juniper mist for example is doing in
33:08 this space so just as an example we have conversational interfaces we have a
33:15 query language kind of this google type of networking thing where
33:21 you know you can ask how is ed henry doing and then it will break down your connectivity into parts and
33:28 say are there significant problems where are they and what are they correlating with and all that kind of
33:34 stuff and then there's the automatic kind of problem identification and analysis
33:41 what do you guys today see as some of the tangible examples of ai where
33:47 you know what people should at least be doing with ai or ml in networking i mean
33:53 where are the low-hanging fruits for i.t or networking people
34:02 you know what are some of the tangible examples for you guys
34:08 i think i can show you so i've been looking a lot into time series metrics
34:13 since i'm working mostly on kubernetes clusters and if you install a kubernetes cluster you will end up with um 1000
34:20 to 2000 time series um at least and it's super super tough to come up
34:28 with meaningful thresholds um in yeah getting getting on top of all these
34:35 time series i mean you can you can hard code things but
34:41 at just looking at the vast amount and using some of the
34:46 exploratory data analysis tools that data scientists have like
34:52 just plot all the 1000 time series and find out which are correlating which
34:57 have the most fluctuation which have common labels that's basic that's
35:03 the hello world of a data scientist and using those tools and pointing them at your data set i think
35:10 that's the first thing that you can do and that's something that you can easily do within a month so you
35:17 spin up jupyter just look at how jupyter notebooks work and then how you deal
35:22 with time series and then you have prometheus collecting all these time series and then connect them that should
35:28 be pretty straightforward engineering task and then the then the next step would be let's take
35:34 one really simple time series that i'm failing to come up with sensible
35:39 thresholds and just do some linear regression or some more advanced
35:44 prediction stuff so that i can be notified on anomalies because normally
35:51 in the end it's nothing else than this thing deviates from the predicted
35:56 value and even if you look at prometheus it has um it has two built-in
36:03 tools or functions to do a projection into the future i think
36:09 it's a linear regression and the other one i'm i'm blanking on now but even
36:14 applying these i think that's that's super low hanging fruit and that that will get you started with um applying
36:19 the aiml tooling to your domain
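Marcel's two steps here — an exploratory correlation pass over many series, then a simple linear-regression baseline that flags deviations — can be sketched with numpy on synthetic data. The metric names, thresholds, and injected anomaly below are illustrative assumptions; in practice the series would come from your Prometheus instance, and PromQL's built-in predict_linear function does a comparable least-squares projection server-side.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(200)

# Three synthetic "metrics" standing in for real time series:
# cpu and latency share an underlying pattern, disk is independent noise.
base = np.sin(t / 10.0)
series = {
    "cpu": base + rng.normal(0, 0.1, t.size),
    "latency": 2 * base + rng.normal(0, 0.1, t.size),
    "disk": rng.normal(0, 0.1, t.size),
}

# Step 1: exploratory pass -- which series move together?
names = list(series)
corr = np.corrcoef(np.vstack([series[n] for n in names]))

# Step 2: fit a linear trend to one series and flag points that deviate
# from the prediction by more than three standard deviations.
y = series["cpu"].copy()
y[150] += 5.0  # inject an obvious anomaly for the demo
slope, intercept = np.polyfit(t, y, 1)
residuals = y - (slope * t + intercept)
anomalies = np.where(np.abs(residuals) > 3 * residuals.std())[0]

print("cpu/latency correlation:", round(corr[0, 1], 3))
print("anomalous indices:", anomalies)
```

The same residual-threshold idea is what "alert when the value deviates from the prediction" amounts to, whether the predictor is this one-line fit or something fancier.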
36:28 definitely a good starting point what about one of the eds any
36:34 other examples of how a network engineer
36:39 besides purchasing juniper networks switching or wi-fi and
36:45 any other good places to start what marcel was saying
36:52 also requires of course some development skills some data science skills as well is there
36:58 any where is the lowest hanging fruit where should people get started on the ai journey especially networking
37:05 people i mean let's say you already know some scripting and some python what's next
37:13 probably for me on the ops side just because i deal with a lot of customers in sort of the fortune 500 and
37:18 the operator role so my bent is a little bit there so i'll just put that out there as a caveat i think many
37:25 people are starting down this journey around logging today already so the logging data that they're trying to get better insight around i think most of
37:32 the major networking manufacturers are providing solutions around how to get better data insights out of that so that's a great super easy lift start
37:39 location for them to go the other funny one is that i i i'm amazed how many network engineers are
37:45 unwilling to get out of their seat wander down the hallway go talk to their app dev folks who probably already have
37:50 prometheus kafka and a ton of other uh complex uh data ingestion systems
37:56 already in place and say like can i ship you a portion of my log files and can we start playing around can you start
38:01 helping me to look into the data sets that i already have that i'm producing it's
38:06 really good can you guys take a portion of that i don't need long-term storage i just want you guys to start showing me how you guys are using this for the
38:13 business that we're running today can you show me how to use and start maintaining and running some of these tools that are already existing probably
38:19 running within your environment and that's the fascinating part for me is it's just the lack of communication the lack of of being willing to explore
38:26 and wander down the hallway to the probably a team that is doing this sort of work and saying can you guys provide
38:32 me some some baseline some tooling some collaboration together so that i can get some some information even out of our
38:37 own systems they may tell you to go run away or go read this book but that's okay go do it and then come back to them
38:43 after you invest a little bit of time to understand what they're dealing with
38:48 you can do the reverse to them you can say you should invest a little bit of time understanding what impacts you guys are having on my network
38:54 because that's a legitimate conversation too but it does allow you to have a much better collaborative conversation with the folks that are actually trying to
39:00 solve a bunch of these problems within major organizations it's something that you really should be uh
39:06 involved with i don't know you know ed henry if anyone's actually approached you in that method of saying like hey we'd like to get better data
39:12 insights in our operation side never mind what you guys are trying to solve from the business angle side but what better what better things can we have in
39:19 our op side to give us that capability i don't know if that's been an experience you've had or not
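Ed's suggestion of handing a slice of your logs to the app-dev team's existing Prometheus/Kafka pipeline is easier when the lines go over as structured records rather than raw text. A minimal sketch — the log lines, field names, and regex below are hypothetical stand-ins for whatever your devices actually emit:

```python
import json
import re

# Hypothetical syslog-style lines, standing in for a real log file.
RAW = """\
Jan 12 10:15:01 sw-core-1 ifmgr: interface Gi0/1 changed state to down
Jan 12 10:15:04 sw-core-1 ifmgr: interface Gi0/1 changed state to up
Jan 12 10:16:12 fw-edge-2 sshd: failed login for admin from 10.0.0.5
"""

# timestamp, host, process, free-text message
LINE = re.compile(
    r"(?P<ts>\w{3} +\d+ [\d:]+) (?P<host>\S+) (?P<proc>[^:]+): (?P<msg>.*)"
)

def to_records(raw):
    """Turn raw syslog-ish lines into JSON-serializable dicts, one per line."""
    records = []
    for line in raw.splitlines():
        m = LINE.match(line)
        if m:
            records.append(m.groupdict())
    return records

records = to_records(RAW)
payloads = [json.dumps(r) for r in records]  # what you would hand to the pipeline
print(len(payloads), "records ready to ship")
```

With a client such as kafka-python, each payload could then be published with something like `producer.send("net-logs", p.encode())` — the topic name is, again, an assumption, not something from this discussion.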
39:25 yeah it is uh and interestingly enough the operations usually maps back to money somehow so people are usually interested
39:32 in figuring out how they can enhance that experience um i would approach that question of like how can people get
39:37 involved from a few different angles i think the first angle i would come at would be you know i outlined earlier that as somebody who's working in
39:43 research and engineering in the world of machine learning i'm trying to capture the prior that is you as a person right
39:49 i'm trying to capture the prior that is the internal representation that you have of what your infrastructure is and what good or bad means with respect to
39:55 that infrastructure so just start to think about the questions that you would like answered
40:01 start to think about the problems that you approach every day that are pain points that you haven't been able to
40:07 quite automate around yet um and then second would or at least two of n would be there are ways that we've kind of
40:14 rallied around this in the world of infrastructure and networking in the past i know there's a repository if it still exists or not i don't know on the
40:19 internet of pcap files where everybody was just posting representations of say protocol exchanges inside of pcap files
40:26 so people can start to download who are interested those pcap files and understand what the connection semantics
40:31 for that particular protocol looks like right the same thing could potentially apply in the world of infrastructure maybe
40:37 don't give us a capture of your your specific production infrastructure but if you have something that looks and
40:43 feels a little bit like it from a lab perspective you know that this particular infrastructure exhibits these
40:48 particular properties of this problem that it is that you have feel free to publish that right that's an example of an open data set
40:54 aligned with a particular problem it is that you're trying to solve and then lastly i will highlight exactly
41:00 what ed said as somebody who practices in this space the first thing that i do when somebody asks me a question around
41:05 can you solve this problem with data science or machine learning or whatever it is that you want to call it i find out the people i try to find the people
41:11 who are the ones that deal with the problem every day because they're the ones who have what is considered
41:17 domain expertise right so they're the ones who deal with the representation of infrastructure that they're that they're
41:23 running all of their important business applications on i'm never going to understand it as well as they do um so
41:29 figuring out again how to work with them is something that is imperative to me and i encourage everybody in an
41:35 operational capacity who maybe has a data science team inside of your organization to find them
41:40 figure out if there's a way to ed's point that you can feed them some ops data right
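The pcap-repository idea Henry mentions is also a gentle on-ramp for the curious: the classic libpcap file header is a documented 24-byte structure you can parse with nothing but the standard library. The bytes below are synthetic, standing in for the start of a downloaded capture:

```python
import struct

# Classic libpcap global header: magic number, version major/minor,
# timezone offset, timestamp accuracy, snapshot length, link-layer type.
PCAP_HDR = struct.Struct("<IHHiIII")

# Synthetic header bytes standing in for a real capture file's first 24 bytes.
sample = PCAP_HDR.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)  # linktype 1 = ethernet

def describe(buf):
    """Parse the libpcap global header and return a small summary dict."""
    magic, vmaj, vmin, _tz, _sf, snaplen, link = PCAP_HDR.unpack(buf[:24])
    if magic not in (0xA1B2C3D4, 0xD4C3B2A1):
        raise ValueError("not a classic pcap file")
    return {"version": f"{vmaj}.{vmin}", "snaplen": snaplen, "linktype": link}

info = describe(sample)
print(info)
```

For inspecting the actual packets inside such a file, a library like scapy is the usual next step — mentioned as a pointer, not used above.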
41:45 it sounds like these data scientists are
41:51 becoming almost like the innermost circle in problem solving and then people start hovering around data
41:58 scientists to get this problem solved and this is similar to how we operate at mist for example i mean r d sits next to
42:06 product management but data science is right there cause data science needs both like you said the domain experts
42:13 and then some of the heavier r d muscle to really you know productize uh some of
42:18 those findings and things like that i wonder if marcel is still on the line uh
42:24 i'm still on the line yeah so that's that's uh
42:30 can you hear me yeah yes and now we can see you too i'm back away so
42:35 so i i would argue against that notion that data science is becoming the center
42:40 of gravity and because i think it's something similar happening um as we saw
42:46 with the evolution or the birth of the devops culture and sres when we had
42:53 developers and operational people like 20 years or 10 years ago and we had
42:58 those two camps when you threw your software over the wall to the ops people
43:03 and let them operate it nowadays we're looking for
43:08 devops people as embodied by an sre who is
43:14 using tools of both worlds so i think
43:20 the same should happen with the sre is now adopting ai and machine learning and
43:25 data science tooling and this this data science thinking to become a
43:31 data-driven sre person i think it's it's not just one group and one mindset of people um
43:38 orchestrating all the others and and talking to each other but it's it's
43:43 we are embedded in data and the world is getting more and more complex so we need
43:50 to come up with tooling to understand and analyze that data and that's something that
43:56 each of us has to learn somehow and grow on on using these tools but otherwise
44:04 yeah we come back to the jobs question so that's that's actually a really good
44:11 point so just so that data science
44:16 isn't the privilege of a few and to kind of spread the message spread the tooling around the organization
44:23 to every developer and beyond how does one effectively
44:29 spread this kind of culture inside the organization like
44:35 the existence of data science in laggard organizations but also the benefits how to utilize it
44:42 to the best benefit there is actually a pretty good book about it
44:48 from the business perspective but how do you see what's the most effective way of building a culture
44:55 around utilization of machine learning and data science how do organizations get the most out of it
45:02 yeah i think there's a couple things to marcel's point earlier uh how to be able to consume it
45:09 it operators aren't developers and they're not data scientists and i don't think they want to be either one
45:15 right they want to be able to operate efficiently you need to have the tooling sets consumable for them in the right
45:20 way which means that they don't want to spend their lives learning how to do specific algorithms or having to prune
45:26 particular data sets they want to be able to have a set of tooling with the right set of interfaces for them to be
45:32 able to consume the data and get insights that's how you're going to get operators to adopt and be able to use
45:37 that i think today the space that that's being solved in is is a combination of open source and a combination of vendors
45:43 who are providing the right sets of of wrappers and consumption models because they understand their customers they
45:48 understand what they're looking for and they're trying to glue those two sets of things together without necessarily coming up with proprietary
45:55 sets of tooling because i think that's the hard thing for many customers is if
46:01 you go and talk with your data scientist teams they're probably adopting a set of frameworks from you know red hat or from
46:07 other entities that are trying to put together those open source tooling components and if you walk in as an
46:12 operator and say well i only know how to use x can you provide me the same interface that becomes the big
46:18 challenge for the operator role versus what maybe your data scientist folks are doing internally so i think that's one
46:24 set of hurdles about how folks can sort of you know understand where your operator
46:29 boundaries are and yeah there's going to be some learning this is the job role change that we were talking about earlier you're going to
46:34 have to learn things that are outside of your core skill sets in order to be proficient but we had to do that all along anyway like anyone who was in
46:41 networking who didn't think you had to learn you know linux or learn windows or learn something else in order to be able to operate on the network
46:48 that's just a fallacy that's a lie you still had to do it you're still interfacing day to day on your mac or on
46:53 any other system to go admin those things so you have other additional skill sets this will be fit in that same
46:58 category where you're going to have to learn how to understand some base level sets of of you know
47:05 ai machine learning maybe some deep learning components that that fit in there at least to be able to have a good conversation with the teams around
47:11 what's going on and then also to be able to suss out what's important out of the data and information that you're being
47:17 provided out of the system right that's a behavior side am i actually getting something useful out of the system or
47:22 not and i think that's those those skill sets are going to be the ones that come onto your plate
47:28 because you're able to let go of the grind of the maybe the day-to-day operations change config management work
47:34 that hopefully the the automation is really sort of helping to to get off of your plate and not have to
47:40 deal with i don't know if that's a you know a good line of thought or not around that but that's sort of where my head is
47:47 at around it i think i think a on a on a company
47:52 level you have to adopt data as a product so it's not a
47:57 side effect that your company produces data and people have to use it to detect
48:03 bugs in your company but you are producing data and that means that you need
48:09 people and teams that have a product owner for the data that
48:16 you're producing and that can start with um things that add and add highlighted that you need to
48:24 go and talk to other people and understand their data and make it really easy for me as somebody not in that team
48:30 to access their data and not go through all the hoops there it must be effectively as easy as
48:37 launching up a browser and i can dive into the data there so as much as i can
48:43 launch google and end up on wikipedia and i have all the access to the internet we take that for granted nowadays but
48:50 even in red hat it's sometimes super super complicated to get the logs of some certain system because they
48:56 are not treating data as a product so i think that's the first mindset on a company level that's uh you need to
49:03 do that transformation and that realization that data from each team is as important as an interface to that
49:11 team as and here's my change request but i also need to go to that team and say here's my here's my data request and
49:17 that must be fulfilled within minutes
49:23 yeah i'll build a little bit on both um from ed's perspective of like the
49:29 operator side of the world meet humans on the perceptual plane right
49:34 what i mean by that is improve the user experience overall and that bleeds a little bit into what marcel was highlighting as well which is like there
49:41 is something called a data user experience and what that can encompass depends on whatever data your system produces right
49:49 and you should have a data product manager that handles like that experience with that particular data set
49:55 but meeting humans on the perceptual plane is a socio-technical problem right it's not just social and it's not just
50:00 technical driving that mindset into your organization is going to take not only technology but
50:06 also culture shift right and figuring out what that means is going to be specific to the dna of the business
50:13 that you're working within um so that's not to say it's impossible that's just to say that there are priors that
50:18 come with working inside of an organization and understanding those priors is important um and then lastly uh
50:26 again i'm a huge advocate of open source and open science data science and machine learning is empirically driven
50:31 what that means is i need to be able to prove to you that i did something that was useful um and then useful is based
50:37 on baselines or whatever it is that you're building that represents what means value to you what is your risk
50:42 function for your business right and then improving on those baselines um is important and being able to improve
50:49 and prove those results is like i said entirely empirically driven so open source and open science kind of just begets that
50:55 because you have to publish your your results in a way that other people are able to reproduce it right
51:00 so i think there's a lot of different angles that you could figure out how to take that with respect to building this practice inside of your organization but
51:06 it's important to make sure that it's not just a social and not just a technical problem
51:12 [Music] very good uh i know we're running low on time but
51:18 before we go a couple more things one is a question from the audience a very interesting question just came from
51:25 walter does anyone think leadership in ai and ml uh like many things would come out of
51:32 darpa like it or not a lot of government standardization comes from u.s government defense
51:38 technologies what's what's the situation there anybody have any insight
51:46 anybody want to go i guess i can address part of it i i
51:53 think there's been a shift in in the um in how the evolution of the internet and
51:59 standards has has progressed over time and i think this is part of that evolution of of going from a centralized
52:05 you know darpa influencing open standards bodies the ietf as an open collaboration entity um
52:12 and others and it's moving away towards much more of the democratization of it
52:18 through open source of innovation through open source and not necessarily through standardized formal bodies and i
52:23 think um i think this is an example where you may not necessarily see
52:29 an organization like darpa try to come out with uh sort of the standard sets around leadership around
52:36 ml necessarily in the technology space that may not be true from a governmental
52:41 perspective around compliance conformance regulatory stuff
52:47 that happens i think there's going to be a whole space and you know microsoft has made some very public statements about what marcel
52:53 was talking about earlier about you know face recognition facial recognition how that should be used what are the social
52:59 implications of that for data privacy and a bunch of other things and that's the whole different discussion that's
53:04 not even around the aiops side but i think you aren't necessarily going to see the standards come out that way i
53:10 think the standards are going to happen based off of what ed henry's point was earlier how open are the data sets how
53:16 how much is community willing to contribute to move uh the industry forward to have the right sets of data
53:21 models that are that can be produced and the right data sets that can be produced and shared in the right way to allow
53:27 everyone to sort of validate what's going on and to agree about what those standards need to be and i don't necessarily think that
53:33 you know i mean the defense industry can definitely contribute but i don't think they're going to be the only contributor
53:39 nor set the standard for what's happening in that area i could be completely wrong so it's my personal
53:44 opinion i put that out and i'm going to punt it to ed henry to just sort of reply back to that
53:49 [Laughter] darpa is an interesting beast they dabble in a lot of technologies
53:56 that aren't necessarily just defense related either um in network standardization yes some
54:01 things have come out of darpa with respect to maybe some of the security standards and interworkings with nist
54:07 and things along those lines but i would actually make the argument that the ietf has largely driven network
54:12 standardization for a very long time and that consists of both industry representation along with research
54:19 representation and government representation so i do think that some advances will come
54:24 out of that leadership is a relative term it's pretty clear right now that leadership exists in industry at least in the perceptual
54:31 space of machine learning computer vision natural language processing things along those lines but make no mistake there's probably darpa funds
54:36 behind some of those labs as well i don't have a direct answer to that i think just because of the sheer breadth
54:43 and depth at which darpa likes to explore enterprises and by enterprises i
54:49 mean people who want to submit grant applications that can then go and do some sort of research that darpa might
54:54 find valuable all right and marcel i think you might have a comment
55:01 yeah speaking of exploration you know so instead of
55:06 mentioning darpa i think the linux foundation just launched two years ago an ai and data initiative
55:14 and they came out with the first gpl-equivalent licenses for data so
55:19 i think there's there's a lot of groundwork still to be done and the most traffic and velocity i'm seeing
55:27 there is in that domain so go to lfaidata.foundation
55:32 and you see something going on there and
55:38 we only have a few minutes left but in a nutshell just for the aspiring
55:46 you know semi-data scientist in us like network engineers and networking managers
55:52 what would be your go-to place like if there was one resource of information where one should go to get
55:59 excited and learn more about aiops what would be your choice besides
56:05 mist.com of course and try out our stuff and see where it's productized today but if you want to
56:12 tinker more with the data science tools and things like that
56:20 not necessarily even tools but also books uh websites further reading any any kind of
56:26 further uh you know resources people could take from my perspective
56:33 any type of social media i would encourage the scientific community has taken uh to
56:39 twitter quite a bit uh if you're interested in the theoretical side of things but also the operational side of things having been on something like
56:46 twitter for almost a decade if not longer at this point there's a huge representation of
56:51 individuals that are working on a broad number of things generally and again this idea of open source open
56:58 science i haven't met a community yet um in the world of machine learning and data science that isn't trying to
57:03 embrace people who are just interested in the technology all the way through to doing fundamental research so i just
57:09 encourage anybody to pick your favorite social media platform find some work that you're interested in doing some
57:14 problems that you're interested in helping solve and i think just through that natural kind of exploration process you'll find a set of individuals that'll
57:21 help you kind of grow yourself very good any uh if you had to name a couple of uh twitter accounts to follow
57:27 what would be your favorites of course uh in addition to ed henry um well i tend to lean on kind of the
57:35 theoretical side of things um but i really that's on the spot man i think
57:41 i follow like 900 people on twitter right now um there are you know depending on what it is that you
57:47 uh are interested in you know what i could recommend find my twitter handle and just look at people that are following me or following or i'm
57:54 following or part of lists or things along those lines i don't have an answer for that off the top of my head
58:00 that's really really good uh i usually tell people the same thing if you want to know wi-fi geekery just go find me
58:06 and you know the people i follow what about marcel what do you think
58:12 if you have to point people to one resource so i can drop a shameless plug
58:18 here um on our operate first initiative where we try to democratize and
58:23 come up with an open source equivalent to operations and building a completely transparent cloud setup
58:32 from workloads into the data center back end that's on
58:37 operate-first.cloud that's the url but in terms of
58:44 so i like reading books and the last book that really struck me and was really inspirational was by jeff
58:51 hawkins not stephen hawking but jeff hawkins he's a neuroscientist and he wrote a book called on intelligence and
58:59 he has painted a really nice picture of the brain which is just
59:05 consuming metrics in the end and
59:10 so that gave me hope that there will be some general artificial intelligence at some
59:16 point if our if if our brain can do it we should be able to repeat it with
59:21 machines that's that's a very well written book on intelligence
59:27 good it goes on my list for sure and mr horley
59:34 yeah probably the easiest thing is just if you're interested on the off side you're interested about the impact of ai
59:39 on ops just follow the hashtag for you know the hashtag aiops on twitter and you're gonna see a whole
59:45 stream of everyone who thinks they're applicable and then just go start pruning through for what you think is the best way so i agree with ed henry
59:51 but most of the stuff that i keep track of i keep track of on twitter because it's the easiest way to consume it fast
59:56 i i do i do think red hat has some good information available up on their site too to sort of peruse
1:00:01 through and get started if you don't know anything about the operations side about what ansible can do
1:00:08 about what some of their product sets can do in terms of data ingestion and getting data insights if you've never run
1:00:13 prometheus before those sorts of things you need some place to get started to go get
1:00:19 the software installed and play around with it so those are good guidelines for finding that stuff and then build a lab go tinker i mean
1:00:26 most of the stuff is available as cloud resources on most of the major cloud platforms you can get it built pretty quickly and play around with
1:00:33 sample data sets that might be the other way to jump in and get started if that's something of interest
1:00:38 for yourself absolutely and we have quite a bit of course
1:00:46 material on aiops ai and networking if you go to mist.com webinars we for
1:00:52 example had a three-part aiops webinar that we did with
1:00:57 mark from walmart and some of the other guests you know covering different aspects of aiops
1:01:04 and in general on mist.com webinars and on our website there's
1:01:10 a lot of stuff there and if you're into networking and wi-fi geekery follow my twitter
1:01:16 handle as well it's my first name and last name all together and if you guys don't mind if you can just put your
1:01:22 twitter handles or your linkedin pages what have you in the chat
1:01:28 there i don't know if that's actually visible to everybody but
1:01:34 it's probably not anyway you'll find us on the sign-up page for example you can
1:01:41 find our twitter handles and find us on twitter any final words anybody before we sign out i know
1:01:48 we're a couple of minutes overdue already who wants to raise your hand and
1:01:53 give a final statement on where we are today with aiops and where we go from here
1:02:05 i'll close by saying it's not something that you should be scared of it's something that you should embrace this will make
1:02:11 your life easier over the long haul your job will actually get more interesting you get to work on more interesting problems and not necessarily have the
1:02:17 day-to-day grind stuff that goes on but it'll be a journey it's not overnight so you know
1:02:23 start doing your learning now so it's not scary that's
1:02:29 the advice i would give so i'm just watching
1:02:35 season 3 of westworld so i can only give one piece of advice be friendly to our ai
1:02:40 overlords
1:02:45 that already became clear in the last episode of season one right
1:02:51 but with that i guess it's time to end the show thank you so much everybody
1:02:56 for joining thank you panelists marcel ed and ed really appreciate you guys coming over
1:03:04 taking your time and sharing your expertise i hope this was useful to
1:03:09 the audience and i hope to see you again soon
1:03:14 thank you thank you take care everyone bye
1:03:22 cheers