
Welcome everybody to another episode of Software-Architektur im Stream.

This time with Nikita.

So Nikita, can you first say a few words about yourself?

Yeah, hi, hello.

Thanks for the introduction.

Thanks for inviting me.

My name is Nikita, and currently I’m working as an AI portfolio architect.

I’m a part of Siemens.

And inside Siemens, our team and I are mainly responsible for bringing AI to the factory floor.

In the wide sense of the word.

So how to train models in the cloud, how to deliver these models to the factory floor.

And, maybe most interesting, how to make those models run inference in an edge environment with limited computational resources.

And yeah, it’s what I’m mainly doing.

And by the way, I have around 16 years of experience as an architect, or in the whole software development sphere in different roles.

So we are rather interested in this topic.

And thanks again for inviting me.

Yeah, thank you so much for showing up and taking the time.

We had a conversation about this topic at some conference, and I thought it would be quite interesting to listen to a practitioner who has actually been doing that AI stuff for quite a while, whereas in the domain that I usually work in, which is enterprise IT, it seems that we have yet to uncover all the problems and also the solutions.

And I think that’s, I hope, the worth that this episode will bring.

Short announcements.

So Nikita and I will both present at the TechWriter Summit in Cologne, or Hürth actually, one of the suburbs of Cologne.

And we have a discount code.

So as a matter of fact, you can actually attend for free.

And yeah, that’s the subject here is related to what you’re going to present at TechWriter, but it’s not the very same content.

So there is a relation here.

And maybe it also whets your appetite to go to the TechWriter Summit.

I think it’s going to be a very interesting conference.

It’s the first time that I’m there.

And it’s the first time that they actually have a day about technical stuff.

They also have a day that is more about business stuff.

So it’s, I think, a good mixture of different subjects.

And I’m really looking forward to that.

Yeah, definitely.

Okay.

So let’s start with the sort of basics.

So everyone is talking about large language models, and it seems that for many, AI is sort of a synonym for large language models.

So what are large language models actually?

Yeah, it’s from my perspective, it’s a big problem that everybody who is talking about AI mean under the AI LLMs and generative AI.

But from my perspective, an LLM is, first of all, a probabilistic engine, a probabilistic mechanism, which just takes input and produces output with some kind of probability, based on a probability distribution.

And the crucial fact is that LLMs are not computing facts.

They are not generating new knowledge.

They are not, let’s say, they just use this statistics and the probability, and based on this, we are guessing and just trying to interpolate as any AI model patterns it saw before.

And, yeah, not just as architects but as engineers, we should be rather careful with the terminology we are using.

And from my perspective, we should even try to, I don't like this word, educate, or try to increase awareness among, I don't know, stakeholders and other engineers around us of the different types of AI and when each of them can and should be used.

So, Sir Gray just said they are synthetic text generators that are sometimes useful.

And I was just reminded about an episode that I did that discussed a scientific paper, like really proper science, that discussed how LLMs are actually bullshit in the sense that they provide text without any idea whether the text is actually true, whether the facts are true.

So, I think that nicely sums it up.

Yeah, I also saw this analogy somewhere: talking to an LLM, you can imagine that you are talking to some kind of colleague who has read every book that exists and who immediately answers your question.

And this answer will not necessarily contain relevant or useful information, or even the answer to your question.

And it’s also, I think it’s a great point that you point out that it’s only the written book because the experience that we have goes far beyond that.

And I think that’s important to note.

It’s interesting from this perspective what is written and the experience we have in our heads.

I also have an interesting example about COBOL.

A few months ago, there was a news item from Anthropic saying, hey, guys, you don't need these COBOL developers to do the migration because we created this super COBOL coding agent and everything will be done.

And later I saw another article referring to this one, in which one of the, let's say, old COBOL developers wrote: hey, guys, the problem is that the majority of COBOL code is not publicly available.

This is in somebody’s head or closed depository.

That’s why you cannot judge.

You cannot promise.

It will work for a demo, but for the rest, it remains an open question.

So, Eric Ray on YouTube just said, I like the Merrill analogy.

Yeah.

So, but, I mean, the reason why I really wanted to do this episode is because of your experience with industrial applications in the industrial context.

So, can you explain why LLMs fail in industrial applications, and what kinds of applications those actually are?

So, first of all, industrial applications are based on the main assumption that everything should be deterministic.

And the core of, let's say, each industrial automation application is the so-called programmable logic controller, which at the end of the day gives you a concrete answer about whether you need to stop your line or not; it cannot tell you that you probably need to stop.

And in the case of LLMs, they cannot give this guarantee of determinism, because they are probabilistic by nature.

Moreover, their results are not reproducible; it's a well-known fact that the same input can give different outputs, and these outputs are not, let's say, transparent.

They are not understandable or reproducible.

Of course, there is a, let’s say, reasoning mode in each LLM, but it’s not the real reasoning.

It’s just, I don’t know, tracking or logging what kind of sources LLMs used to came to this decision.

And another interesting phenomenon is the so-called confidence illusion.

For instance, if a model says with a confidence of 94% that this is a defect or something else, it does not really mean that this answer is correct, because that confidence is just the level of belief the model has in this output; the model is just doing pattern matching, nothing else.

Also important in any use case where you're using AI is so-called drift, or maybe even silent drift: for instance, the environment, user behavior, or economic conditions change, but the model stays the same and the training data becomes outdated, and in this case your model will still provide some kind of output.

You will see no exceptions, no alerts, no notification messages to a human being that something is broken, which at first glance looks good, but at the end of the day it's a totally corrupt, totally unworkable scheme under the hood.

From my perspective, it’s, let’s say, of course, there are a lot of different problems why we cannot use LLMs in industrial use cases, for instance, because of their performance, because of their size, but it’s a technical limitation, but I’m mainly talking about, let’s say, ideological or conceptual limitation why it’s not a good place.

So, first of all, as Sir Gray said, they don't even give you real probabilities, as opposed to regular machine learning over a distribution.

So, that’s, I think, what you were talking about concerning confidence.

What I’m wondering is, I mean, I can totally see.

So, I’m a layman concerning industrial applications, obviously.

So, I can totally see how you can have some kind of, what is it, like a system that says, okay, I take a picture of that screw, and I figure out whether that screw is a good one or a bad one.

So, I see an application for these kinds of, you could call them AI systems.

However, LLMs are, as we said, text-based.

So, can you give an example where you actually have tried to apply them, and where they failed?

Or, if you didn't really try that in practice, what would be a potential theoretical application?

Here, I would propose to again make a split between LLMs and generative AI in general, because generative AI is a big family of models which produce new content.

It can be images, it can be text.

LLMs, it’s a subset of the generative AI.

Considering LLMs, one example: we have a system with a combination of edge computing and cloud computing.

The edge part of the system is responsible for defect detection.

It's a computer vision AI model under the hood, which classifies the type of defects on the surface of an end product and where these defects are located; a, not simple, but to some extent classical machine learning system, which at the end of the day provides machine-readable output with the type of defect and its position. This information goes to the cloud, where, for instance, RAG together with an LLM sits, and based on this information and on the documents we have inside the RAG database, this LLM provides some kind of reasoning or root cause analysis of why one or another defect happens.

It’s one example.

Or another example, considering the general application of generative AI: the main idea of factory automation, of industrial automation, is to prevent any kind of failure.

Sometimes it’s rather difficult to have enough examples of failure modes.

That’s why generative AI definitely can be used to generate synthetic data representing these potential failure modes, not, let’s say, not affecting the real, I don’t know, aggregates.

It’s another one of examples how generative AI can be used.

Okay, so with regard to the root cause analysis, it actually sounds to me as if you're trying to build a system that sort of thinks about or tries to figure out the root cause.

Did you actually try that and fail?

Or is it rather that you said, well, this is not text generation, this is something different?

Because at the end of the day, it’s about thinking about root causes.

So did you actually try that and fail?

Or did you rule out that option from the beginning?

It can definitely be an option.

I tried it, it works.

And in this case, it’s definitely, yes, it sounds like a reflection, a root cause analysis, but at the end of the day, it’s text generation, because when a downstream system sends to other one some kind of description of the defect, what RAC system or LLM system in the cloud is doing is just going to the database, get relative documents, and try to summarize and try to say, hey, domain user, you have a defect because you need to change the, I don’t know, conditions in zone 1, 2, 3, 4, 5.

Again, it’s not generating new knowledge, it’s not reflecting.

It's just trying to provide a summary based on documents which already exist.

So it seems like a workable solution.

Yeah.

I’m at the moment currently working on, let’s say, prototyping this solution, and maybe, I don’t know, I will share this one in my repository.

But from my perspective, it can be a good illustration, not only for industrial automation applications but in general, of how to have a clear split between different types of AI, and to show that AI is not only LLMs and not only generative AI.

There are different types, and how these types work as parts of an architecture and interact with each other.

Okay.

But what you’re saying is, so what you say, does that mean that you should be careful with LLMs and how they will elucidate and how they are probabilistic?

But that doesn’t rule out that you could use that, and, you know, if you have some root cause analysis, and the root cause analysis, you have to be aware of the fact that this might actually be wrong, but that doesn’t necessarily stops you from actually doing that.

I think that probably sums it up.

Definitely, definitely.

I’m not saying that you don’t need to do – you must not use LLMs.

So the main message definitely is: as an architect, you need to find the best solution for your system, but first of all, you should be aware of hallucinations, and as an architect, the main question you need to answer, or have to answer, is what your system should do in case your AI part is not confident, or provides some kind of hallucination, or does not work at all.

So you would need to ask the question, what happens if the root cause analysis is wrong, and then you work from there?

Yeah, and from this perspective, maybe trying to map this to known architecture practices, we can even say that isolating the AI is one of the good practices.

AI can work in a separate bounded context with a respective anti-corruption layer.

Of course, we could deep-dive into these details, but the point is some kind of isolation, some kind of layer which protects your domain from this probabilism and, let's say, the bad side of the AI stuff.
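
A minimal sketch of what such an anti-corruption layer could look like in Python; the domain vocabulary and the shape of the raw LLM output are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DomainDiagnosis:
    """What the domain works with: a validated, typed value."""
    cause: str
    zone: int

KNOWN_CAUSES = {"temperature", "humidity", "tooling_wear"}  # invented vocabulary

def to_domain(raw_llm_output: dict) -> Optional[DomainDiagnosis]:
    """Anti-corruption layer: translate and validate the AI output.

    Anything that does not fit the domain vocabulary is rejected here,
    so the probabilistic side never leaks into the core domain.
    """
    cause = str(raw_llm_output.get("cause", "")).lower()
    zone = raw_llm_output.get("zone")
    if cause not in KNOWN_CAUSES or not isinstance(zone, int):
        return None  # caller falls back to deterministic logic or a human
    return DomainDiagnosis(cause=cause, zone=zone)
```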

And it reminds me of the discussion that we had, I think, like 14 days ago with your colleague Michael Stal, Professor Michael Stal.

And he was also working at Siemens, and we were talking about the architecture analysis tool that he built, and he also mentioned that whatever that system spits out concerning the architecture it analyzed, you have to be aware of hallucinations, and you must sort of double-check it, these kinds of things.

It seems to be sort of the same thing, in a different area of application for LLMs.

Definitely, because at the end of the day, we can think about all AI stuff just as an additional block with inputs, outputs, and some peculiarities, which we should handle by building contracts and interfaces around this specific block.

Okay.

So, and you already mentioned generative AI, Gen-AI.

So, how are LLMs different from Gen-AI, and how do you use them in industrial applications?

I already said that LLMs are just a subset of generative AI.

Generative AI is mainly responsible for generating new content, not new knowledge, and from an industrial perspective, generative AI can definitely be used for generating new data, synthetic data, and LLMs for reasoning, for helping end users make decisions and analyze big amounts of data.

But the crucial point is, as I already said, that the LLM part, and all the, let's say, most probabilistic parts of your system, should be kept out of the control loop where you're affecting the physical reality around human beings or your production.

Okay.

So, what other types of AI are there apart from generative AI?

So, we are stepping through the different terms.

Yeah, yeah.

So, maybe the most important are the so-called classical AI models, which can be illustrated with decision trees and random forests.

So, the main advantage of these classical models is that they are interpretable.

For instance, in the case of a decision tree, of course it's a simplification, but at the end of the day it's some kind of tree which shows a lot of rules, a lot of conditions, and what will happen in each case; it's a big if-else tree.

And it can be traversed in reverse, so you can see how your model makes a decision and where it makes, let's say, each separation.

It’s one part of, it’s a classical.

The second big pillar is computer vision models, which are also not generative AI.

These systems or models are responsible for analyzing images.

It’s object detection, classification, segmentation, all this stuff connected with visual representation.

Rule base, it’s more strict systems.

Maybe we cannot even name them as part of AI, but AI is much wider than ML.

It has, I don't know, genetic algorithms and so on.

But the most popular are classical models and computer vision models.

And computer vision models, I think we already discussed them, and that was also part of our original discussion at that conference, where we were talking about how you would find defects in products, like your system would look at it and figure that out.

Can you give an example of a decision tree?

For instance, a good example of a decision tree is a banking system which is responsible for fraud analysis or for granting loans.

So there is a number of rules used in a bank, based on which the bank or finance organization can decide whether it makes sense to give a loan or not.

And this big set of rules can be implemented, can be absorbed, by a decision tree.

And the reason why I'm giving the example of a bank is because, in the case of banking systems, of financial systems, this affects human beings.

And we have the GDPR, and according to, I suppose, Article 22 of the GDPR, each person who is affected by AI should be able to get a clear explanation of how the AI assessed them and why it made one or another decision.

That’s why such kind of classical models are crucial and widely used for banks and for this organization, which are under some kind of regulations.

Now, OK, so that makes a lot of sense.

Now, I would argue that, you know, when you apply for a loan, as you gave as an example, and there are some business rules to either give you the loan or not give you the loan, I could argue that this is actually business logic.

And you know, I could just write it down in Java or whatever language I prefer.

So where is the line between calling this business logic and calling this an application of a decision tree or a rule-based system that is considered part of AI?

Definitely, it’s a number of rules you need to absorb, you need to implement as a part of your logic.

And when you are using classical deterministic logic, you need to find the threshold.

So you need to decide: if this, I don't know, value is five, then make this decision.

In the case of AI, you have the capability to retrain your model, and you don't need to care about the concrete value.

For instance, you have an initial data set which represents, I don't know, some kind of distribution.

On top of this data set, you can train a decision tree, which will, at the end of the day, again, it's a simplification, give you a list of if-elses that can be implemented, if you really need it, as your business logic.

But next time, when you have a new distribution or, for instance, some kind of shift, like we saw with COVID, after COVID we had a totally different landscape of users and so on, you would need to reimplement this logic.

But for ML use cases, you just need to collect enough data and retrain your model, and you have the same transparent if-else logic, but one which automatically absorbs all the variation.

OK, so what you’re saying is that instead of writing the logic, actually, I train the system and then the result is logic and I don’t need to look at the logic itself.

My assumption would be that if there are gray areas, like if it’s not really clear cut and I need to make a decision that is more intuitive, I’m not sure whether I can come up with an example.

I guess we all know, we look at some diagram about some architecture and we are like, hmm, this is somehow weird, but we can’t really figure out why.

So I assume that these gray areas would be the more intuitive things.

Is this probably something where this approach excels, or is it not?

So what’s your take on that?

Yes, definitely.

Another example: if you have some case which lies at the intersection of your rules, these classical machine learning approaches can help you say, with a high probability, that this concrete, I don't know, instance belongs to class A, class B, or class C, and they construct the logic in a rather transparent and business-oriented way, showing because of what.

But this is applicable only in the case of classical models like decision trees and random forests. In the case of LLMs, of course, you can also make the decision process transparent, but there is no guarantee that this decision process, this reasoning process, will be logical, because sooner or later you can face the problem that it depends on the temperature, I don't know, temperature somewhere, because of the statistics.

Yeah.

So, as Sir Gray just said, another use case: financial forecasting.

Yeah, definitely.

Definitely.

It’s about absorbing the patterns, because sometimes for human beings, it’s difficult to find the respective patterns, the respective, let’s say, trend in time series data.

And for classical machine learning models, it's one of the good examples: they can identify this pattern and try to extrapolate it over a bigger range.

Yeah.

I’m not sure, because that thought just crossed my mind, and I’m not sure whether you have a good answer to that, but I understand that if you look at x-ray images, systems are supposed to be better than humans, but somehow I heard that in real life, humans are still preferable for one reason or another.

Do you have any idea about this or how this works in practice?

Because I could imagine, and maybe that's something that also applies to your domain, that if I look at something as a human, that's different from what the system does.

So, what’s your impression there?

Is the human still superior, or is the AI system superior?

Also, in practice, would you even have people still look at the manufactured items to figure out whether they are defective or not, or is it all automated?

From my perspective, definitely we need a human in the loop.

For instance, in these medical use cases, when we talk about x-rays: yes, computer vision models can analyze these x-rays better than a human being, but again, all AI models are tightly coupled to the patterns they were trained on.

For instance, the AI will see, I don't know, a dark area on the x-ray and say this is probably cancer, but a human being should give feedback; the doctor who is operating this system should use other approaches and other tools to identify whether it's a problem or not.

That’s why definitely we need a human in the loop, on the loop, and to provide the feedback.

And you also do that in industrial applications? So I assume that, you know, you have mass production, and then there is defect detection, and the system says, okay, this is something that is defective, and then you still have a human look at it, like at everything?

Yeah, definitely.

For instance, a good example is the so-called simplex architecture, where we have three blocks: one block is the AI model or AI part; the second one is some kind of deterministic business logic, old-school logic based on if-else; and there is a monitor, which first of all monitors the behavior of the model, the probabilities, the confidence, the outputs of the model, and also monitors the inputs. In case the monitor sees that the distribution of inputs is something new which the model doesn't expect to see, or the model behaves in an unacceptable way, it will switch to the deterministic part, or it will escalate to a human being to involve, I don't know, real reasoning.
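
A minimal sketch of that simplex idea in Python; the confidence threshold is invented, and the three callables stand in for whatever model, deterministic logic, and input check a real system would have.

```python
from typing import Callable

class SimplexController:
    """AI path plus deterministic fallback, arbitrated by a monitor."""

    def __init__(self,
                 ai_model: Callable[[dict], tuple],        # returns (decision, confidence)
                 fallback: Callable[[dict], str],          # deterministic if-else logic
                 input_is_familiar: Callable[[dict], bool],
                 min_confidence: float = 0.9):
        self.ai_model = ai_model
        self.fallback = fallback
        self.input_is_familiar = input_is_familiar
        self.min_confidence = min_confidence

    def decide(self, inputs: dict):
        # The monitor checks the inputs before the model ever runs...
        if not self.input_is_familiar(inputs):
            return self.fallback(inputs)   # or escalate to a human
        decision, confidence = self.ai_model(inputs)
        # ...and checks the model's own metadata afterwards.
        if confidence < self.min_confidence:
            return self.fallback(inputs)
        return decision
```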

So, as Sir Gray just said: my guess, as someone who actually did research in that area, is that finding things is easy, creating a diagnosis is harder?

Yeah, in the case of giving a diagnosis: usually a diagnosis is not based only on an x-ray, on one single kind of examination; you need to consider the whole illness history, even, I don't know, the behavior of the patient, which is outside the scope, outside the sensing range, of the concrete model.

Yeah, and I had to smile when you were talking about how things that are not in the training data are actually a problem, because there is this video on YouTube where a Tesla is driving down the street, and, you know, there are blue lights all over the place, and it's quite apparent that there is police, or the fire brigade, or whatever, and the car just won't stop and won't slow down. It seems the explanation is that this happens so seldomly that it's not really in the training data, and then the car just happily drives on at high speed, and obviously that leads to an accident. So that's what came to my mind.

So, for instance, it’s the same illustration about these corner cases, for instance, all these Tesla models, and initial versions of these models were trained in a laboratory in some kind of, I don’t know, sanity conditions, and it has never even seen these corner cases, and generative AI can be used to generate these corner cases, these unusual cases of data, just to challenge your model to, I don’t know.

Yeah, I mean, I have to tell that anecdote. When I was at university, there was one person who reported about a system that was supposed to detect tanks, like Panzers. And they said that it was quite successful, until they provided the system with real data, and it failed, like, big time. So the system was supposed to find out whether there was a tank in a picture, and it was very successful, until they tried to use it in real life. Then they figured out that what they really had was a system that could figure out whether the weather was sunny or overcast, because all the pictures with the tanks were taken on a sunny day, and the ones without were not. So there you go.

It’s also a good example, which we usually see in our industrial use cases, when a model, in these computer vision models, when a model was trained in a laboratory with one lighting condition, and in a factory floor, the conditions are totally different, and more than ever, these conditions are changing during the day, and as a part of mitigation, you need, as an architect, provide another abstraction, another mitigation block, which will be aware of this lighting condition, and try to, I don’t know, increase light, and decrease light, and it’s about this monitoring block, part of this monitoring block, which can be also responsible for validating, for checking the distribution of inputs, and so on.

Yeah, so, sorry, Sir Gray said: Facebook person recognition used to identify persons by their sweatshirts or t-shirts.

So, what’s the impact of that probabilistic nature that we have in, it seems, all of AI, on the integration into the architecture?

Definitely, we need to mitigate this probabilistic nature; we need to build some kind of bridge between the probabilistic and the deterministic. A good example of how we can do it: first of all, we should use some kind of gateway, as I already said about this simplex architecture with the monitor which is responsible for monitoring the AI. One approach is to use an AI gateway, or let's name it just a gateway, which checks the confidence scores, the probabilities, the parameters of the model, and in case the model is not confident, throws the request over to a more deterministic contour. It's just like using the classical API gateways, but instead of HTTP headers and, I don't know, the content of the packet, we're using the metadata provided by the model to decide which route this, I don't know, flow should take. That's the first thing.

The second thing: usually models don't fail on just one case; they fail on, I don't know, 30 cases, and in this situation we can use, again as a classical example, a circuit breaker. So you see that your model is failing, you switch to another route, and from time to time you check whether it's healthy again or not. From the perspective of the circuit breaker, it's also part of this, maybe, gateway. And of course, governance: everything you did, every decision, every, I don't know, outcome or output your model provides should be logged, should be documented, not only from the perspective of what the model says, but with all the metadata, what kind of inputs, what kind of outputs, which versions. That can also be good evidence for your auditors when you are trying to fulfill the requirements of the EU AI Act.
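
A minimal sketch of the circuit-breaker part of such a gateway in Python; the failure threshold and retry timeout are invented for illustration.

```python
import time

class ModelCircuitBreaker:
    """Stops routing requests to the model after repeated low-confidence answers."""

    def __init__(self, failure_threshold: int = 30, retry_after_s: float = 300.0):
        self.failure_threshold = failure_threshold
        self.retry_after_s = retry_after_s
        self.failures = 0
        self.opened_at = None  # None means closed: the model may be used

    def allow_model(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.retry_after_s:
            self.opened_at, self.failures = None, 0  # half-open: try the model again
            return True
        return False  # open: route to the deterministic path instead

    def record(self, confident: bool) -> None:
        if confident:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip the breaker
```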

Yeah.

So, what you’re saying is that you need to have a fallback mechanism that won’t use AI until to go to that one, and circuit breaker, we should mention that probably it’s with this almost famous pattern, probably from the resilience space where, and circuit breakers, the device in an electrical circuit that would break the circuit to make sure that the house won’t burn down because there’s a short circuit somewhere, and then, you know, the lights go out, power goes out, and then, but, you know, the house won’t be burned down because you take that, you just stop the current from flowing.

And this is something that you would also use for resilience in systems, because you take one part of the system down to make the whole more, well, resilient, so that it doesn't crash.

Instead, it has some time to recover, these kinds of things.

Yeah, and yes, the good news is that, considering these patterns, we can use the same patterns, the same approaches we're already using in classical software design, just applied to AI use cases; the AI use case doesn't differ so much.

It’s just another block.

Of course, it’s, again, a simplification, but it’s another block with peculiarities which can be wrapped with the same pattern, with the same resilience practices, with the same, I don’t know, routing practices, high load balancing, and so on, so on.

Yeah, I think it's quite interesting that you're referring to the very same patterns.

One thing that I’m wondering about is, so if you use these circuit breaker, for example, for resilience, it’s quite obvious that the other system failed because, I mean, there is a measure that says, you know, it returns an HTTP 500, so obviously it’s failed, or it won’t respond at all, so obviously it failed.

However, with AI systems, you are referring to that confidence score, and I can see how that somehow gets calculated by the model.

However, that’s just a probability.

So, is that enough in your experience?

So, what I’m trying to say is, you’re trying to figure out whether the system fails.

You’re trying to figure out whether the system fails by asking the system itself.

Now, there might be ways the system fails where it wouldn't say it fails; it's just highly confident, while it's complete and utter nonsense that it's producing.

You just need to monitor, you just need to check inputs.

So, you are listening to the AI system itself, to what kind of output the system gives, and you also need to analyze what kind of data is used as input for your AI system.

And, for instance, as part of this analysis, you can see that this data is nonsense, or that it's data you know your AI model has not seen before, and then it totally makes sense to switch to the deterministic path or to the human in the loop to make the decision.

Because the AI, at the end of the day, will just give a probability, as I already said; even if the model has not seen this pattern or this data before, it will try to generalize and say, let's say, from what I've seen before, it looks like this one.

But it’s, yeah, that’s why you need to monitor data you’re using as an input also.

Yeah, it’s just that I’m still sort of trying to wrap my head around that, because, and I have to think about that one example where there was, I think it was an LLM system that was supposed to run a vending machine, or it was even supposed to run a shop, I’m not entirely sure.

And at the end of the day, that machine, or that LLM, was talked into providing stuff for free and would have run the whole business into the ground.

Now, what you’re saying is, or I think that’s what you’re saying, this is sort of, you shouldn’t do that.

You should have some checks and balances in place that say, okay, this is a decision that it shouldn't make.

And you should have some sort of supervision around that to provide these kinds of things.

So then you would need to, well, implement something that says, okay, you're not going to give out things for free, a sort of set of rules to check that.

And I’m just wondering why.

They didn’t think about that because it seems sort of obvious.

I suppose, again, it's my guess, that they were mainly driven by this AI hype.

Let’s use LLM, it will solve all our problems.

First of all, even from the architecture perspective, it’s a crazy idea.

It’s a very bad idea to put such kind of generic model as your decision engine because at the end of the day, you don’t really know how this decision mechanism works inside this system.

And if, for instance, from a business perspective, your stakeholders would like to use this crazy idea of an LLM as a decision engine, you should, as an architect, try to mitigate the consequences.

For instance, as in the example you gave, you could create some kind of checksum of the behavior, of the recommendation, or maybe use some kind of classical rule engine to check whether the decision makes sense or not.

But first of all, my recommendation would be to try to influence their decision not to use an LLM as such a foundational, generic decision engine.

That was also the reason why I asked whether you're also considering a human in the loop for failure detection, because, how should I put it, if a failure is detected and that part is, in fact, actually okay, I think it's not such a huge problem.

But if you already do these kinds of checks in these kinds of environments, you should consider doing them in the other example that we gave as well.

You should really, really consider that.

So, Sir Gray said the case was from Anthropic, so they tried it for real and failed.

So, thanks for that information.

Part of their marketing, I suppose.

Yeah, probably.

And then he also said: since it's basically statistics, ensure that the statistical probability assumptions are still valid, for classical ML.

Yes, definitely.

So, yeah, you can even evaluate whether one or another behavior is statistically valid, whether it still lives in the same statistical range in which it usually operates.

Yeah, yeah.

So, again, as an architect, if you cannot change things, if you cannot change the decisions your stakeholders make, you should try to mitigate the consequences.

You can try to build a safe environment.

You should, first of all, isolate this from the rest of the system, and then try to use, I don’t know, interfaces and mitigation measures.

So, talking about that: maybe that's a stupid question, but is there such a thing as a confidence level?

Also, if you use an LLM? Because I've never seen that.

I mean, if I use ChatGPT, it says: this is the answer, deal with it.

It never says, well, but my confidence score is, I don't know, 40% or whatever.

There is a metric, as far as I remember, named perplexity, and this metric shows, I can be wrong, but this metric shows, so there is a list of what should go next according to the text, and perplexity measures how strongly the model expects a definite word as its output, something like this.

But, yeah, it has a different name, but the logic is again about confidence, about probability.

It's probabilistic through and through.

Yeah, because I’m wondering, if you rely on that metric, then you need to have that metric before you can start building that.

The problem is, yes, you're right, but how can I calculate this metric for a text?

I can only calculate this metric, for instance, if I take all existing books and calculate the probability that one word will follow another, and use this probability as the measure.

But this probability brings no value for my business, for my concrete business domain.

So, sorry, Sir Gray just said: nope, they don't really have it.

That’s a huge problem, at least from the API surface.

And he also said, I’m not sure what he is actually referring to.

Your models have solved exactly this case, by the way, so I assume that he is still talking about that vending machine business that is run by an LLM.

So, you came up with an extension to arc42, that famous architecture documentation standard, for AI, and that is also something that you're going to talk about at the TechWriter Summit.

So, can you say a few words about that, what it actually is, why you would use it, these kinds of things?

Yeah, definitely.

So, arc42 is a super cool and agnostic framework for documenting classical architecture.

The problem is that nowadays we have a lot of systems which have AI under the hood, and arc42 does not cover this AI stuff.

I have a separate repository.

We can share this template and the full description of this extension, but my idea is to propose an extension to arc42 so that we have the same, let's say, standard, classical, well-documented approach to how AI architecture should be described. Nowadays we definitely have different, let's say, instruments and approaches, how, I don't know, we can document model cards and so on, but the idea is to provide one solid approach to how we should document, in order for architects to understand each other, to be on the same page, to have the same approach.

It’s the first.

And the second one: this extension is mainly driven by the necessity to fulfill the requirements of the EU AI Act, because, as I already said, once you have a model, you need to make this model transparent.

It’s a process how it makes decisions, how you use, who delivers.

It’s, again, about making the transparency, bringing the transparency about how models behave, how its models work on inference time.

It’s very short, but I suppose if we can share my repository, listeners can find more details.

I will definitely put it in the show notes, and also I will put it in the chat right away.

So, can you give me an idea what kind of additional artifacts there are in that arc42 extension?

Like, is there another chapter, or are there other types of diagrams, or what does it contain?

What would these things actually explain, or is it a completely new approach, where you have completely different chapters?

The main approach is to extend the existing chapters, and as part of this repository, listeners can find a clear description of the new ideas.

So, from the perspective of this AI extension, I’m mainly talking about four main views.

The first view talks about data: we need to document from which sources we take data, how we take it, and how it was transformed.

The second view is about model behavior: we need to make the model's behavior and logic, how the model reasons, transparent.

The third view is about how models behave at runtime: how we deploy them, how models are retrained, how they collect feedback, and what kind of CI/CD pipelines deliver them.

And the fourth one is about risk.

So, maybe it’s the biggest extension, because we already have risks in Arc42, but this extension about risk explicitly lists new risks which AI brings to your architecture.

It explicitly asks you to state who the owner of each risk is and how you are planning to mitigate it, and this will definitely be a great help for everybody who is planning to fulfill, or has to fulfill, the requirements of the EU AI Act that apply from 2026.

And at the end of the day, these four views are split up: each of them contains subsections, or maybe some kind of subtopics, and these are distributed as extensions across the chapters.

So, at the end of the day, it’s the same structure, the same 12 chapters, but they are extended with new tables, with new, I don’t know, artifacts.

It’s not really new artifacts, because artifacts are the same.

Artifacts are the same, maybe a new description, new representation, or new utilization.

Yeah, so thanks a lot.

That’s very helpful.

So, you were talking about how you would document the model and what you did to the model, how you trained the model.

Now, with the most common things that are done these days, the model would be an LLM that is provided by Anthropic or OpenAI or whatever.

At least that’s my assumption.

So, how is it useful if I have such an application where I’m using that provided model, that LLM that is already given to me?

Again, the first question will be, as an architect: do you really need to use an LLM as your AI model?

If yes, then my recommendation would be to try to use a local one, because nowadays you have a rather mature infrastructure to deploy open-source LLMs in your local environment.

For instance, even if you are a fan of vibe coding, you can create a local infrastructure with a coding model, which can help your developers to, I don't know, vibe code locally.

So, first, have a local LLM and try to solve your problem.

And in case you really need, let's say, deep, let's name it reasoning, or bigger capacities, you can use some kind of routing gateway on top of your infrastructure and route the request to a bigger model hosted by Anthropic or, I don't know, OpenAI, and so on.

So, let’s say, I’m against because of costs, because of, let’s say, necessity, because of security.

So, if you really believe that an LLM is what you need, then even for prototyping you can just deploy it locally and use, I don't know, Ollama and all this, I don't know, LangChain stuff.

So, we have a rather wide and, as I said, mature infrastructure, an environment to work with these LLMs locally.
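
A minimal sketch of such a routing gateway in Python; `local_llm`, `hosted_llm`, and the escalation rule are hypothetical placeholders for whatever clients and policy a real setup would use.

```python
from typing import Callable

def route_request(prompt: str,
                  local_llm: Callable[[str], str],
                  hosted_llm: Callable[[str], str],
                  needs_deep_reasoning: Callable[[str], bool]) -> str:
    # Default: keep data, latency, and cost local.
    if not needs_deep_reasoning(prompt):
        return local_llm(prompt)
    # Escalate: only specific, vetted requests leave your infrastructure.
    return hosted_llm(prompt)
```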

Okay.

And you also say that AI has an impact on hexagonal architecture.

So, what is hexagonal architecture?

What’s the impact of AI on it?

Yeah, definitely.

As I already said, the good idea, maybe the main requirement, is to isolate the AI from the rest of the system.

And, in this case, we can use hexagonal architecture.

And, funny fact, you can consider your AI and your domain as separate hexagons, interacting with each other through the respective ports.

And in the case of AI at the center of a hexagonal architecture, you will have ports. Each port will be responsible for a respective input or output, not of your model as such, but of the model as an artifact.

For instance, if we are talking about a computer vision model, a port can be the image input, and the adapters can be different cameras or a file storage as image sources.

On the other side, you can have a port which is responsible for the concrete engine that runs inference on this model.

So, you have an agnostic artifact, for instance an ONNX model, which can be inferenced on TensorRT or through the respective ONNX runtime adapter, and so on. This gives you, at the end of the day, replaceability of your model: when you need to retrain your model, when you need to change your model, for instance, when you work with classical AI and suddenly decide to replace it with an LLM, you just replace the artifact, but the ports and the specifications and contracts stay the same. So we try to isolate.

And the same on the domain side: your domain sits behind its own ports, and there are ports for the domain and the model to communicate.

It’s mainly the idea, I would say, propagating or trying to talk about how hexagonal architecture can be used.

And, again, hexagonal architecture is something we have already known for, I don't know, decades, and we just apply it to a new thing.
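
A minimal sketch of this port/adapter split in Python, assuming ONNX Runtime as one possible engine; the mapping from raw tensors to domain objects is invented for illustration.

```python
from typing import Protocol
import numpy as np

class DefectDetectorPort(Protocol):
    """The contract the domain sees; it does not change with the engine."""
    def detect(self, image: np.ndarray) -> list: ...

class OnnxDetectorAdapter:
    """One adapter; a TensorRT- or LLM-backed adapter could replace it
    without touching the domain."""

    def __init__(self, model_path: str):
        import onnxruntime as ort
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name

    def detect(self, image: np.ndarray) -> list:
        outputs = self.session.run(None, {self.input_name: image})
        # Translating raw tensors into domain objects is the adapter's job.
        return [{"type": "defect", "raw": o} for o in outputs]
```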

Yes, applying the same ideas to different things.

So, that’s great, and it shows how the fundamentals don’t actually change.

So, here is a question by Sir Gray.

So, he asks, how do you deal with model versions from vendors, which basically change monthly?

And I should add that, actually, in our stream, Ralf did the transcription, and he was using some LLM to do that, and also the summaries and so on.

And then, eventually, the model that he used was outdated and wouldn’t be available anymore.

So, we got a new one, and the new one behaved quite differently.

So, therefore, the summaries and the bullet points that the system derives from the episodes were different.

I think that’s the problem that Sir Gray is referring to.

So, eventually, you have a new system, a new version of an LLM that behaves very differently for the same input, and then you have a problem.

And that is something that we also have in architecture, or in software development in general, where usually we try to pin down our dependencies to the last dot, to have precisely that one version, to have precisely reproducible builds.

And, obviously, if there is some LLM, that won’t work.

So, I guess that’s the question.

Yeah.

The answer is just to use a hexagonal approach.

So, you should separate the model artifact from the interfaces, from the ports, for instance when considering different versions.

In classical, or more or less classical, use cases of AI and ML, the model can be deployed or provided by a vendor in the ONNX format.

It’s some kind of standard format.

It’s a file.

You can use different ONNX models from different vendors, and what you need to do at the end of the day is just to replace one file with another file, but the rest of the ports and adapters will stay the same.

So, you have a port for running this model, and an adapter to run it on one hardware-dependent environment or another.

In the case of LLMs, it can be the same.

So, the general recommendation is to build an abstraction layer on top of your model.

It does not even mean that you have to start with a full hexagonal design; you can start with just these abstractions, and sooner or later you'll arrive at a hexagonal approach.

Just an abstraction which hides the technical implementation; the contracts should stay unchanged.

So, you just change your model.

Which basically means, or that’s the question, so if I use some LLM that is out there on the internet that I don’t really control, and that might eventually not be available anymore, I’m basically screwed, and there is no way around it, so I should rather not do that.

Is that your advice?

So, is the advice again to use local LLMs instead?

You just need to use local LLMs.

In general, you need to build an interface layer on top of your LLM usage, because LLMs, and not even just LLMs, any kind of system, have specific requirements for their input.

You just need to build an abstraction which translates your business language, your business inputs, into the input schema your concrete model needs.

And when you replace one model with another one, you keep the same port, but you change the adapter.

Okay.

Anything else that you want to mention?

Anything that I forgot to ask you?

Yeah, maybe the main message from my side: as I already said, there is nothing, let's say, really new here, no new approaches or new recommendations, and industrial AI does not really have unique problems.

It has a, let’s say, low tolerance for the problems all AI systems already have, and all these extreme requirements with, I don’t know, reliability, interpretability, from my perspective, it’s a good engineering practice for any kind of AI system.

But in the case of industrial AI, you have to implement this from day one.

But in the case of more or less classical web applications and enterprise applications, you still have some time until it bites you.

That’s why it’s, yeah, approaches are the same.

But, yeah, it can be a good illustration of using these engineering practices from the first day.

Okay.

So, there is one more thing, I mean, I was about to end the stream, but there's still one question, by Hiral Dave.

And the question is: can we say that dependency injection will play a pivotal role going forward for such situations, where models change monthly?

Yes.

As one possible implementation, yeah, dependency injection can be used.

Because dependency injection is, yeah, one of the implementation patterns which can be used, let's say, behind this hexagonal architecture, yeah.

Definitely.

So, you would then inject the model, or the interface to the model, and whatever uses that model would use the one that is injected, instead of looking it up itself.
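
A minimal sketch of that constructor injection in Python; the port interface and the adapter named in the comment are hypothetical.

```python
from typing import Protocol

class TextModelPort(Protocol):
    def complete(self, prompt: str) -> str: ...

class EpisodeSummarizer:
    def __init__(self, model: TextModelPort):   # the model is injected here
        self.model = model

    def summarize(self, transcript: str) -> str:
        return self.model.complete(f"Summarize:\n{transcript}")

# Composition root: choose or replace the concrete model in one place.
# summarizer = EpisodeSummarizer(model=LocalLlamaAdapter(...))  # hypothetical adapter
```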

Yeah.

Definitely.

Okay.

Thanks a lot.

Thanks a lot for taking the time.

Talk to you soon at the TechWriter Summit.

I think we will meet in person there.

And have a great weekend.

Yeah, thanks a lot.

Yeah, thank you.

Ciao.

Bye.