Episode 291 - Diagrams as Code with AI with Jacqui Read


This text was generated using AI and might contain mistakes.

Unfortunately, Lisa couldn’t make it, so we replaced her with myself and Ralf, because it takes two of us to replace her.

We are at Software-Architecture Gathering, and first of all, a big thank you to the organizers of Software-Architecture Gathering for the support.

So this has to be the most professional episode we ever did.

And our guest today is Jacqui, and you have already been a guest on our stream several times, I believe.

So, can you do a short introduction of yourself?

I can.

So, I’m Jacqui Read.

I’ve written the book Communication Patterns with O’Reilly, and last time I was on Software-Architecture im Stream, we were talking about that book and about patterns.

And since I’ve written that book, I’ve been looking at how we can use diagrams as code with AI, and even getting into a bit of spec-driven development at the moment as well.

So, still looking at communication, but with different things now.

Yeah, and Ralf has also joined me.

So, do you want to say a few words about yourself?

No.

No?

Okay.

So, you’re here because you’re one of the…

Let’s use the time for our guest.

Yeah.

So, you’re one of the experts on diagrams, so that is why I made you join us.

So, that is also the workshop that you gave yesterday about diagrams as code with AI, and you’re also available to give that workshop elsewhere for customers, on site or virtually.

So, how does AI fit into diagrams as code?

Where’s the benefit?

I think a lot of the time at the moment, people are trying to find the problem to fix with AI, but it can actually be useful for our diagrams.

But we still need to have that skilled human actually driving this, because anyone can ask an LLM to create a diagram using things like PlantUML or Mermaid or Structurizr for C4 diagrams.

But you need to understand what’s coming out of it, and you need to understand what you’re asking for as well, be that, can you update this diagram, create it, or help me fix it?

You need to know how to ask the question, because if you ask things at too high a level, very often it will just hallucinate and make things up.

And I’ve got a couple of good examples of that when I teach the workshop.

So, in the past, I tried to ask the LLM to create a diagram, and it used DALL-E or another diffusion-based image generator, and the results were ugly.

So, you just mentioned PlantUML and Mermaid and Structurizr.

Can you go a little bit deeper into detail?

What’s the trick?

Yes.

So, when we’re talking about diagrams as code, we’re talking about creating diagrams using text.

And there’s lots of domain-specific languages.

So, PlantUML is one of the more popular ones.

And it’s, as you can probably guess, it was originally made for UML diagrams.

So, it supports lots of UML diagrams, but it does support various other ones as well.

And it’s got some downsides.

Part of what I teach in the course is I give all three of these.

So, I give a grounding in PlantUML, Mermaid and Structurizr, because they’re all useful in different ways.

And so, you can’t just say, oh, we’re just going to use Mermaid in our organisation.

That’s all we’re ever going to need, because it doesn’t cover everything you’re going to need.

So, in some contexts, you will choose one, and in some contexts, another.

So, we look at all the pros and cons.

So, PlantUML is built on Java, which makes it a little bit more difficult to then render the diagram.

So, that’s why things like GitHub don’t really support it.

Because Mermaid’s built on JavaScript, it’s then a lot easier for that to be rendered in various places on the internet.

And so, we look at PlantUML, Mermaid, and Structurizr, and Structurizr is just for C4 diagrams.

And so, you might think, oh, wait, but I can do C4 diagrams in PlantUML and in Mermaid.

So, why should we bother with Structurizr?

But Structurizr is a bit different in that you’ve got a model behind it, and you create lots of different views of that.

And so, it’s much easier to maintain and manage that data.

If you add a new service in or change the name of something, it will update in all of your views that you’ve created from that.
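
The model-versus-views idea described here can be sketched in a few lines of Python. This is only an illustration of the principle, not Structurizr's implementation or its DSL: the `Model` and `View` classes, the element ids, and the plain-text output format are all made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Single source of truth: element names and relationships."""
    names: dict = field(default_factory=dict)       # element id -> display name
    relations: list = field(default_factory=list)   # (source id, target id, label)

    def add(self, element_id: str, name: str) -> None:
        self.names[element_id] = name

    def relate(self, src: str, dst: str, label: str) -> None:
        self.relations.append((src, dst, label))


@dataclass
class View:
    """A view only lists which element ids it shows; names come from the model."""
    title: str
    include: set

    def render(self, model: Model) -> str:
        lines = [f"title: {self.title}"]
        for src, dst, label in model.relations:
            if src in self.include and dst in self.include:
                lines.append(f"{model.names[src]} -> {model.names[dst]}: {label}")
        return "\n".join(lines)


# Hypothetical example system
model = Model()
model.add("user", "Customer")
model.add("shop", "Web Shop")
model.add("pay", "Payment Service")
model.relate("user", "shop", "orders from")
model.relate("shop", "pay", "charges via")

context = View("Context", {"user", "shop"})
containers = View("Containers", {"shop", "pay"})

# Rename once in the model; every view picks it up on the next render.
model.names["shop"] = "Storefront"
print(context.render(model))
print(containers.render(model))
```

With hand-drawn or per-diagram text sources, the same rename would have to be repeated in every diagram that mentions the element.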

And you can actually export it to things like PlantUML.

That’s a very good point, because in the past, I always said an architect should model and not draw diagrams.

And you say the big difference between PlantUML and Mermaid versus Structurizr is that with PlantUML and Mermaid, I draw diagrams but don’t have a model in the background.

Yes, but because they are text, they’re much easier to maintain than a drawn diagram made with, say, Draw.io or Visio or anything else.

To maintain those diagrams, you need that specific software.

If you make a change, you probably have to fiddle around and move things around in there.

Whereas with the diagrams as code, you can just change a word or add something in, and it will automatically kind of move things around for you.

And you can even, you can keep your diagrams with your code so that you have them in the same place.

But also, you can actually diff your diagrams and see the differences, because they are text.

Whereas, I mean, something like Draw.io does produce an XML-style file, but you can’t meaningfully diff that, because it changes in very odd ways.

You can’t see what the difference is.
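
As a small illustration of diffing diagrams because they are text, here is a sketch using Python's standard `difflib` module. The two Mermaid sequence diagrams and the file names are invented for the example; in practice the same comparison usually comes from `git diff`.

```python
import difflib

# Two versions of a hypothetical Mermaid sequence diagram
before = """sequenceDiagram
    Customer->>Shop: place order
    Shop->>Payment: charge card
"""

after = """sequenceDiagram
    Customer->>Shop: place order
    Shop->>Billing: charge card
"""

# unified_diff yields the same kind of output a code review tool would show
diff = list(difflib.unified_diff(
    before.splitlines(),
    after.splitlines(),
    fromfile="order.mmd (old)",
    tofile="order.mmd (new)",
    lineterm="",
))
print("\n".join(diff))
```

The diff shows exactly one changed line: the participant that receives the `charge card` message.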

So, I mean, it sounds to me as if we are basically programming diagrams in some language, and we use LLMs to support us.

So, is that sort of the idea of your workshop and of how you’re using AI there?

Yes.

So, we need those skills and understanding, but then we can use the LLMs to help us with that.

So, maybe we’ve got something in PlantUML that we want in Mermaid.

You can say, give me this PlantUML diagram as Mermaid.

And one of the things that I think might be particularly useful: say you wanted to create a full workspace in Structurizr, and you’ve got 100 or even 20 PlantUML or Mermaid C4 diagrams. You could say, here are all my diagrams, create a workspace from them. Or better still, manage your context: give it the main ones first and then add extra ones in.

It does vary; you’ve got to experiment with the different models to see which will actually know enough about things like Structurizr.

But you said, I mean, obviously LLMs have hallucinations or tend to create hallucinations.

So, is it really useful or what’s your experience?

So, how useful is it to have an LLM do these kinds of conversion work?

Because, I mean, if there are too many hallucinations, you’re better off doing it yourself, maybe.

Yeah, it varies depending on what you ask it to do.

So, one of the examples I use is an activity diagram where the sequence means that there are a couple of items on the left-hand side.

And if you want to move them to the right-hand side, it’s a little bit of work to change things round and reorder things.

If you ask the LLM to move them from the left to the right, it will go and make up some random syntax.

I’ve seen it do this twice where it said, oh, you can use this little arrow and it will shift stuff over.

And you look at the diagram it produces and it’s just got these little funny arrows in front of your text.

But if you say to reorder the elements so that these two items will appear on the right, it can do it.

So, you have to have that knowledge of what you want it to do and be very specific about it.

Okay.

And you talked about how you are educating people with the skills they need to use these technologies.

So, I think I sort of took a note that the people need to be aware of hallucinations.

That’s one skill.

Is that true?

Or what kind of skills are you trying to teach the people in that workshop?

So, I’m trying to give them a few different skills.

The first one is to understand the pros and cons of the different diagrams-as-code tools.

So, I chose PlantUML, Mermaid and Structurizr because they’re some of the most popular ones.

There’s a few more out there.

There’s one called D2, which is quite young.

So, a lot of people won’t have used that one yet.

So, it’s worth looking at other ones as well.

But these ones being the most popular, a lot of companies are already using probably one of these.

And so, you can go back to your company and assess, is this the right thing to do?

Could we add something in that would help us with this as well?

And so, we’ve got that, okay, what’s the best tool to use in my particular situation?

When people say, oh, but we only use Mermaid here.

You can’t use PlantUML.

You can actually back up your argument and say, we need to use it because I can’t use Mermaid for this.

Right, okay.

Like if you’re doing C4 diagrams, you can use PlantUML.

That’s quite good.

Mermaid really doesn’t work very well for C4 diagrams at the moment.

So, you may be thinking, okay, if we’re not going to use Structurizr, we should probably use PlantUML for that.

Or they’re all open source, so we can all go and fix it.

So, you already mentioned that different libraries support different diagram types.

So, this could be one criterion to select a library.

But you also mentioned that Mermaid is supported by web-based frontends.

I guess it’s just a frontend library and will be rendered on the client side.

So, do you have some more criteria to choose the right library?

What would be your first approach to select?

Yeah, there’s quite a few different things to look at.

So, as you said, what type of diagram do I need?

And sometimes you say, I definitely need a sequence diagram for this and there’s nothing else that I’m going to look at.

But maybe you think, oh, wait a minute, maybe I could use a flowchart.

So, you can look at the different types that are supported.

Maybe you need something a bit more complex.

So, you need to look at an activity diagram, which is something that PlantUML supports.

So, that’s like a flowchart plus plus.

It’s got extra things that you can have.

You can create forks so that you’ve got different things that are happening and they don’t have to happen in a certain sequence, but they all have to happen before you can then move on.

So, if you need things like that, then you know I need this specific diagram.

The other thing to think about is what’s your organization using now?

Because if more than one of them support it, then it’s probably better to go with what people know and what will fit in.

If you’re using Markdown or something that supports Mermaid diagrams, if your knowledge management supports rendering of Mermaid diagrams, then that’s probably something to look at.

So, there’s lots of different things and I go into more detail in some of the pros and cons.

You can even look at the licenses: they’re all open source, but they all have different licenses.

So, maybe your company says we can’t use anything with this license.

So, then that’s ruled out, isn’t it?

There’s an awful lot to look at.

So, any more skills that come to mind that you want to teach people or that are important to use diagrams as code with AI?

So, in my book Communication Patterns, I teach a lot of different patterns and anti-patterns; the whole of part one is about diagrams.

So, getting these diagrams to be understood by people.

And actually, diagrams as code can help with a lot of that.

So, it will automatically render things in a certain way and it allows you to put in things like a title.

It has ways of putting in all the labels and things, but it doesn’t force you to.

So, you still kind of need to know like, yes, I should be definitely using labels.

Yes, I should be using a title.

The one big thing that especially Mermaid falls down on is having a legend or a key.

And you can tell that I’m big on legends.

I’ve got a t-shirt for it.

And the fact that Mermaid just doesn’t have this feature at all is quite ridiculous in my mind, really.

PlantUML does have a legend.

But it’s not easy to define what’s within that legend.

Structurizr does automatically create a legend for you.

So, that’s really good.

But yeah, I think we really need to sort out the lack of a legend in Mermaid.

So, if anyone wants to contribute to the open source and sort that out, that’d be great.

Sorry, just as an addition, it also says asynchronous on your t-shirt and datastore.

So, that’s the legend.

And, you know, it’s about play, obviously, just for the people who are listening to the podcast.

And I understand that you even have more ideas about t-shirts like that.

So, can you get it on Spreadshirt or somewhere?

Not at the moment.

But if people want me to put it on there.

OK, sorry, but I was interrupting you.

So, you said that Structurizr always creates a legend.

But I think it has an easy position because it’s always the same color code.

And I think there are only two shapes.

Oh, it’s quite a few shapes, actually.

Really?

You can define lots of different shapes.

So, you’ve got the standard person and rectangle for like a service or a container.

Once you get down to the components, it has a different shape for that.

You can also use.

So, there’s the datastore shape.

There’s a pipe shape if you’ve got a queue and things like that.

And there’s actually quite a few extra ones which you can use in there.

But I think the reason it can just auto-generate the legend is because it has that model.

So, it knows what’s in this diagram, and it doesn’t just give you the full legend.

It will give you one that is very specific to that diagram.

So, it’s only the stuff that’s in that diagram.

It also knows how you’ve styled it.

So, the sort of very traditional C4 is the different blue colors.

One of the things that I talk about in my book and in some of my talks is considering people who have, say, color blindness, and the traditional grays and blues color scheme.

I think one of the grays and one of the blues are a bit too close to each other.

So, but recently, Simon Brown has been saying to people, look, you do not have to use that color scheme, use whatever you want.

So, if you go to the online Structurizr DSL, which is like a playground and is what I use in the course, it will randomly choose different themes now.

So, you might get orange or you might get pink or things like that.

But you can define your own styles in there in a way that is sort of similar to CSS.

And so, it will just look at the model, look at how you’ve styled it, and generate the legend for you.

So, it’s that whole model thing.

It’s leveraging that.
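
A per-diagram legend of this kind falls out naturally once the element kinds and their styles live in one place. The following Python sketch shows the idea only; the `styles` and `elements` dictionaries, the shape names, and the hex colors are illustrative assumptions and not Structurizr's data model.

```python
# kind -> style, loosely analogous to styling by tag (all values are illustrative)
styles = {
    "person": {"shape": "person", "color": "#08427b"},
    "container": {"shape": "box", "color": "#438dd5"},
    "datastore": {"shape": "cylinder", "color": "#438dd5"},
    "queue": {"shape": "pipe", "color": "#85bbf0"},
}

# element id -> kind (hypothetical system)
elements = {
    "customer": "person",
    "web": "container",
    "db": "datastore",
    "events": "queue",
}


def legend_for(view_ids):
    """Return legend entries only for the kinds that occur in this view."""
    kinds = sorted({elements[element_id] for element_id in view_ids})
    return [(kind, styles[kind]["shape"], styles[kind]["color"]) for kind in kinds]


# A view that shows only the customer and the web application
print(legend_for(["customer", "web"]))
```

Because the legend is derived from what the view contains, the datastore and queue entries do not appear for a view that shows neither.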

We did an episode on Structurizr and the C4 model with Simon Brown, who’s the original inventor of that.

So, I will include a link in the description to that one, or you can look it up on the webpage.

Maybe we should, so you did talk about sequence diagrams and activity diagrams and so on.

And I think people probably have a rough understanding about that, that there is some flow.

And that’s basically what also the name suggests, right?

I mean, sequence diagram actually says what it is.

I was wondering whether you want to say a few words about C4, because, I mean, C4 doesn’t really say what it’s about, right?

Yeah.

So, C4 is about expressing the structure of your architecture in your system.

And so, originally, it started with the context being the highest level.

There is now a landscape level, because the context diagram is, this is our system that we’re interested in, and this is how it interacts with things.

Whereas the landscape level is more, these are all our systems within our organization, and how they interact with each other is a bit at a different level.

So, that’s outside of the C4.

So, you’ve got context, then you go down to container, which is kind of individually deployable units.

And so, what you’re doing there is you’re taking that system that you have in your context, and you are kind of zooming in on it and seeing what’s within that system, but still how those pieces interact with things outside of that.

Then you carry on down to the component, if you need to.

I always say to people, you probably, most of the time, only really need those top two.

But you can go down to the component level.

If you go down to the code, that’s when you’re actually going to be using UML diagrams.

And I would say to people, look, the lower down you go, the more stuff will change.

And so, if you don’t need to go into that detail, don’t, because then you don’t have to maintain it.

If you really need to document your code, maybe it’s an API interface, then if you’re using things like OpenAPI, that will then automatically do that for you.

So, you don’t want to have to manually sort out stuff in detail.

And that goes for documentation, too, not just diagrams.

Go into the amount of detail that you need.

If someone then asks questions, then you can go into that detail, but you will have to maintain everything you create.

Yeah, I think that’s such an important point, because I guess some people think that the more documentation, the better.

But as you pointed out, documentation is also a burden.

So, I think that’s very important.

What I found interesting when I was looking at the slides, and I have to admit that I didn’t take part in your workshop, but when I took a look at the slides that you gave me access to, it seemed that most of the diagrams that you’re talking about are sequence diagrams and activity diagrams, so diagrams that talk about the dynamic behavior of the system, and C4 is different.

So, I was wondering why you chose sequence diagrams and activity diagrams in particular, and not the UML diagrams that you could use for the structure as well.

I mean, there are component diagrams and class diagrams and these kinds of things, so that’s basically what I was wondering about.

Yeah, so I try to give a bit of a balance, because we do look at C4, so that’s structure.

Obviously, when I teach this for one day, I can teach it for longer, and we could go into a lot more if people wanted to, but in one day, we can only really cover a certain amount.

So, because we’re looking at the structure with C4, I chose the more behavioral ones, because those are the things that C4 doesn’t show. A lot of people say, oh yeah, we can just use C4 for all our diagrams, that’s all we need, but actually, we also need to communicate the behavior of the system, which is kind of more important, really, because the behavior is what’s meeting the needs of our users, not the structure.

Our users don’t care if we’ve got a monolith or which service talks to which, they care about the behavior of the system and whether it does what they want.

So, we can’t just use C4; we have to use behavior diagrams, or at least consider the behavior, and of course, diagrams are a good way of communicating that.

So, is that sort of your default approach, like you use C4 for the structure and then use activity diagrams where they fit in, or is that just to give like a broad spectrum to choose from?

Yeah, I mean, I wouldn’t necessarily always use an activity diagram, but they’re quite useful, along with flow diagrams, sequence diagrams, there’s so many different types of diagram, it depends what you’re trying to communicate.

One of the diagrams that Mermaid does is called a Sankey diagram, which I don’t know if you’ve heard of, but I don’t know if you’ve ever seen these websites where they show maybe the flow of where users are using a website.

So, they’ve started at the home page, and then some of them have gone to this page, and so you get those lines, and that could actually be quite useful for showing lots of different things in the system, not just how users are.

Is it where the lines have different thicknesses, depending on how many people are going in these different ways?

So, you could use that in lots of different ways, not just how people navigate a website, but how people are using a system or how your services are communicating with each other.

And if you created that, you could see where the bottlenecks are, where people are really using things.
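
To make the Sankey idea concrete, here is a minimal Python sketch that turns a list of observed page transitions into Mermaid Sankey source. It assumes Mermaid's `sankey-beta` keyword followed by CSV-style `source,target,value` rows; since that diagram type is marked as beta, the syntax should be checked against the current Mermaid documentation. The click data and the `to_sankey` helper are invented for the example.

```python
from collections import Counter

# Hypothetical navigation log: one (from_page, to_page) pair per click
clicks = [
    ("Home", "Search"), ("Home", "Search"), ("Home", "Account"),
    ("Search", "Product"), ("Search", "Product"), ("Product", "Checkout"),
]


def to_sankey(pairs):
    """Count identical transitions and emit one CSV row per flow."""
    counts = Counter(pairs)
    lines = ["sankey-beta", ""]
    for (src, dst), n in sorted(counts.items()):
        lines.append(f"{src},{dst},{n}")
    return "\n".join(lines)


print(to_sankey(clicks))
```

The same function works for service-to-service calls: replace the page names with service names and the counts with request volumes, and the widest bands point at the busiest paths.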

So, to me, it seems what you’re basically saying is, okay, so I want to do some diagrams, so I would use some tools, like you mentioned, like diagrams as code tools.

And would you suggest to only do that with AI support, or do you think you can?

I mean, there are studies that say if you look at code and coding, it’s not obvious whether using AI technology actually improves things and makes people more productive.

There are some studies that say that people think they are more productive, but in reality, they are not.

So, what’s your suggestion?

Would you use diagrams as code only with AI?

Do you think it’s a strong support or is it more optional?

What’s your take on that?

I think it’s definitely optional.

And in fact, when I teach the course, I say to people when they’re doing the exercises, you can use AI if you want, or you can hand code this.

One of the examples I show is a really long prompt saying what I would like to show in a sequence diagram.

And if you look at how much text you have to actually give it, you may as well have just written it in PlantUML or Mermaid from the start; it would actually be less writing if you just typed it.

So, that’s why I’m teaching these basics and saying to people you need these skills because you need to know when it’s actually worth using the AI and when it’s not.

So, if you are really struggling with an error, or if you are thinking, I’ve got a load of PlantUML diagrams and I want to convert them to Mermaid or to Structurizr, it’s about knowing when it’s worth it and you’re going to get that time saving, but also knowing what to look for: the hallucinations and errors and things like that.

Like when it says, oh, yeah, you can just use this keyword to move stuff around.

Yeah, no.

So, as far as I remember, each diagram type has a different domain specific language and some are more complex and some are easier.

Would you say that some diagram types are easier to handle with AI than others?

And I mean, for instance, with a sequence diagram, I guess there’s not much layout information in there because it’s just a sequence.

But with other diagram types, I run into problems with the layout.

Is this maybe a point where the AI can be of good help?

Yeah, I mean, I think it works as long as the AI has the context and has been trained on the DSL, the domain-specific language. You can, of course, also use things like MCP servers.

I think there’s one called Context7.

I think it’s called that, which I’ve been told is good for coding, but it also has information in there on Mermaid and PlantUML.

And someone in my class yesterday actually said they got really good results.

They thought it was probably because they were using that MCP server.

So as long as AI kind of understands it, it can probably help you, but it may still try and make things up because it’s trying to answer you.

And if it doesn’t understand something, then it will just still try.

So in your slides, I’ve seen that, I think it was the activity diagram where there’s an old DSL available and a new one, which is still in beta.

Is this such a case where the LLM might not know the new version yet?

And so it might be better to still use the old domain-specific language.

Yes.

So with these, although these diagrams as code languages have been around for a while, they are, of course, still being added to.

And some of them are in beta; especially with PlantUML and Mermaid, they are adding new ones on.

Mermaid probably a bit more than PlantUML now.

And yeah, they’re not going to know about those ones that are in beta.

And sometimes they do change things.

There was one, I think it might be in sequence diagrams with PlantUML, where they just changed it from one to the other.

But I might be thinking of something else.

So yes, your LLM is going to have a cutoff point where it doesn’t know about things.

And so you’re going to have to add in things like an MCP server, or just say, look at this documentation, or, this is the basic structure that I’m expecting you to use.

Now, do this.

So your results are going to vary depending on the model you’re using, depending on the training data, and depending on how specific you are in what you ask it to do.
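
One way to act on this advice is to assemble the request from fixed parts: the syntax reference the model may be missing, the current diagram, and one specific task. The Python sketch below only builds the prompt text and does not call any model; the `build_prompt` helper, its wording, and the sample diagram are assumptions made for illustration.

```python
def build_prompt(task: str, diagram_source: str, syntax_notes: str) -> str:
    """Assemble a specific request plus the context the model may be missing."""
    return "\n\n".join([
        "You are editing a PlantUML activity diagram.",
        "Use only the syntax described in these notes; do not invent keywords:",
        syntax_notes.strip(),
        "Current diagram:",
        diagram_source.strip(),
        "Task: " + task.strip(),
        "Return the complete updated diagram source and nothing else.",
    ])


prompt = build_prompt(
    # Specific wording ("reorder the elements") rather than a vague "move them"
    task="Reorder the elements so that 'Send invoice' and 'Archive order' appear on the right.",
    diagram_source="@startuml\nstart\n:Receive order;\nstop\n@enduml",
    syntax_notes="(paste the relevant section of the official documentation here)",
)
print(prompt)
```

The result still has to be rendered and checked by someone who knows the diagram language, since the model can produce plausible-looking syntax that does not exist.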

Which model do you use most often?

I have started using Claude more recently.

But I’ve used ChatGPT a fair bit as well.

Claude seems to be at least as good as ChatGPT in most things.

This is without supplementing it with an MCP server or anything.

But it’s not perfect.

I gave it a PlantUML file with a load of errors in, and it didn’t pick up on some of them.

It thought that a bit of gibberish that I chucked in there was supposed to be a color code, which starts with a hash.

But it started with two hashes.

And then it tried to change it, but it still had two hashes.

And I was like, that’s complete nonsense.

So none of them are perfect.

I think they just released a new model.

So we have to try it out.

Yes, that’s what I do with this course.

Every time I teach it, pretty much I’m reviewing it and saying, right, are there any new diagrams to talk about?

How is this model doing?

And I do a few comparisons in there.

I’m like, this is what happened in September last year.

This is September this year.

And sometimes they actually get a bit worse, because of course, it’s non-deterministic.

And I’m just saying, right, do this now and seeing whether it can do it at that particular time.

It’s a bit like sitting an exam or something.

You can work really hard for two years, sit the exam and not do very well on that day.

It was quite an interesting part of your workshop that you compared the results from September 24, or something like this.

And now it’s better.

So this gives you more insights.

That’s quite good.

Yes, I was saying when I was teaching yesterday that people in that group were actually getting better results than people when I’ve taught this before.

And I was saying, well, maybe the fact that I’m teaching this is actually teaching these LLMs to do it better.

Yeah, that’s quite an interesting one to think on.

Which leads to the problem that you can’t like pin a version of an LLM and say, OK, this is going to be the same model, even if I use it in a few months.

So before we went live, you said that there is a relation between what you were doing in your workshop and spec-driven development.

So what’s that?

Spec-driven development is quite a new thing, and there’s lots of different definitions for it at the moment.

But essentially, people are now sort of moving.

It’s quite funny.

So I would say that developers have spent sort of the last maybe two decades or more trying to avoid writing documentation so they can write code instead.

And now we’re talking about writing documentation so that AI can write the code.

And other people have been saying, oh, yeah, how many people like writing code?

And everyone goes, yeah.

And how many people like reviewing code?

And everyone goes, no, I don’t really want to do that.

But actually, a lot of people are saying, yeah, let’s do this spec-driven development, where that’s exactly what we’re going to do.

The basic idea is that you create a specification, a spec, using things like Markdown.

And that’s partly a human input and partly an AI input.

And there are certain methods and tools that you can use at the moment, like GitHub Spec Kit.

I’ve been looking at that a little bit.

But the idea is that we create the spec along with the AI and then the AI then writes the code for it.

And then we hopefully review that code.

Because otherwise, we’re just doing vibe coding.

What do you mean by hopefully?

Yes.

So, if we’re doing spec-driven development, we should definitely be reviewing that code.

We are reviewing the tests that it’s creating.

But the idea with spec-driven development compared to vibe coding is that we are managing the context that the LLM is using.

So, we are saying these are the principles you should be sticking to.

These are architecture decisions that have been made.

We need you to record architecture decisions which we can then view.

Because one of the things I’m really big on is that architecture documentation should include the why.

Because if we don’t include why we made that decision, then we can never reproduce that.

The how and the what can be reproduced from the why, but the why cannot be reproduced from the how or the what.

So, that is the information that’s being lost.

Okay.

And diagrams as code, how does that fit into spec-driven development?

So, where’s the relation?

Yeah.

So, you can, of course, include these diagrams as code in your Markdown files.

You can ask the AI to produce them in there as well.

One other thing I’ve seen being done is to create ASCII diagrams as well.

So, you’re getting the AI to use pipes and all the different ASCII characters to actually create a diagram for you.

And it can do that in the Markdown.

But it can also, of course, do that in the command line for you as well.

And you can say, like, “Using ASCII, show what we’ve done so far.”

And that helps you to review what you are doing, because the diagrams aren’t necessarily for the AI.

But if you’ve already got a diagram, you can give it to the AI to help it understand what we’re trying to do, or they understand.

But also, if you’re getting the AI to create the diagrams of what you’re doing, you are using that as a validation check.

You can say, okay, that’s not quite right.

We need to make a change to this.
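
For readers who have not seen one, an ASCII diagram of the kind mentioned here is just boxes and arrows built from ordinary characters. The following Python sketch draws a simple chain of boxes; the `box` and `chain` helpers and the labels are made up for illustration and are not what any particular AI tool produces.

```python
def box(label: str) -> list:
    """Draw one labelled box using only +, -, and | characters."""
    edge = "+" + "-" * (len(label) + 2) + "+"
    return [edge, f"| {label} |", edge]


def chain(labels: list) -> str:
    """Place boxes side by side, joined by arrows on the middle row."""
    boxes = [box(label) for label in labels]
    rows = []
    for row in range(3):
        # Arrows only on the middle row; equal-width padding elsewhere
        joiner = " --> " if row == 1 else "     "
        rows.append(joiner.join(b[row] for b in boxes))
    return "\n".join(rows)


print(chain(["Spec", "Code", "Tests"]))
```

The output is easy for a person to scan in a terminal, but the relationships exist only as character positions, which is the limitation raised in the discussion that follows.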

That’s a very interesting point, the ASCII diagrams, because when you create a PlantUML diagram, you have the relationships between components.

And I guess that an LLM can read this just fine.

But I noticed lately that Claude often creates those ASCII diagrams.

And I get the impression that, I mean, visually, it’s easy to read for me, but I guess it’s not so easy to read for the LLM.

And so, I guess, the documentation has to contain both the relationships and the diagram.

Did you also notice a shift in there that the model behaves differently?

I’ve not really noticed that, but I think like what you were saying about the ASCII diagrams, yes, they don’t have those relationships.

One thing that I’ve seen that I thought was good for this was it was used to actually mock up how a web page was going to look with some different boxes on it.

And so, of course, I think Mermaid has at least one form of diagram for wireframing, but just using ASCII to go, right, yeah, we’ve got a box here and then a box here, and then that’s good.

And you can show the different versions of something or how it’s going to move through different pop-up boxes and things like that.

And so if you’re wireframing, I think where you don’t have those relationships, where it’s sort of more of an image, then that’s quite a good way to do it, because the command line, you can’t, you’re not going to have sort of the image unless you actually load up a web page and have a look at it.

So the ASCII diagrams that we’re talking about, I assume they use ASCII characters to sort of draw a diagram.

So that’s how it works.

Yes.

Yeah, and LLMs might have problems understanding that.

I was wondering whether, so that’s it for spec-driven development.

And I think that is something that we see with LLMs quite a lot these days, that basically we are trying to use not natural language, but other languages that have a limited set of things that they can talk about.

Any other relations between diagrams and AI?

How should I put it?

So the workshop seems to talk about how to generate diagrams using AI support.

So we have spec-driven development where it’s used as an output, but also as an input.

Is there anything else, any other relation between diagrams and AI that you’re seeing at the moment?

I don’t think I’m really seeing anything at the moment, but I think they’re going to be useful with spec-driven, because a lot that I’ve seen with spec-driven hasn’t really talked about using diagrams as code, which is interesting.

It’s more about creating just everything sort of in text.

And from what I’ve seen so far, a lot of the tools at the moment are, interestingly, very waterfall.

So they don’t assume that what you create first is wrong, whereas, of course, if you’re being agile, you’ve got to assume that you are going to have to iterate on things.

And so if you create something in these, like all the tools work with different workflows, but they don’t really want you to go back and change things in these workflows, which seems a bit odd to me, because it’s like, oh, were these tools written 20 years ago?

So I’m interested to see how these tools are going to evolve.

At the moment, they’re all doing very specific files, very specific workflows.

And I think there’s probably quite a lot that’s missing from them.

And I think diagrams as code is one of those other sort of inputs that we might want to put in there.

Like if we’ve been using DDD to work out our different domains and different contexts and things, how are we going to communicate that to the LLM?

And so we need to think about how we’re going to do that and how we’re going to break down all these files into small chunks so that we don’t overwhelm the context window of the LLM, and can work on very small things and iterate on them.

But yeah, I think this is a very young thing, spec-driven development.

And I think it’s got a long way to mature before we can really use it properly.

I wonder if this approach is more helpful for the LLM to have the relationships as a specification or whether it helps more us humans to review the spec because we are visual beings.

And yeah, it’s easier for us to review a diagram.

But then I sometimes wonder whether it makes sense to follow all those lines to check whether the diagram is correct.

So, yes, of course, you’ve got to rely on the fact that the diagram is correct.

If you created the diagram yourself and gave it as part of the spec, then yes, the diagram is what I want.

And then you’ve got to review the code to make sure it’s followed the diagram and the specification.

But if you’ve asked it to create the diagram, maybe it’s just giving you what you want rather than the actual reality of it.

So say you have created this whole spec and you’re saying, OK, this is what I’ve given you.

Show me a diagram of what you’ve created in the code and it diagrams what’s in the spec.

But the code, if you haven’t checked it, might be completely different and some complete mess.

And because we all know at the moment, even if we aren’t using AI, we have got these beautiful architecture diagrams and things and specifications.

And then we have the code, which is completely different to that.

So we can try and use things like architectural fitness functions as sort of tests to make sure that things are as we want them.

And so I think that’s probably something to try and use with the spec coding as well.

But we can’t trust that the LLM is telling us the truth.
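
An architectural fitness function of the kind mentioned here can be as small as a test that checks the code against one rule from the diagram. The following is a minimal Python sketch, not a definitive implementation: the rule (the `domain` package must not import from `web`), the package names, and the helper functions are assumptions made up for illustration. It uses only the standard `ast` module to read imports without running the code.

```python
import ast

# Hypothetical rule taken from an architecture diagram:
# modules in the "domain" package must not import from "web".
FORBIDDEN = {"domain": {"web"}}


def imported_packages(source: str) -> set[str]:
    """Collect the top-level package names that a module imports."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            packages.add(node.module.split(".")[0])
    return packages


def violations(package: str, source: str) -> set[str]:
    """Return the forbidden packages that this module imports anyway."""
    return imported_packages(source) & FORBIDDEN.get(package, set())


# Two made-up module sources that both belong to the "domain" package.
GOOD = "import decimal\nfrom domain.order import Order\n"
BAD = "from web.controllers import OrderController\nimport decimal\n"

if __name__ == "__main__":
    print(violations("domain", GOOD))
    print(violations("domain", BAD))
```

Run as part of the build, a check like this fails when the code drifts from the rule, regardless of what the LLM claims it has produced.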

Yeah, which basically.

So I think that’s a very important and good point.

And it’s for real.

I mean, the things that you mentioned, they do happen as hallucinations, which yeah, which leads to the.

Is there a broader problem?

Like, how do you solve the problem that diagrams might be detached from reality?

I mean, shouldn’t you somehow create them from the original code then and use those as diagrams?

Yeah, so that’s one thing that I think it might be helpful for, in that if you give it some code that it hasn’t written and say, like, create a diagram of how this is all interacting, then that’s probably more likely to be correct.

It might not be.

But you can then say, OK, here’s that diagram.

Here’s the diagram that we created originally, and actually compare them.

Because as you were saying, we’re good with these visuals as humans.

And so using AI to review things that it hasn’t created or the pattern of using one model to create something and then another model or agent to review it and check that the spec and the code actually match.

That’s another pattern that we can use.
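
Besides comparing the two diagrams visually, the comparison can also be made mechanical once both are reduced to sets of relationships. The following is a minimal Python sketch under that assumption; the component names and the `compare_diagrams` helper are hypothetical, and how the relationships are obtained (from the spec diagram and from a diagram generated out of the code) is left open.

```python
Edge = tuple[str, str]


def compare_diagrams(intended: set[Edge], observed: set[Edge]) -> dict[str, list[Edge]]:
    """Report relationships that exist on only one side."""
    return {
        # In the original diagram, but not found in the code.
        "missing_in_code": sorted(intended - observed),
        # Found in the code, but never drawn in the original diagram.
        "undocumented": sorted(observed - intended),
    }


# Hypothetical relationships from the diagram that was part of the spec.
INTENDED = {("WebApp", "OrderService"), ("OrderService", "Database")}

# Hypothetical relationships from a diagram generated out of the code.
OBSERVED = {("WebApp", "OrderService"), ("WebApp", "Database")}

if __name__ == "__main__":
    for kind, edges in compare_diagrams(INTENDED, OBSERVED).items():
        print(kind, edges)
```

An empty report does not prove that the code is right, but a non-empty one points a human reviewer at exactly the lines worth following.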

So when you try to compare them visually, I remember when working with PlantUML, there are diagrams where the layout is trivial, like activity and sequence diagrams.

But for instance, component or class diagrams, where the algorithm tries to place elements in such a way that the lines do not cross.

And I could imagine that when I work with those diagrams and add something that everything flips over.

What’s your experience with this?

Can we stabilize this?

I think Structurizr has a solution for this.

The manual layout, something like this.

Yes.

So Structurizr has a manual layout.

Some of the different types of diagram in Mermaid and PlantUML have some ways of specifying things.

So if you’re creating a C4 diagram in PlantUML, you can use Rel_U for a relationship going up, Rel_D for down, and likewise for left and right.

So you have some sort of control.
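
As a rough illustration of those directional relationships, here is a minimal Python sketch that emits C4-PlantUML source text. `Rel`, `Rel_U`, `Rel_D`, `Rel_L`, `Rel_R` and `Container` are macros from C4-PlantUML; the Python helpers, aliases, and labels are made up for illustration, and the output still has to be rendered by PlantUML to see the effect on the layout.

```python
from typing import Optional

# Directional relationship macros provided by C4-PlantUML.
DIRECTION_MACROS = {"up": "Rel_U", "down": "Rel_D", "left": "Rel_L", "right": "Rel_R"}


def relationship(source: str, target: str, label: str,
                 direction: Optional[str] = None) -> str:
    """Build one relationship line; without a direction, plain Rel is used."""
    macro = DIRECTION_MACROS.get(direction, "Rel")
    return f'{macro}({source}, {target}, "{label}")'


def container_diagram(containers: dict[str, str], relationships: list[str]) -> str:
    """Assemble a small C4 container diagram as PlantUML text."""
    lines = ["@startuml", "!include <C4/C4_Container>"]
    # The order of these lines also influences where elements end up.
    lines += [f'Container({alias}, "{name}")' for alias, name in containers.items()]
    lines += relationships
    lines.append("@enduml")
    return "\n".join(lines)


if __name__ == "__main__":
    print(container_diagram(
        {"web": "Web App", "api": "Order API", "db": "Database"},
        [
            relationship("web", "api", "calls", "right"),
            relationship("api", "db", "reads and writes", "down"),
        ],
    ))
```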

The order that you put items into your code will also determine where they appear as well.

And if you’re doing things like the activity diagram, if you are adding in branches, say yes and no, the order that you put that in will determine where things are.

So you’ve got some control, but you can still end up even in Structurizr when it’s doing the auto layout for you, you can end up with some very odd things where like one thing’s up here and it means all the lines are like this and crossing over.

And you think, well, if you just put that down there, you wouldn’t have that problem.

So yes, these tools do try, but you get to that point where you think, I can’t line that up.

Or you get annoying things like I really like things to line up and like the title is supposed to be in the centre, but it’s actually slightly off centre.

It’s just annoying.

But yeah, with Structurizr, there is manual layout.

But the downside to that is that when you then change the model, that manual layout is probably going to need to be changed as well.

If you add something, that new thing is just going to appear in the top left corner.

And that might be covering something else up.

And so if you do add things or change the model, you’ll have to review your manual layout.

So I would say to people only use manual layout when you really need to and have some sort of process, be it manual or automatic, where you will go and check your manual ones when something has changed significantly in the model.

So you were talking about how you can sort of review things that one element puts out and have that reviewed by another one.

And I mean, we had cases, as you mentioned, even on the stream, like publicly, we had somebody who basically said, okay, he was vibe coding some stuff, and he asked the LLM to create more tests and got suspicious and figured out that, in fact, the tests didn’t do anything.

They just generated output that looked as if some test was running.

Now, and I mean, obviously, if some human would do that, I would basically fire that person because it’s just, you know, you can’t trust that person anymore, right?

Because it’s really a bad thing.

So what you’re saying in a way is, okay, so I have that LLM that I don’t trust.

And now I have another LLM that I use to review that.

And I’m wondering, do you have any experience with that?

Is that something that really works in practice?

Because that other LLM could also be as erroneous as the original one?

Yes, it’s not something that I’ve done that much.

But I have heard that, given the problem with context windows overflowing and things, if you create agents that have a very specific task, then you will get much better results.

So if you have one that is just there to go through all of your TypeScript, and one that makes sure that certain things like architecture decisions have been written, then it’s not so much, I’ve got one LLM that does everything and one that checks everything.

It’s, I’m using different agents to do very specific jobs.

And then you still can’t guarantee.

The thing is, you can’t guarantee it with a human either, to be honest.

I mean, obviously, you build things, you build up levels of trust with humans.

And like what you were saying about different models coming out, it’s like, oh, I’ve built up a level of trust with this model.

And now there’s a new one.

And it’s a bit like someone leaving your company who you really trust and someone new coming in, who kind of says, Yeah, here’s my CV, all flashy.

And you think, well, now I’ve got to build up this trust again, with you and work out how to work with you.

So it’s quite an interesting dynamic.

I wonder whether Eberhard will fire Claude Sonnet 3.7 and then use 4.0.

I mean, it is a new model, so it could make sense in that it’s more reliable and you can build up more trust.

Yeah.

But like you were saying with Simon Wardley, I’ve heard him talk about those tests.

But if you ask for things at a very high level, which I think is what Simon did, where it’s like, yeah, create tests for this.

It’s like, well, what’s the LLM gonna do?

It’s gonna do what some teenager would probably do as well, which is create a load of tests which basically just return true.

Because that’s the easiest thing to do.

And you say to it, Oh, fix this bug.

And instead of doing the hard thing of fixing the bug, it will change the test so that it passes.

So it’s just trying to do what you want in the easiest way possible.

Yeah, I have to, because you came up with that trust thing.

If I may, there are two things that I would like to point out.

First of all, I have the impression, but maybe your impression is different, that there is too much confidence in LLMs, and LLMs are actually implemented in a way that they try to get us to trust them.

So I think that’s one problem.

And the other problem is, I have to admit that I had to make up my mind, trying to figure out what exactly the problem is.

So the problem is that, in my opinion, LLMs are optimized just by the tests that are run; they are punished if they say, I don’t know, and so they never do that.

So therefore, they come up with some answer.

And humans, people that I would like to work with, I can’t really think of anything that is worse than a person who never says, I don’t know, and wouldn’t show up and say, okay, you know, so you gave me that task.

I don’t know how to do it.

Please help me.

And so therefore, I think it’s a different kind of trust, or a different thing, if you have a human.

And that is also why I feel somewhat uncomfortable about LLMs, because they are optimized to sort of gain trust, but at the same time, they’re optimized to, you know, just give random answers.

And with a human, I would be very scared to have such a person.

And I have to admit that I would probably not hire such a person.

But there are still a lot of companies out there that don’t have these safe working environments where people feel that they can actually admit that.

But it’s interesting to think that people put all this trust in these personifications that have been created, like, they’re given names like Claude.

But the thing is, like the very underlying large language model has basically been trained to play a game of guessing the next word.

And so it does that based on what it’s been trained on.

And so all it’s doing is playing this game.

And a lot of people just don’t really kind of understand that all the LLM is trying to do is guess the next thing.

And basically be told, yes, well done.

You’ve done that.

And so it’s not going to say, I don’t know, because it’s basically trained to do that.

And so that’s all it’s doing. Under the hood, we’ve got an agent or something that is communicating with that LLM.

And all that’s doing is going, right, okay, now here, now the next word, now give me the next word, now give me the next bit.

And it does that over and over again, and then passes that back to the user.

And I think if people understood more what was under the hood, then they wouldn’t have quite so much trust in it.
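
The loop described here can be sketched in a few lines. The following Python toy is not how a real LLM is built: it assumes a tiny word-pair (bigram) count table as a stand-in for the trained model, and a made-up training sentence. What it shares with the description is the shape of the loop: ask for the most likely next word, append it, and ask again.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count which word follows which (a toy stand-in for a trained model)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Repeatedly append the most frequent next word to the output."""
    output = prompt.split()
    for _ in range(max_words):
        candidates = counts.get(output[-1])
        if not candidates:
            break  # the toy table has nothing for this word, so it stops
        next_word, _count = candidates.most_common(1)[0]
        output.append(next_word)
    return " ".join(output)


# Made-up training text for the toy model.
CORPUS = "the diagram shows the service and the service calls the database"
MODEL = train_bigrams(CORPUS)

if __name__ == "__main__":
    print(generate(MODEL, "shows", max_words=2))
```

Nothing in this loop checks whether the continuation is true; it only picks what most often came next in the training text.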

Anything we still need to talk about?

Anything that we forgot to mention?

We’ve covered a lot.

Yeah, that’s true.

That’s also what I figured.

So thanks a lot for joining.

So this evening, we are going to have a live stream of the Fishbowl, where we are going to discuss the impact of AI on software architecture.

So yeah, I’m looking forward to that one.

So please, yeah, join us there.

It’s at a quarter to 7 p.m. CET.

So see you there, and I hope you enjoy the rest of the conference.

And thanks again to Software Architecture Gathering for hosting us.

And thanks for watching and listening.

Thank you.
