Why this interview?
Kevin is a Consultant Solution Architect with long experience: he has worked for Intel, eBay, Realtor.com, and Microsoft, and now works for Data View LLC. He has had many very interesting experiences, and in this conversation we focus in particular on migrations and on his own hands-on experience with them.
So, the approach they took was this: they had someone who had actually worked for the company that produced this database environment and programming language, so he knew the internals quite well. What he did was write a program, an "augmentor" I suppose is the best name for it, that searched for insert statements, create statements, update statements, and delete statements, and wrapped them in what they called an XML enabler. Happily, the programming language had a "save as XML". So, it would take the current record it was operating on, and as it committed it, it would render that record in XML and drop it into MQSeries, message queuing.
You have to think of your data as now living in an ecosystem. It lives in more than one place. Over here, we have the system of record for programs that are not yet migrated. Here now is the system of record for programs that have been migrated. The systems that are not the system of record for a particular piece of data still need to know about that data. We had to build quite a large thing to make that work.
Personally, I think a good target to aim for is in and around the 70-something percent. 80-20 is a nice number, but I haven't gotten there yet. Now, that may be because, as part of this exercise, as we've discussed earlier, I had to write two different transpilers. And it could be a weakness of the transpiler creator that 80% was not reachable. Nice goal. 70%, 75% should be the target. You could say 100% should be the target, but it actually isn't.
Federico Tomassetti:
Hi, Kevin, thank you for being with us today.
Kevin Mackey:
Federico, I’m pleased to be here.
Federico Tomassetti:
Good. Well, I will start with a very simple question: can you tell us something about yourself?
Kevin Mackey:
Okay. Kevin Mackey. I'm native Irish, from Dublin. Lived in Spain, lived in France. And for nearly 30 years now, I have been living in the United States, 20-some of them in the San Francisco Bay Area, where I was fortunate to work at places like Intel, eBay, Realtor.com, and Microsoft, actually, at their Silicon Valley campus.
Kevin Mackey:
As a consultant, most of the time, in fact, I was consulting to Microsoft’s internal IT. So, I’ve been fortunate to work with large technology companies. Then almost 10 years ago, I moved to Kansas City in Missouri.
Kevin Mackey:
A boy who grew up on an island and then moved to the edge of the Pacific Ocean now lives very close to the geographical center of the US. Apparently, I needed a change, and I've continued consulting here.
Federico Tomassetti:
Nice.
Kevin Mackey:
Largely because I don't like working for companies anymore. As a consultant, you get different challenges. And apparently, I get bored easily. The longest I ever worked anywhere was Digital Equipment Corporation, which also tells you, maybe, "Okay, this guy is old. That company is long gone." I worked for them for 12 years, and I've largely been consulting ever since.
Federico Tomassetti:
I'm very happy to have an Irish guest, because I used to live in Ireland and I really enjoyed my time there. Everyone was really wonderful with me. So, I'm always happy to connect with someone from there.
Federico Tomassetti:
Okay. We have seen that you have a lot of very interesting experiences. And today, we are interested in focusing in particular on migrations. Can you tell us a bit more about your specific experience with migrations?
Kevin Mackey:
Yeah. There's been a variety of them. I helped migrate one company a couple of years ago, just as the pandemic hit, from an Oracle OBIEE environment. They were going full SQL Server, Power BI, cubes, and so on.
Kevin Mackey:
As I hate to do the same thing more than twice, at least, I started generating the code that would build the cubes for them, and helped build their ETL process. That was a relatively straightforward one. Okay, they had these reports, they needed them in Power BI, and they had a Power BI expert, so long as the data could be provided. That was all there was to that.
Kevin Mackey:
The much larger one, a mainframe modernization, is taking years; that first one took months to do. And for a variety of reasons: there's over 30 years of code, plus however many more years of data. The company, in the transportation logistics area, had and still has a mainframe application running on top of z/OS. However, on top of that, they have a …
Kevin Mackey:
It's a programming language cum database, loosely a database management environment, and file-based as it turns out. They made an interesting choice not to say, "Okay, let's do greenfield, let's rewrite everything in Java, turn the mainframe off one day, turn the Java on, and everything will be just fine." And that was partly because they no longer had the requirements, the business rules.
Kevin Mackey:
The business rules were built into the code at this point. The people who had written it, some at least, had moved on to other companies; a number had certainly retired. So, they had an enormous amount of code and an enormous amount of business data, and they decided, "Okay, let's migrate slowly."
Kevin Mackey:
They contracted with a company in New York that would convert their programming code into Java, and decided to do this one application area at a time. However, that required that the new applications, as they came online, would have the data, all of the data that they might need. So, they needed to move the data as well, and both systems had to stay in sync.
Kevin Mackey:
So, the approach they took was this: they had someone who had actually worked for the company that produced this database environment and programming language, so he knew the internals quite well. What he did was write a program, an "augmentor" I suppose is the best name for it, that searched for insert statements, create statements, update statements, and delete statements, and wrapped them in what they called an XML enabler. Happily, the programming language had a "save as XML". So, it would take the current record it was operating on, and as it committed it, it would render that record in XML and drop it into MQSeries, message queuing.
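To make that idea concrete: the enabler's job was to wrap every committing statement so that the committed record also went out as an XML message. Here is a minimal sketch in Python, with an in-process queue standing in for MQSeries and invented file and field names; the real enabler was written inside the 4GL itself, using its built-in "save as XML".

```python
import queue
import xml.etree.ElementTree as ET

mq = queue.Queue()  # stand-in for the MQSeries channel

def render_as_xml(file_name: str, action: str, record: dict) -> bytes:
    """Mimic the 4GL's 'save as XML': one element per field of the record."""
    root = ET.Element("message", attrib={"file": file_name, "action": action})
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root)

def commit_with_xml_enabler(file_name: str, action: str, record: dict) -> None:
    # 1. the original insert/create/update/delete commit happens here
    # 2. the wrapper then renders the just-committed record and enqueues it
    mq.put(render_as_xml(file_name, action, record))

commit_with_xml_enabler(
    "SHIPMENT", "update",
    {"record_key": "S-00042", "timestamp": "2021-06-01T17:22:00", "status": "IN_TRANSIT"},
)
```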
Kevin Mackey:
We needed someone on the other end of this to catch it and persist it, please. Without the please. If we were doing it today, we might have used Kafka or something like that. And we might, who knows, persist it in the cloud, whatever. They went and invested in a set of SQL Server boxes, and an initial thought was, "Microsoft has this BizTalk thing for electronic data interchange, and it knows XML."
Kevin Mackey:
However, to mount a BizTalk cluster of the size we were going to need, the company didn't have the expertise. I had worked with BizTalk within the Microsoft internal IT department for their own EDI purposes. So, we looked at that and went: perhaps not.
Kevin Mackey:
Plus, there was the requirement that within three seconds of a record being persisted on the mainframe, it needed to be persisted in SQL Server, or else the new programs that were coming online would get out of step, uncomfortably so. So, that was a hard requirement. I wasn't sure that we could build a BizTalk cluster large enough, because there were over 200 files on the mainframe being written, for customers, for shipping, for quotes, and so on.
Kevin Mackey:
One of the features, and one of the reasons the company chose this mainframe application, is there was almost no persistence penalty. As far as the program was concerned, it committed a record. As it turns out, it committed it to memory, and the underlying software then ensured it was journaled and persisted to disk well enough that even if everything crashed, and mainframes don't often do that, but even if everything crashed, you still had your data up to maybe a second or two ago. Which caused some interesting patterns in how people wrote records. They'd write a basic structure.
Kevin Mackey:
And then as the program found the next bit of data to go into that record, it would write it again, and write it again, and write it again, and slowly, over five minutes or whatever, build up the record. Each of those commits triggered the XML enabler: "We have a message, send it." So, we were getting messages with the same record that were milliseconds apart. Single milliseconds, not tens or hundreds: ones. So, I thought, perhaps BizTalk, I don't know, may or may not be able to take it, but let's see.
Kevin Mackey:
So, SQL Server already has a queuing system built in: Service Broker. Okay. That was totally within the database. If you have a queue, it will automatically invoke a stored procedure to dequeue messages. "Okay, let's go this way." And it had the virtue that most queuing systems have: it was relatively bulletproof. The idea was that we could not lose messages, hence MQSeries on the mainframe and Service Broker in the SQL Server environment.
Kevin Mackey:
Then it was just a question of writing several hundred stored procedures that knew how to shred XML messages and persist them into tables that looked awfully like the file structures. And of course, the mainframe's file-based system had different record types: this record type has 15 fields, that record type has 115 fields, and so on.
Kevin Mackey:
We had to present the data in SQL Server, or in a relational database generally, which doesn't really like repeating groups or different record structures within the one table. So, we had to shatter those: 200 and some files became 460 and some tables, and so on. A great deal of code needed to be written. And while it was fun doing the first one, and the second one, and the 10th one, I think it was, after a while the patterns became the same.
Kevin Mackey:
Dequeue a message, or dequeue a batch of messages. Sort them by a timestamp that was within the message. Shred the XML, persist. Update the record if it's there, create a new record if it's not. If there are recurring fields, which many had, write them into a child table.
Kevin Mackey:
And in the case of, “Oh, we deleted it.” Well, mark it for deletion. Don’t get rid of it yet. Mark it for deletion, because the program at the other end, the Java program needs to know that what used to be there is no longer. That’s when I decided that we needed to look at maybe code generation.
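That repeating pattern, dequeue a batch, sort by the timestamp inside the message, upsert, and soft-delete, might be sketched as follows. In the project this was generated T-SQL (Service Broker activation procedures) plus C#; this Python-and-SQLite sketch, reusing the toy message shape from the earlier snippet, only shows the shape of the logic.

```python
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shipment ("
           "record_key TEXT PRIMARY KEY, status TEXT, deleted INTEGER DEFAULT 0)")

def handle_batch(messages: list) -> None:
    parsed = [ET.fromstring(m) for m in messages]
    # order by the timestamp carried inside the message, not by arrival order,
    # since the same record could arrive several times, milliseconds apart
    parsed.sort(key=lambda m: m.findtext("timestamp") or "")
    for m in parsed:
        key = m.findtext("record_key")
        if m.get("action") == "delete":
            # mark for deletion; the migrated Java programs need to see
            # that what used to be there is no longer current
            db.execute("UPDATE shipment SET deleted = 1 WHERE record_key = ?", (key,))
        else:
            # update the record if it's there, create it if it's not
            db.execute(
                "INSERT INTO shipment (record_key, status) VALUES (?, ?) "
                "ON CONFLICT(record_key) DO UPDATE SET status = excluded.status",
                (key, m.findtext("status")),
            )
    db.commit()
```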
Kevin Mackey:
The only thing that changes is the structure of the message: what fields are there, and how many. Some of the files had, as I said, 15 fields. There's one that has 400 and some.
Federico Tomassetti:
Too many.
Kevin Mackey:
Exactly, too many. But it's the way the data was, and therefore we had to make the data the way the data was. That's when we decided, "Let's do some code generation." So, we built an entire environment around metadata that held all of the information about the tables: what the table name was, what the different record types were. "Does this file have recurring fields? Okay, then let me have a recurring-fields table, and these are the columns that go in there."
Kevin Mackey:
And once we had that information, we could then generate the stored procedures, which in Service Broker are called activation stored procedures. A message arrives, and SQL Server activates the stored procedure to go do something. We then used the same metadata for the shredding, because shredding XML in SQL is a terrible idea. It's capable of doing it, but there are so many XPath statements that we were running out of compute power and tempdb space, so we changed that into C#.
Kevin Mackey:
So, we generate the C# code to shred the message, because again, it's: search for this element, search for this element, this element, this element. And the only difference between the 200 and something C# programs is which elements we are searching for. Always the record key, always the date timestamp, and then whatever else is in the message.
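A hint of what "generate the C# code from metadata" can mean in practice: given one metadata entry per file, a few lines of Python can stamp out the per-file shredding code. The metadata entry and the emitted C# names here are invented; the real generator and its templates were far richer.

```python
# hypothetical metadata entry: one of these existed per mainframe file
meta = {
    "table": "Shipment",
    "fields": ["record_key", "timestamp", "status", "origin", "destination"],
}

def generate_shredder(meta: dict) -> str:
    """Emit a C# method that pulls each expected element out of a message."""
    lines = [f"public static void Shred{meta['table']}(XElement msg, SqlCommand cmd)", "{"]
    for field in meta["fields"]:
        # the only thing that varies between the 200+ programs is this list
        lines.append(f'    cmd.Parameters.AddWithValue("@{field}", '
                     f'(string)msg.Element("{field}"));')
    lines.append("}")
    return "\n".join(lines)

print(generate_shredder(meta))
```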
Federico Tomassetti:
I think you raised two points that are pretty important. Maybe I want to stress them for people that are not experienced with migrations. The first one is that you have to handle these large migration projects step by step. So, migrating one piece at a time.
Federico Tomassetti:
And the other one is that you don't just migrate the code; there is also the data. And because of the interaction between these two points, the fact that you do things in steps and you migrate both code and data, there is this intermediate phase in which you have to keep data in sync in two different places. That can be very challenging, and maybe one could underestimate these challenges before starting.
Kevin Mackey:
Very definitely. It was when some parts of the project were up and running that it was discovered: "Oh, and if we persist new data in the SQL Server environment, we may have to route that back to the mainframe." So that programs in another application area, but still touching the same kind of data, would know about it. Yeah, that became clear possibly a year into the process.
Kevin Mackey:
The first part of it was, "Okay, can we get the data over and keep it current as the first Java programs begin to come online?" And then it was, "Okay, we need the second half," which is when this became a circle, or a cycle. It's an ecosystem, which is a word I've tried to persuade these people to use.
Kevin Mackey:
You have to think of your data as now living in an ecosystem. It lives in more than one place. Over here, we have the system of record for programs that are not yet migrated. Here now is the system of record for programs that have been migrated. The systems that are not the system of record for a particular piece of data still need to know about that data. We had to build quite a large thing to make that work.
Kevin Mackey:
Mercifully, we did not have to worry about routing the data back ourselves. The people who did the initial XML enablement used capabilities within that mainframe environment: the Java program would effectively post the message back over HTTP, and the mainframe picked it up and persisted it. And they subsequently set flags that said, "Okay, don't route this back, because this side is the system of record." So, then we had to modify our code templates and say, "Okay, if you get a message that shouldn't be routed further to the system of record, skip it. Don't do that."
Federico Tomassetti:
Maybe one question that I would like to ask. It's obvious from the discussion up to this point that doing a migration is not easy. It's not a small project that you do in one week. So, one may wonder: why should someone migrate a system that is working to another platform?
Kevin Mackey:
There were several reasons. One, I alluded to earlier: 30 and more years of code. The environment in which that code was written is not particularly modern, and the company that supplied that particular database, programming language, and so on is de-emphasizing it as they move forward.
Kevin Mackey:
Many new things are not going to be coming there. Two, their programmers were retiring, or have retired, and you go out on Dice, Indeed, or whatever, and say, "Hey, I'm looking for programmers for this prehistoric piece of technology." And it's silence. Not a great deal of interest. Or if you do find someone, they would be phenomenally expensive. So, there was that.
Kevin Mackey:
Adding new 21st-century capabilities to their application was much harder to do in the environment they were in, and finding people to do it was difficult. And IBM mainframes are very expensive. The new environment, I mean SQL Server Enterprise Edition on some of the large-scale boxes they have, isn't cheap either, without any question.
Kevin Mackey:
But the Docker-based application servers they're using for their Java programs, you buy them in bunches: a pallet of them for two and a half months' rent on the mainframe. So, it's a less expensive environment. I was going to say cheaper, but it's a less expensive environment to which they're moving.
Kevin Mackey:
Those are some of the reasons. Expense, greater facility to add newer features. They could make this cloud-based without a whole lot of difficulty: just move the containers up to Azure or AWS. IBM has been buying cloud companies, but doesn't really have the most coordinated cloud environment right now, and the code they currently have is not truly cloud-capable. So, they could move there with greater facility.
Kevin Mackey:
And a company may want to get rid of their on-premises hardware, and so on. Harder to do if your on-premises hardware fills a room and it's one box. Slight exaggeration: it doesn't quite fill the room, but still, it is big iron, and there's no question of that. Expense, ease of moving to a more modern environment, and the fact that you may or may not be able to find people who can do the work that has been going on for decades. Those are the reasons.
Kevin Mackey:
It is never easy. Sometimes, it's not even possible; we both have been in environments where it didn't work. It's not easy. It's not inexpensive. But the goal, hopefully, is to enable the company to be more … I don't want to use this term, agile, and I don't mean it in the software development sense. Simply more agile in their marketplace. They'll have systems that serve where they want to be, rather than where they've come from.
Federico Tomassetti:
Yeah. It’s probably not cheap to do this migration, but on the other hand, they’re probably migrating systems that are very valuable to them. And they’ve been very valuable for many years, otherwise, they wouldn’t be legacy.
Kevin Mackey:
Exactly. Yes. A number of financial institutions around this area are doing the same thing. Financial institutions have lived on mainframes for a long time, because they work. The systems are powerful. The systems are important and beneficial for the company. But sometimes the company's business model becomes constricted by legacy.
Federico Tomassetti:
I think one question that someone approaching this field could ask is … I mean, if one has to migrate a system from, let's say, RPG, COBOL, or maybe some 4GL that was briefly used some tens of years ago, to a modern language, I guess there are two possible ways to do that. One would be just looking at the old code and rewriting it from scratch in Java, C#, or Python; the other would be writing a transpiler, or using some other tool that automatically does the translation. What do you think about these two options?
Kevin Mackey:
Option number one, I think you need an army, or you may. I don't know if it's even known how many billions of lines of COBOL exist out there. If that were the only programming language, the programmers would get writer's cramp typing it, there was so much.
Kevin Mackey:
The problem with looking at the old code and then writing the new code is that it's people-intensive, which may not be a huge problem. You basically hire the entire next graduating CS class out of college. You say, "Okay, here are the fundamentals of what RPG looks like, what an input spec looks like, calculations, what an output spec looks like", I used to teach RPG, "and you know Java, C#, Python, whatever. Go, make the translation."
Kevin Mackey:
The people part of it is possible to get over. The difficulty I personally feel with that approach is you then wind up with what I call artisanal code, which is great if you're making pottery. No two pots look the same, but that's the charm; it's handcrafted. You lack the consistency that your code is going to need, because one person, with the best will in the world, doing the same thing several times across a week, across a month, will accidentally deviate. Differences creep in, even if you've got templates and so on.
Kevin Mackey:
10 people doing the same thing with templates all laid out and so on: "Do this. Do not deviate, or we fire you." At the end of a month, with 10 people doing 10 things, you will wind up with at least 40 variations. I'm going 60-40, and that's probably low. Simply, there will be variations, because we are not very good at doing the same thing time and time and time again without deviation.
Kevin Mackey:
Now, sometimes that's good: the 10th might be better than the first four. However, across 10 people, you'll have four things that are at the 10th level, that are better, but they're not the same. And then when bugs crop up, as sadly they do, it's a major exercise to find out what's different here versus there.
Kevin Mackey:
This is part of the reason we went with generating the code we were producing to shred messages, persist them, and so on. Once we had a pattern, then do it 220 times, just varying the columns.
Kevin Mackey:
In the case of the original code to Java, again, the approach was a third-party company that had something in this area. They had built themselves a grammar or something; I don't know, I know nothing of the internals of it. But again, the idea was that we need to stamp this code out, and it all looks the same. The virtue in it is that it is either all right or all wrong. If it's wrong, you fix the transpiler, the template, whatever assumption you made that was wrong, and then they're all right.
Kevin Mackey:
It may not be the world's finest code, but code file number one and code file number 257 follow the same pattern religiously, because computers know how to do that. So, the preference, and it's certainly not easy, the preference would be to get a computer to do this, or as much of this as possible.
Federico Tomassetti:
Very happy you said "as much of this as possible", because this was exactly the question that I wanted to ask next. That is, how much code do you think can actually be translated? Because I think this is a typical question that someone asks. There are maybe people that have unrealistic expectations like, "Okay, click a button and 100% of the code will be perfectly translated and extremely readable."
Federico Tomassetti:
Can you tell us, in your opinion, how much code can be typically translated when doing the migration?
Kevin Mackey:
Personally, I think a good target to aim for is in and around the 70-something percent. 80-20 is a nice number, but I haven't gotten there yet. Now, that may be because, as part of this exercise, as we've discussed earlier, I had to write two different transpilers. And it could be a weakness of the transpiler creator that 80% was not reachable. Nice goal. 70%, 75% should be the target. You could say 100% should be the target, but it actually isn't.
Kevin Mackey:
I don't believe it's possible. I cannot say that it's not possible, because there are people far cleverer than I out there. And people who actually write compilers for a living know how to take code in one form and produce perfectly working code in another form. That's really what compilers do. So, there are people who are capable of this.
Kevin Mackey:
For the rest of us, you need enough coverage that it's worth investing the effort and time to write the transpiler in the first place. Because you could sit down and start rewriting the code by hand, and you would get immediate results, and some of them may actually be useful.
Kevin Mackey:
If you decide to go the transpiler route, you won't get immediate results. There's this whole period where you have to figure out what the patterns in the language are, if you don't happen to have a grammar for it, or one available out on GitHub. Then you figure out what the patterns of the output are supposed to look like. This is not a small task. A period of time will go by, and it's not a day or two, before you begin to get results. So, you have to aim for enough that will give you a decent return on that upfront investment, during which you are getting nothing at all, except frustration.
Federico Tomassetti:
Yes, I know. I think this is a very good point. And one should remember that we are not writing transpilers for fun, or not only for fun. For sure, the client is not paying for the sake of …
Kevin Mackey:
Paying for fun.
Federico Tomassetti:
… reaching some theoretical goal of 100% translation at some point. It just makes sense to translate manually what would be very hard to translate automatically, and hopefully that part is small enough to be tractable by a smaller army.
Kevin Mackey:
Yeah. And if nothing else, by then you will have better patterns to explain to your smaller army, because you’ve had to build them into your transpiler.
Federico Tomassetti:
Okay. I wanted to check one thing with you, because we have 15 minutes left, and at the end I want to ask some questions about you: which services you offer, where someone can find more resources about you. So, we will use the last few minutes for that. But I think we will not be able to cover all the remaining questions, so I wanted to check which question you would prefer to focus on next. Maybe one that gives us an excuse to use the slides you have?
Kevin Mackey:
Okay. Yes. Partly, I didn't want this to be entirely theoretical. "We did this, I did that. We produced hundreds of stored procedures," and so on. "Very nice. Show us. What did you do? How was this done? Oh yeah, Kevin said he wrote a transpiler or two. What does that look like?"
Federico Tomassetti:
Maybe I can ask you that. So, can you tell us more about this transpiler and what it looks like? And then you can show the slides for that.
Kevin Mackey:
Okay. Certainly. To give some context for this part of the migration project: the company that was translating to Java was doing the online work, the screens and so on, but this is a mainframe doing batch processing, so they were converting the batch programs as well. And again, we are faced with data.
Kevin Mackey:
So, just to provide the background: the batch programs operated basically on extract files of data. In the development environment the company has, they call them FUNLOADs, from Fast/Unload, a language. They had written these to pull data out of different files and dump it into a fixed-length format. Very like an RPG output spec: in column three, put this value for 12 characters; in column 15, put that; and so on.
Kevin Mackey:
The problem is they had hundreds of these Fast/Unload programs, which worked terribly well. But now their data is in the SQL environment. So, they needed the FUNLOAD scripts in SQL, and they wrote themselves, effectively, a generic SQL executor: "Okay, call this script, dump the file out, and then the transpiled, or converted, Java batch program will go operate on it." And they said, "Kevin, we have several hundred of these scripts, 700 last time I checked. Can you and the rest of the people with whom you work convert these?" And I said, "You know what I'm going to do? I'm not going to write these by hand." Let's go take a look at what that might look like. Yeah.
Federico Tomassetti:
Good. We are seeing the slide and a test file.
Kevin Mackey:
Yeah. This is in fact what that Fast/Unload code looks like. So, a job, a step. This is the file to be opened. Some things to figure out what today is; work out basically the last week, effectively, seven days ago to now. And then it's quite procedural: for each record within the file…
Kevin Mackey:
Check this column. If it's this value, then go through this; otherwise, skip the record. If these conditions are met, basically if the date is within the two dates we calculated, and this set of conditions is true, and if this column exists and this other column exists, which effectively translates into SQL as IS NOT NULL, then PUT, which is basically the output statement.
Kevin Mackey:
Put, at column nine, a string of length 11: the value in that column and this other one, because these were the ones we checked. If they actually exist, then dump the values out. End if, end if, end if, otherwise skip, end select, end for. Basically, run through the file, and when these conditions are met, dump the values out. Please go and translate this to SQL. So, I have that code, which is in fact in a file called TestFunload.txt. And here Kevin goes, hits enter, and it all blows up, I don't know.
Federico Tomassetti:
That seems to have gone well. This is reassuring.
Kevin Mackey:
It's generating the SQL file, getting the column list. Because, of course, the separator character used on the mainframe between the pieces of a column name isn't valid in SQL; in SQL, we changed it to underscores.
Kevin Mackey:
Okay. One of the columns we're needing … So, we have the metadata for all of BUSAUDIT available somewhere. It's the same metadata we used to generate the XML shredding, and persisting, and so on. So, it was fortunate that I was able to reuse that data. Where did I put the SQL? Oh, yes. A blank page, but it was open to a particular file. And so, this has just been changed. Yes.
Federico Tomassetti:
Nice.
Kevin Mackey:
10:22, and I'm not sure what time it is where you are. Seven hours?
Federico Tomassetti:
Yes, seven.
Kevin Mackey:
So, 22 minutes past 5:00 in the evening, 17:22. "Hey, Kevin is being honest, it was produced just now." This is the name of the output file, because that's the one they wanted: MCC, which is the name of the new application environment, then the name of the job, the step, dot-SQL. And these are, hopefully, all the conditions correctly in place. Go figure out what the dates are, and then put it all out in one fixed-position record.
Kevin Mackey:
And interestingly enough, because this starts at position nine, don't forget to dump out eight spaces first. Because literally, the original has an output record and it's putting data in at certain positions. But SQL has no concept of an output record, so pad this as much as you need to. And by the way, if they're null, just make sure, and so on.
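The padding rule is mechanical, which is exactly why it generates well. A tiny Python illustration of the rule, with invented PUTs; each one says "this value, at this column, this wide":

```python
def fixed_position_record(puts):
    """Build one fixed-position output record from (column, width, value) PUTs.
    Columns are 1-based, as in the Fast/Unload scripts."""
    record = ""
    for column, width, value in sorted(puts):
        record += " " * (column - 1 - len(record))    # pad up to the start column
        record += (value or "")[:width].ljust(width)  # a NULL value becomes spaces
    return record

# a value starting at column nine forces eight leading spaces, as in the demo
print(repr(fixed_position_record([(9, 11, "S-00042"), (21, 3, "MCC")])))
```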
Kevin Mackey:
Yes, it took time to write this transpiler, and you need to produce enough with it to recoup that. In testing, with the set I had, 237 of these Fast/Unload scripts can be converted in three minutes. There's only one in here, but I could have had several, and it will make one SQL file for each of them.
Kevin Mackey:
Basically, it's an ANTLR parser that produces everything, and the job is the owner of everything below it. Therefore, the transpiler iterates across the tree looking for job statements and says, "Okay, I have a job statement. Go do that. Now, move to the next job statement. Go do that," and so on. To do that, I had to basically write the grammar for the language.
Kevin Mackey:
The put statement we'll get to later. So, a job statement is an identifier and the name of the step; we pull those pieces of data out. Which file are we talking about? They may or may not use the keyword "file"; some did, some did not, as I discovered later. An open statement takes a file list, because they may open several, N of them, where N is any number they chose.
Kevin Mackey:
And the continuation statement, which we see there, says, "Okay, this continues on the next line." The continuation statement can be used in the middle of complex conditionals, or input statements, or open statements, or wherever they felt like doing it, and so on it went. The for statement, of which, yes, there are several varieties. The put statement.
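ANTLR generates Python parsers as well as Java ones, so the tree walk Kevin describes can be sketched like this. The grammar file and the generated FunloadLexer / FunloadParser / FunloadListener classes (and the script start rule) are hypothetical names following ANTLR's conventions, not the project's actual code:

```python
from antlr4 import InputStream, CommonTokenStream, ParseTreeWalker
# generated with: antlr4 -Dlanguage=Python3 Funload.g4   (hypothetical grammar)
from FunloadLexer import FunloadLexer
from FunloadParser import FunloadParser
from FunloadListener import FunloadListener

class JobWalker(FunloadListener):
    """The job owns everything below it, so visiting each job statement
    is enough to drive the translation of its whole subtree."""
    def enterJobStatement(self, ctx):
        print("translating job:", ctx.IDENTIFIER().getText())

source = InputStream(open("TestFunload.txt").read())
tree = FunloadParser(CommonTokenStream(FunloadLexer(source))).script()
ParseTreeWalker().walk(JobWalker(), tree)
```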
Federico Tomassetti:
It's really impressive to see how the system looks in practice, and what kind of work there is behind writing a transpiler. So, I think that is nice.
Kevin Mackey:
Yeah. And then, of course, this is the Fast/Unload translate program that we ran. This is the other end of it. I used Terence Parr's … what did he call it, StringTemplate?
Federico Tomassetti:
Yeah, I would say StringTemplate.
Kevin Mackey:
I think it's StringTemplate. He's the writer of ANTLR, basically. There, we see the header at the top and the Fast/Unload header, FL. Source file generated on, job header. "Oh, by the way, if this is true, then output them, because sometimes it was not. And then, what is the output file?" Terence Parr says that if your template doesn't look like a grammar, you're doing it wrong.
Federico Tomassetti:
I think you would pass Terence's test.
Kevin Mackey:
Well, yeah. For something where I know the output, I'm always going to be creating these SQL files, or I'm always going to be creating this scripting language, I like StringTemplate. Once I figured out how to use it, it is marvelously flexible and powerful, and it works. Terence Parr uses it to generate the parsers out of ANTLR; therefore, demonstrably, it works. And once I was able to figure it out, it worked well for me.
Kevin Mackey:
If I'm writing something a bit more flexible in terms of what it outputs, StringTemplate is not good for that. It knows how to do one thing, and it's very good at that. So, we used it for the generation of the SQL code for shredding XML messages. And the XML messages all look kind of like this: what's the action, this is the information, the data, this is the record key, and so on. They all look the same, except the block in the middle, which is the different columns, the fields, for that particular table.
Kevin Mackey:
The SQL to rip this apart, persist it, and so on uses the metadata that, in fact, the transpiler used. But we also had to generate the C# code that knows how to shred this XML message. In the end, we also generated one set of the Java code that gets merged into the rest of the project, because it is very much to do with what the columns are in this particular table, and so on.
Kevin Mackey:
Because we were generating SQL, and C#, and Java, and, as it turns out, Fast/Unload code as well, we went with a Python program using a templating language called pyratemp, partly because I was the only person in the group who knew StringTemplate, and this way I was able to push some of the writing of the pyratemp templates off to the team, spread it around.
Kevin Mackey:
For producing just one type of output, StringTemplate is extraordinarily good, and fast, even if it is using ANTLR 3, apparently. But if, from the same set of metadata, you have the requirement to build different things, possibly in different languages, then a Python environment and pyratemp templates. Pyratemp, because it was relatively simple and straightforward. There are hundreds of templating tools out there, so it was just a question of choosing which one to work with.
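The multi-target setup, one set of metadata feeding templates for several output languages, is easy to picture. This sketch uses the standard library's string.Template standing in for pyratemp (pyratemp's own syntax differs), and the metadata entry is invented:

```python
from string import Template

meta = {"table": "Shipment", "columns": "record_key, status"}  # invented entry

# one template per target language, all driven by the same metadata
templates = {
    "sql": Template("CREATE TABLE $table ($columns);"),
    "csharp": Template("public class $table { /* shred: $columns */ }"),
}

for target, tpl in templates.items():
    print(f"-- {target} --")
    print(tpl.substitute(meta))
```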
Federico Tomassetti:
Yeah. I think it’s been very instructive. And I would like to be sure that we have time to ask you also more about you. And in particular, can you tell us what kind of services you offer in your role as a consultant?
Kevin Mackey:
Yes. I've said "we" and I've said "the team" a couple of times. Data View is a loose collaboration between, currently, four of us. Alfredo Araiza actually owns Data View. Brendon, Bill, and I each have our own one-man-band consulting companies. But Brendon and Alfredo have been working together for quite a number of years; even when they were full-time employees, they worked together.
Kevin Mackey:
Brendon and I ran into one another in a consulting engagement, and he introduced me to Alfredo when they landed this modernization exercise, into which we also pulled Bill Fellows. Apparently, he's no longer a Microsoft MVP, because COVID got in the way and he wasn't doing SQL Saturdays; it was tougher to do community-based things when you couldn't get a community together easily. But he's also co-author of a thing called the BIML Book. BIML is the Business Intelligence Markup Language, an XML grammar used for generating Microsoft SSIS packages, among other things. In fact, it's a full-scale scripting language: XML is just text, and all code is just text, and therefore it can generate pretty much anything.
Kevin Mackey:
The word data is in the name because of data services, data migrations, data visualizations, data cleansing. As it turns out, we even wound up doing cloud migration. Currently, there's another client we're working with who has wanted to go to the cloud for some time, largely because they've got almost no IT people, but they have a bunch of servers, and VMs, and so on, and they need to get out of the business of keeping all that running. So, we move their databases, at least to start, and take some of their old code, some of which is in VB.NET and a decade old at this point; we're reworking those. They're an event-driven business that operates in a batch mentality, because, hey, that's what was known. So, we are helping migrate their data to …
Kevin Mackey:
We work largely in the Microsoft world, although three of us also have Oracle and similar experience. In terms of cloud work, most of it has been in Azure, because it was there and it played well in that environment. So, we're moving the data, or have moved databases, there. Taking some of the older applications that run the business and turning them into functions that react to events happening: Service Bus, queuing, messages. Because all of us are fond of the concept of queues; the infrastructure preserves the data until you can get to it.
Kevin Mackey:
And in terms of the messaging from the system that we're working with, on average, messages are coming in every millisecond. It's an average, and therefore there are outliers; some periods are busier than others.
Kevin Mackey:
So, we do queuing, because the underlying environment will help. So, you've got fast-paced, near real-time stuff, and you need to move data: this is what we do. This is something we built in R, because R knows how to do small multiples. These are the active queues during a particular five minutes. The red line is the three-second mark.
Kevin Mackey:
Built into the code we generate is recording when we finished dequeuing and when we finished persisting. And then we can take this and turn it into information about what exactly is happening. The denser the line, the more observations. So, their quoting system is one of the busiest. And while there are peaks, at one particular millisecond it came close to the three seconds, most of the time we're well within that.
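Because the generated code records both ends of each hop, checking the three-second requirement is simple arithmetic over those pairs. A sketch with invented sample numbers:

```python
# (queue_name, seconds from mainframe commit to SQL Server persistence);
# the numbers are invented sample data
observations = [
    ("quotes", 0.850),
    ("quotes", 2.950),
    ("shipments", 0.120),
]

THRESHOLD = 3.0  # the hard three-second requirement

for queue_name, latency in observations:
    flag = "OVER" if latency > THRESHOLD else "ok"
    print(f"{queue_name:10s} {latency:6.3f}s {flag}")
```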
Kevin Mackey:
And we needed these kinds of views. We turn data into information that can be used for the business and, in fact, to help diagnose what problems might actually occur. So, I think those are the services. It's data-centric, but it's about migration, it's about turning data into information in some way, and that's what we do.
Kevin Mackey:
And me, I write code that writes code, because there’s too much code as you well know. There’s too much code to write, and there’s only four of us. We borrow other people as needed, but for some things, there’s just too much code to write. If you can make it a pattern, I can make it automated.
Federico Tomassetti:
I think this gives a very good idea of what you're doing at Data View. I would like to thank you for sharing all of your experience and explaining concretely how you did this project. For me, it's been very interesting. So, thank you, Kevin, for your time.
Kevin Mackey:
Okay. I'm very happy to have had this opportunity, because it's a practical example of what many people think about. These things are doable. They're not easy, but they are doable. If I could only figure out domain-specific languages for some of this, I'd be even happier, but I have a lot of work to do in that area before I get anywhere useful.
Federico Tomassetti:
Good. Let’s think about that for the future. Thank you and goodbye.
Kevin Mackey:
Okay. Thank you. Thank you for your time. Bye.
Additional links:
A useful link is Our Team | Data View LLC (dataview-llc.com) which talks about the members of the Data View team.
This link: Code Generation | Data View LLC (dataview-llc.com) talks briefly about, and shows a short video of, one area where we do code generation.