The Problem with Embeds, Part 2: Refactoring the TXA Parser
by Unknown user
In the first article in this series I said that most Clarion developers use embed points the wrong way, and by doing so they make their applications more difficult to maintain, test, debug and document. Almost every Clarion developer has done that; I've done it too. In these articles I intend to show how you can improve your code base by taking the majority of that code out of the embed points.
I'll be working through a couple of examples, including the Invoice app which ships with Clarion. But in this article I'll focus on the TXA embed parser example I introduced in Part 1.
In any analysis of embed code and how it might be better deployed, the first thing you need is a convenient way to look at just your embeds. That's not so easy because, embeds being embeds, they're sprinkled throughout your application.
Really you have two options: You can browse it in the embed list or in the embeditor, or you can use a tool to extract just the embeds so you can look at them without the distraction of the generated code.
About TXAs
As far as I know, there's only one way to programmatically extract the embed code from an APP. First, you have to export a TXA, which is an import/export text version of the information contained in an APP file. And second, you to parse the TXA to get the embeds, which can be a bit of a pain as the TXA format isn't documented.
A couple of years I wrote a TXA parser to do just that task, in support of an article series on the most popular embed points. So it seemed natural, when I needed a way to extract embed points for this series, to revisit that code.
Unfortunately, that code isn't very pretty. Most of it is contained in just a couple of embed points. In the TakeAccepted method's data section there are some declarations:
x long vars group,pre() procname string(200) procFromABC string(60) procCategory string(60) embedname string(60) embedPriority long embedParam1 string(200) embedParam2 string(200) embedParam3 string(200) whenLevel byte end dumptrace byte(0) LastProcName like(procname) lastEmbedName string(500) currembedname string(500)
And then a little later on in TakeAccepted
, at an embed that's called when the user presses the Import button, the TXA gets parsed. That code loops through the records in a previously created queue of TXA files (txaq
):
?progressvar{prop:rangehigh} = records(txaq) setcursor(cursor:wait) loop x = 1 to records(txaq) get(txaq,x) ?progressVar{prop:progress} = x-1 clear(ema:record) EMA:TXA = txaq.name Access:EmbedApp.Insert() EmbedApp{prop:sql} = 'select last_insert_id()' next(EmbedApp) ! Add the queue header record access:TextFile.Close() GLO:TextFileName = txaq.name access:TextFile.Open() Access:TextFile.UseFile() set(TextFile) ProcName = '' state = 0 lineNo = 0 clear(procname) clear(lastprocname) clear(lastembedname) clear(currembedname) LOOP next(TextFile) if errorcode() then break. dumptrace = false lineNo += 1 CASE state OF 0 ! search for the start of a module or procedure, or an embed if sub(txt:rec,1,11) = '[PROCEDURE]' clear(vars) state = 10 elsif sub(txt:rec,1,8) = '[MODULE]' clear(vars) procName = '[MODULE]' elsif sub(txt:rec,1,7) = 'EMBED %' embedName = sub(txt:rec,7,len(txt:rec)) state = 30 elsif sub(txt:rec,1,8) = '[SOURCE]' state = 50 end OF 10 ! get procedure name details if sub(txt:rec,1,4) = 'NAME' procName = sub(txt:rec,6,len(txt:rec)) state = 11 end do CheckForMissedEmbed OF 11 if sub(txt:rec,1,8) = 'FROM ABC' procFromABC = sub(txt:rec,10,len(txt:rec)) state = 12 end do CheckForMissedEmbed OF 12 if sub(txt:rec,1,8) = 'CATEGORY' procCategory = sub(txt:rec,11,len(clip(txt:rec))-11) end state = 0 do CheckForMissedEmbed of 30 ! Look for a first embed parameter if sub(txt:rec,1,11) = '[INSTANCES]' state = 41 elsif sub(txt:rec,1,8) = '[SOURCE]' state = 50 end of 41 ! Get first parameter if sub(txt:rec,1,6) = 'WHEN ''' embedParam1 = sub(txt:rec,7,len(clip(txt:rec))-7) WhenLevel = 1 !db.out('whenlevel=' & whenlevel) end state = 42 do CheckForMissedEmbed of 42 ! Look for a second embed parameter if sub(txt:rec,1,11) = '[INSTANCES]' state = 43 elsif sub(txt:rec,1,8) = '[SOURCE]' state = 50 end of 43 ! Get second parameter if sub(txt:rec,1,6) = 'WHEN ''' embedParam2 = sub(txt:rec,7,len(clip(txt:rec))-7) WhenLevel = 2 end state = 44 do CheckForMissedEmbed of 44 ! Look for a third embed parameter if sub(txt:rec,1,11) = '[INSTANCES]' state = 45 elsif sub(txt:rec,1,8) = '[SOURCE]' state = 50 !db.out('found PRIORITY') end of 45 ! Get third parameter if sub(txt:rec,1,6) = 'WHEN ''' embedParam3 = sub(txt:rec,7,len(clip(txt:rec))-7) WhenLevel = 3 !db.out('whenlevel=' & whenlevel) end state = 50 do CheckForMissedEmbed of 50 ! look for the priority if sub(txt:rec,1,8) = 'PRIORITY' embedPriority = sub(txt:rec,10,len(txt:rec)) if lastprocname <> procname ! insert new EmbedProc record clear(EMP:record) EMP:Proc = procname EMP:ProcFromABC = ProcFromABC EMP:ProcCategory = ProcCategory EMP:EmbedAppID = EMA:EmbedAppID Access:EmbedProc.Insert() EmbedProc{prop:sql} = 'select last_insert_id()' next(EmbedProc) lastprocname = procname end ! Add the embed record currEmbedName = clip(embedName) & clip(embedparam1) | & clip(embedparam2) & clip(embedparam3) & embedpriority if currEmbedName <> lastEmbedName lastEmbedName = currEmbedName EMB:EmbedProcID = EMP:EmbedProcID EMB:Embed = EmbedName EMB:Param1 = EmbedParam1 EMB:Param2 = EmbedParam2 EMB:Param3 = EmbedParam3 EMB:Priority = embedpriority access:Embed.Insert() end state = 51 end of 51 state = 60 do CheckForMissedEmbed OF 60 ! capturing embed ! Quit when [END] encountered if sub(txt:rec,1,1) = '[' if sub(txt:rec,1,5) = '[END]' case WhenLevel of 3 WhenLevel = 2 embedParam3 = '' of 2 WhenLevel = 1 embedParam2 = '' of 1 WhenLevel = 0 embedParam1 = '' end state = 0 elsif sub(txt:rec,1,8) = '[SOURCE]' ! look for another embed under this [EMBED] point state = 50 else ! could be we're done state = 0 end elsif sub(txt:rec,1,6) = 'WHEN ''' case WhenLevel of 0 ! get the first param embedParam1 = sub(txt:rec,7,len(clip(txt:rec))-7) WhenLevel = 1 state = 42 of 1 ! get the second param embedParam2 = sub(txt:rec,7,len(clip(txt:rec))-7) WhenLevel = 2 state = 50 of 2 state = 50 end else ! write embed buffer end else do CheckForMissedEmbed END END Access:TextFile.Close() end ?progressVar{prop:progress} = records(txaq) setcursor()
In Embeds-original.zip have a look at the ImportTXAs procedure in Embeds.app for the complete source.
What I had in mind for my new utility was something more along these lines:
My original parser's functionality was overkill for this new app; I really didn't need to build up an elaborate database of applications, procedures, and embed points. I just needed a list of embed points. But obviously I needed all of the parsing capabilities, gnarly though the code might be.
Unfortunately, there wasn't any way to use my original code unchanged, primarily because that code didn't actually extract the embedded source from the TXA - there was no need to capture the actual embed code since I was just logging embed usage. And I was motivated not to store the embed code: I had asked for TXA submissions which could contain sensitive information, and I didn't want to accidentally expose anyone's embed code to public view.
So what were my options? A few came to mind:
- Just cut and paste the source. This is what a lot of us do, and it has the obvious drawbacks of creating multiple versions of the code to maintain. If I find a bug in my parsing code, I'll need to hunt down every place I've pasted a copy and make the change there.
- Put my original procedure in a DLL and call it as needed. In this case my DLL also presents a user interface. That would result in some UI clunkiness in the app I envisioned in Figure 1, which doesn't need to call yet another window just to do the parsing.
- Put the common source in source files and INCLUDE them. I could even include just portions of the files using labels. The drawback here is that there's no way to know how the various sections of source code might be used, or how any bug fixes to that source might cause unexpected bugs. Using INCLUDE statements this way results in an almost complete loss of control over the source code.
- Create a template containing the source code. This certainly helps keep the code in one place, but it has a lot of negatives when it comes to maintaining and testing that code since you have to put the template in an app and generate the code, and then you have to port any changes back to the template.
And there were other problems. Because my original app was tied to a particular data store (a PostgreSQL database), any re-use of that code would have to know the table definitions. Since Clarion only supports one dictionary per app, any apps that used this procedure would either need to use the dictionary or import the tables from that dictionary.
Class or procedure?
So what's the answer? If it's not a template, and not a multi-purpose generated procedure and not INCLUDEd source, what's left? Procedures and classes, that's what. But not just any procedures or classes. I wanted to write code that had as few dependencies as possible.
Some of the dependencies I wanted to avoid:
- Files/tables - not tie my code to a specific database
- Windows/controls - not tie my code to a specific user interface element
- Other code - keep calls to other procedures/classes to a minimum
So when should you use a class, and when a procedure?
In almost all cases a class is preferable to a procedure, in the same way that a procedure is almost always preferable to spaghetti code. A procedure presents a single point of entry and a single result. That's not to say you can't return multiple values from a procedure - you clearly can, as Steve Parker has showed. But procedures don't have the flexibility of classes.
In fact, a class method is really just a procedure, so using a class already gets you everything a procedure can do. But it also gets you more, because in a class multiple methods can operate on shared data.
The test app
In recent years I've become a devotee of test-driven development, which embodies the idea that you write your test first and then start in on the code needed to pass the test. I find this provides a lot of clarity to my class design.
So my first question is, what should my test(s) look like?
I'll need a parser object, of course. And this is going to be part of the DevRoadmaps Clarion Library, so it needs to follow the DCL naming conventions. That means that DCL_ is the first part of the name (Clarion doesn't support namespaces in class names, so I fake these by separating name parts with underscores).
The second part of the name? Maybe Clarion, since the parser is a class that relates specifically to Clarion's own functionality. If, for instance, I wrote some code to manipulate solution or project files I'd also begin those classes with DCL_Clarion_. If I'm not expecting a lot of classes then two levels of naming is sufficient, although sometimes I'll go to three.
Provisionally, I'll name my class DCL_Clarion_TXAParser. For more on creating test apps see Creating a ClarionTest test DLL (DCL).
The first test
So what should my first test look like? A reasonable place to start seems to be identifying the number of embed points in a TXA, using code like this:
txa DCL_Clarion_TXAParser code txa.Parse('test.txa') AssertThat(txa.GetEmbedCount(),IsEqualTo(6),'Wrong number of embeds found in TXA')
I'm not creating this code from scratch; instead I'm refactoring a previously created set of classes to use the DCL standards and existing DCL classes. But the original code doesn't have a GetEmbedCount method so I'll have to add that at some point.
The class
The first decision I have to make is how to store the embed data. Originally I used a SQL database, but this now appears to be a liability. My parser should use something more transient that can be converted to a permanent store or just discarded after use. An in-memory data store, in the form of nested queues, fits the bill:
DCL_Clarion_TXAParser_ProcedureQueue queue,TYPE ProcedureName string(40) EmbedQ &DCL_Clarion_TXAParser_EmbedQueue END DCL_Clarion_TXAParser_EmbedQueue queue,type EmbedName string(100) EmbedLineQ &DCL_Clarion_TXAParser_EmbedLineQueue END DCL_Clarion_TXAParser_EmbedLinesQueue queue,TYPE Line cstring(1000) END
Since these queues are used by DCL_Clarion_TXAParser I've included them in DCL_Clarion_TXAParser.inc and have given them a prefix of DCL_Clarion_TXAParser_.
The parser's job will be to populate these queues so that I end up with a DCL_Clarion_TXAParser_ProcedureQueue containing one or more records. Each of those procedure records has one or more DCL_Clarion_TXAParser_EmbedQueue records, each of which has one or more DCL_Clarion_TXAParser_EmbedLineQueue records for each line in a given embed point. With that queue structure in hand I can easily update a database or create a text file, as I see fit.
At first blush this looks like an ideal task for a procedure. Pass in the name of the TXA and an empty queue and get back a filled queue: presto! But here's why I think the procedural solution is hardly ever a good solution: there's almost always some new functionality you will want to add which doesn't fit into the existing procedure.
Think about error handling. You can have the parsing procedure return an errorcode if the parse fails for any reason, but what if you want to get a more detailed error response? What if you want to enable tracing or logging? How would you do that in a single procedure call, without burdening the procedure with some obscenely large number of parameters?
In fact, once I began rewriting my parsing code as a class I ended up with rather a lot of methods and a few properties as well:
DCL_Clarion_TXAParser Class,Type,Module('DCL_Clarion_TXAParser.CLW'),Link('DCL_Clarion_TXAParser.CLW',_DCL_Classes_LinkMode_),Dll(_DCL_Classes_DllMode_) ProcedureQ &DCL_Clarion_TXAParser_ProcedureQueue ProcedureQIsExternal bool Errors &DCL_System_ErrorManager Construct Procedure() Destruct Procedure() AddNewEmbed procedure(string embed,string embedparam1,string embedparam2,string embedparam3,string embedpriority),private AddNewProcedure procedure(string pname),private CheckForMissedEmbed procedure(string s,long lineno,long state),private GetEmbedCount procedure,long GetProcedureCount procedure,long Parse PROCEDURE(string filename),bool,proc RemoveCurrentProcedureFromQueue procedure,private Reset PROCEDURE SetQueue procedure(DCL_Clarion_TXAParser_ProcedureQueue procedureQ) End
I won't go into all of these methods in detail - you can have a look at the source if you're interested. Briefly, there are a few private methods to break up the internal code into reusable blocks (you'd use routines in a procedural implementation), and there are some public methods to get the procedure and embed counts, parse the specified TXA, reset the parser and specify the particular queue to use.
Now, show me how you'd implement all that in a procedure!
Note that use of the SetQueue method. By default the class will use its own internal DCL_Clarion_TXAParser_ProcedureQueue instance, but if I wish I can tell the class to use an external queue so I can more easily process the data, e.g. to write it to a database or a text file.