The Problem with Embeds, Part 2: Refactoring the TXA Parser

by Unknown user

In the first article in this series I said that most Clarion developers use embed points the wrong way, and by doing so they make their applications more difficult to maintain, test, debug and document. Almost every Clarion developer has done that; I've done it too. In these articles I intend to show how you can improve your code base by taking the majority of that code out of the embed points.

I'll be working through a couple of examples, including the Invoice app which ships with Clarion. But in this article I'll focus on the TXA embed parser example I introduced in Part 1.

In any analysis of embed code and how it might be better deployed, the first thing you need is a convenient way to look at just your embeds. That's not so easy because, embeds being embeds, they're sprinkled throughout your application.

Really you have two options: You can browse it in the embed list or in the embeditor, or you can use a tool to extract just the embeds so you can look at them without the distraction of the generated code.

About TXAs

As far as I know, there's only one way to programmatically extract the embed code from an APP. First, you have to export a TXA, which is an import/export text version of the information contained in an APP file. And second, you to parse the TXA to get the embeds, which can be a bit of a pain as the TXA format isn't documented.

A couple of years I wrote a TXA parser to do just that task, in support of an article series on the most popular embed points. So it seemed natural, when I needed a way to extract embed points for this series, to revisit that code.

Unfortunately, that code isn't very pretty. Most of it is contained in just a couple of embed points. In the TakeAccepted method's data section there are some declarations:

x              long
vars           group,pre()
procname         string(200)
procFromABC      string(60)
procCategory     string(60)
embedname        string(60)
embedPriority    long
embedParam1      string(200)
embedParam2      string(200)
embedParam3      string(200)
whenLevel        byte
               end
dumptrace      byte(0)
LastProcName  like(procname)
lastEmbedName string(500)
currembedname string(500)

And then a little later on in TakeAccepted, at an embed that's called when the user presses the Import button, the TXA gets parsed. That code loops through the records in a previously created queue of TXA files (txaq):

      ?progressvar{prop:rangehigh} = records(txaq)
      setcursor(cursor:wait)
      loop x = 1 to records(txaq)
         get(txaq,x)
         ?progressVar{prop:progress} = x-1
         clear(ema:record)
         EMA:TXA = txaq.name
         Access:EmbedApp.Insert()
         EmbedApp{prop:sql} = 'select last_insert_id()'
         next(EmbedApp)
         ! Add the queue header record
         access:TextFile.Close()
          GLO:TextFileName = txaq.name
         access:TextFile.Open()
         Access:TextFile.UseFile()
         set(TextFile)
         ProcName = ''
         state = 0
         lineNo = 0
         clear(procname)
         clear(lastprocname)
         clear(lastembedname)
         clear(currembedname)
         LOOP
            next(TextFile)
            if errorcode() then break.
            dumptrace = false
            lineNo += 1
            CASE state
            OF 0 ! search for the start of a module or procedure, or an embed
               if sub(txt:rec,1,11) = '[PROCEDURE]'
                  clear(vars)
                  state = 10
               elsif sub(txt:rec,1,8) = '[MODULE]'
                  clear(vars)
                  procName = '[MODULE]'
               elsif sub(txt:rec,1,7) = 'EMBED %'
                  embedName = sub(txt:rec,7,len(txt:rec))
                  state = 30
               elsif sub(txt:rec,1,8) = '[SOURCE]'
                  state = 50
               end
            OF 10 ! get procedure name details
               if sub(txt:rec,1,4) = 'NAME'
                 procName = sub(txt:rec,6,len(txt:rec))
                 state = 11
               end
               do CheckForMissedEmbed
            OF 11
               if sub(txt:rec,1,8) = 'FROM ABC'
                 procFromABC = sub(txt:rec,10,len(txt:rec))
                 state = 12
               end
               do CheckForMissedEmbed
            OF 12
               if sub(txt:rec,1,8) = 'CATEGORY'
                 procCategory = sub(txt:rec,11,len(clip(txt:rec))-11)
               end
               state = 0
               do CheckForMissedEmbed
            of 30 ! Look for a first embed parameter 
               if sub(txt:rec,1,11) = '[INSTANCES]'
                  state = 41
               elsif sub(txt:rec,1,8) = '[SOURCE]'
                  state = 50
               end
            of 41 ! Get first parameter
               if sub(txt:rec,1,6) = 'WHEN '''
                  embedParam1 = sub(txt:rec,7,len(clip(txt:rec))-7)
                  WhenLevel = 1
                  !db.out('whenlevel=' & whenlevel)
               end
               state = 42
               do CheckForMissedEmbed
            of 42 ! Look for a second embed parameter
               if sub(txt:rec,1,11) = '[INSTANCES]'
                  state = 43
               elsif sub(txt:rec,1,8) = '[SOURCE]'
                  state = 50
               end
            of 43 ! Get second parameter
               if sub(txt:rec,1,6) = 'WHEN '''
                  embedParam2 = sub(txt:rec,7,len(clip(txt:rec))-7)
                  WhenLevel = 2
               end
               state = 44
               do CheckForMissedEmbed
            of 44 ! Look for a third embed parameter
               if sub(txt:rec,1,11) = '[INSTANCES]'
                  state = 45
               elsif sub(txt:rec,1,8) = '[SOURCE]'
                  state = 50
                  !db.out('found PRIORITY')
               end
            of 45 ! Get third parameter
               if sub(txt:rec,1,6) = 'WHEN '''
                  embedParam3 = sub(txt:rec,7,len(clip(txt:rec))-7)
                  WhenLevel = 3
                  !db.out('whenlevel=' & whenlevel)
               end
               state = 50
               do CheckForMissedEmbed
            of 50  ! look for the priority
               if sub(txt:rec,1,8) = 'PRIORITY'
                  embedPriority = sub(txt:rec,10,len(txt:rec))
                  if lastprocname <> procname
                     ! insert new EmbedProc record
                     clear(EMP:record)
                     EMP:Proc = procname
                     EMP:ProcFromABC = ProcFromABC
                     EMP:ProcCategory = ProcCategory
                     EMP:EmbedAppID = EMA:EmbedAppID
                     Access:EmbedProc.Insert()
                     EmbedProc{prop:sql} = 'select last_insert_id()'
                     next(EmbedProc)
                     lastprocname = procname
                  end
                  ! Add the embed record
                  currEmbedName = clip(embedName) & clip(embedparam1) |
                     & clip(embedparam2) & clip(embedparam3) & embedpriority
                  if currEmbedName <> lastEmbedName
                     lastEmbedName = currEmbedName
                     EMB:EmbedProcID = EMP:EmbedProcID
                     EMB:Embed = EmbedName
                     EMB:Param1 = EmbedParam1
                     EMB:Param2 = EmbedParam2
                     EMB:Param3 = EmbedParam3
                     EMB:Priority = embedpriority
                     access:Embed.Insert()
                  end
                  state = 51
                end
            of 51
               state = 60
               do CheckForMissedEmbed
            OF 60  ! capturing embed
               ! Quit when [END] encountered
               if sub(txt:rec,1,1) = '['
                  if sub(txt:rec,1,5) = '[END]'
                     case WhenLevel
                     of 3
                        WhenLevel = 2
                        embedParam3 = ''
                     of 2
                        WhenLevel = 1
                        embedParam2 = ''
                     of 1
                        WhenLevel = 0
                        embedParam1 = ''
                     end
                     state = 0

                  elsif sub(txt:rec,1,8) = '[SOURCE]'
                     ! look for another embed under this [EMBED] point
                    state = 50
                  else
                     ! could be we're done
                     state = 0
                  end
              elsif sub(txt:rec,1,6) = 'WHEN '''
                  case WhenLevel
                  of 0
                     ! get the first param
                     embedParam1 = sub(txt:rec,7,len(clip(txt:rec))-7)
                     WhenLevel = 1
                     state = 42
                  of 1
                     ! get the second param
                     embedParam2 = sub(txt:rec,7,len(clip(txt:rec))-7)
                     WhenLevel = 2
                     state = 50
                  of 2
                     state = 50
                  end
               else
                  ! write embed buffer 
               end
            else
               do CheckForMissedEmbed
            END
         END
         Access:TextFile.Close()
      end
      ?progressVar{prop:progress} = records(txaq)
      setcursor()

In Embeds-original.zip have a look at the ImportTXAs procedure in Embeds.app for the complete source.

What I had in mind for my new utility was something more along these lines:

My original parser's functionality was overkill for this new app; I really didn't need to build up an elaborate database of applications, procedures, and embed points. I just needed a list of embed points. But obviously I needed all of the parsing capabilities, gnarly though the code might be.

Unfortunately, there wasn't any way to use my original code unchanged, primarily because that code didn't actually extract the embedded source from the TXA - there was no need to capture the actual embed code since I was just logging embed usage. And I was motivated not to store the embed code: I had asked for TXA submissions which could contain sensitive information, and I didn't want to accidentally expose anyone's embed code to public view.

So what were my options? A few came to mind:

Just cut and paste the source. This is what a lot of us do, and it has the obvious drawbacks of creating multiple versions of the code to maintain. If I find a bug in my parsing code, I'll need to hunt down every place I've pasted a copy and make the change there.
Put my original procedure in a DLL and call it as needed. In this case my DLL also presents a user interface. That would result in some UI clunkiness in the app I envisioned in Figure 1, which doesn't need to call yet another window just to do the parsing.
Put the common source in source files and INCLUDE them. I could even include just portions of the files using labels. The drawback here is that there's no way to know how the various sections of source code might be used, or how any bug fixes to that source might cause unexpected bugs. Using INCLUDE statements this way results in an almost complete loss of control over the source code.
Create a template containing the source code. This certainly helps keep the code in one place, but it has a lot of negatives when it comes to maintaining and testing that code since you have to put the template in an app and generate the code, and then you have to port any changes back to the template.

And there were other problems. Because my original app was tied to a particular data store (a PostgreSQL database), any re-use of that code would have to know the table definitions. Since Clarion only supports one dictionary per app, any apps that used this procedure would either need to use the dictionary or import the tables from that dictionary.

Class or procedure?

So what's the answer? If it's not a template, and not a multi-purpose generated procedure and not INCLUDEd source, what's left? Procedures and classes, that's what. But not just any procedures or classes. I wanted to write code that had as few dependencies as possible.

Some of the dependencies I wanted to avoid:

Files/tables - not tie my code to a specific database
Windows/controls - not tie my code to a specific user interface element
Other code - keep calls to other procedures/classes to a minimum

So when should you use a class, and when a procedure?

In almost all cases a class is preferable to a procedure, in the same way that a procedure is almost always preferable to spaghetti code. A procedure presents a single point of entry and a single result. That's not to say you can't return multiple values from a procedure - you clearly can, as Steve Parker h as showed. But procedures don't have the flexibility of classes.

In fact, a class method is really just a procedure, so using a class already gets you everything a procedure can do. But it also gets you more, because in a class multiple methods can operate on shared data.

The test app

In recent years I've become a devotee of test-driven development, which embodies the idea that you write your test first and then start in on the code needed to pass the test. I find this provides a lot of clarity to my class design.

So my first question is, what should my test(s) look like?

I'll need a parser object, of course. And this is going to be part of the DevRoadmaps Clarion Library, so it needs to follow the DCL naming conventions. That means that DCL_ is the first part of the name (Clarion doesn't support namespaces in class names, so I fake these by separating name parts with underscores).

The second part of the name? Maybe Clarion, since the parser is a class that relates specifically to Clarion's own functionality. If, for instance, I wrote some code to manipulate solution or project files I'd also begin those classes with DCL_Clarion_. If I'm not expecting a lot of classes then two levels of naming is sufficient, although sometimes I'll go to three.

Provisionally, I'll name my class DCL_Clarion_TXAParser. For more on creating test apps see Creating a ClarionTest test DLL (DCL).

The first test

So what should my first test look like? A reasonable place to start seems to be identifying the number of embed points in a TXA, using code like this:

txa       DCL_Clarion_TXAParser
 
  code
  txa.Parse('test.txa')
  AssertThat(txa.GetEmbedCount(),IsEqualTo(6),'Wrong number of embeds found in TXA')

I'm not creating this code from scratch; instead I'm refactoring a previously created set of classes to use the DCL standards and existing DCL classes. But the original code doesn't have a GetEmbedCount method so I'll have to add that at some point.

The class

The first decision I have to make is how to store the embed data. Originally I used a SQL database, but this now appears to be a liability. My parser should use something more transient that can be converted to a permanent store or just discarded after use. An in-memory data store, in the form of nested queues, fits the bill:

DCL_Clarion_TXAParser_ProcedureQueue    queue,TYPE
ProcedureName                               string(40)
EmbedQ                                      &DCL_Clarion_TXAParser_EmbedQueue
										END
DCL_Clarion_TXAParser_EmbedQueue        queue,type
EmbedName                                   string(100)
EmbedLineQ                                 &DCL_Clarion_TXAParser_EmbedLineQueue                        
										END
DCL_Clarion_TXAParser_EmbedLinesQueue   queue,TYPE
Line                                        cstring(1000)
										END

Since these queues are used by DCL_Clarion_TXAParser I've included them in DCL_Clarion_TXAParser.inc and have given them a prefix of DCL_Clarion_TXAParser_.

The parser's job will be to populate these queues so that I end up with a DCL_Clarion_TXAParser_ProcedureQueue containing one or more records. Each of those procedure records has one or more DCL_Clarion_TXAParser_EmbedQueue records, each of which has one or more DCL_Clarion_TXAParser_EmbedLineQueue records for each line in a given embed point. With that queue structure in hand I can easily update a database or create a text file, as I see fit.

At first blush this looks like an ideal task for a procedure. Pass in the name of the TXA and an empty queue and get back a filled queue: presto! But here's why I think the procedural solution is hardly ever a good solution: there's almost always some new functionality you will want to add which doesn't fit into the existing procedure.

Think about error handling. You can have the parsing procedure return an errorcode if the parse fails for any reason, but what if you want to get a more detailed error response? What if you want to enable tracing or logging? How would you do that in a single procedure call, without burdening the procedure with some obscenely large number of parameters?

In fact, once I began rewriting my parsing code as a class I ended up with rather a lot of methods and a few properties as well:

DCL_Clarion_TXAParser                   Class,Type,Module('DCL_Clarion_TXAParser.CLW'),Link('DCL_Clarion_TXAParser.CLW',_DCL_Classes_LinkMode_),Dll(_DCL_Classes_DllMode_)
ProcedureQ                                  &DCL_Clarion_TXAParser_ProcedureQueue
ProcedureQIsExternal                        bool
Errors                                      &DCL_System_ErrorManager
Construct                                   Procedure()
Destruct                                    Procedure()
AddNewEmbed                                 procedure(string embed,string embedparam1,string embedparam2,string embedparam3,string embedpriority),private
AddNewProcedure                             procedure(string pname),private
CheckForMissedEmbed                         procedure(string s,long lineno,long state),private
GetEmbedCount                               procedure,long
GetProcedureCount                           procedure,long
Parse                                       PROCEDURE(string filename),bool,proc
RemoveCurrentProcedureFromQueue             procedure,private
Reset                                       PROCEDURE
SetQueue                                    procedure(DCL_Clarion_TXAParser_ProcedureQueue procedureQ)
										End

I won't go into all of these methods in detail - you can have a look at the source if you're interested. Briefly, there are a few private methods to break up the internal code into reusable blocks (you'd use routines in a procedural implementation), and there are some public methods to get the procedure and embed counts, parse the specified TXA, reset the parser and specify the particular queue to use.

Now, show me how you'd implement all that in a procedure!

Note that use of the SetQueue method. By default the class will use its own internal DCL_Clarion_TXAParser_ProcedureQueue instance, but if I wish I can tell the class to use an external queue so I can more easily process the data, e.g. to write it to a database or a text file.

Next time: testing the class!