/
Auto-Numbering in SQL - An Overview

Auto-Numbering in SQL - An Overview

By Mike Hanson

As developers of database applications, auto-numbering is something that we need to accomplish with virtually every table that we create.  (At least we should be doing that.)  There are two main purposes for an auto-numbered field:

  • As a primary key field, used for maintaining referential integrity.  The primary key field identifies a single record within table, now and into the future.  Your users should never be able to see or change this number.  When storing your data in a SQL database, primary keys are critically important, as Clarion’s POSITION function depends on them.
  • As a meaningful number (e.g. invoice number).  These are visible to the user, and represent the human identifiable items.  The user could theoretically decide to change the way these are assigned and formatted, so you don’t want to use these as your primary key.

How Does Clarion Do It?

In the very early days of Clarion, auto-numbering had to be done manually.  It wasn’t too long, though, before Clarion gave us a very handy method to achieve auto-numbering directly in the options for the key.  As long as the key was marked as unique, it could also be auto-numbered.  Not only that, it automatically handled multi-component keys, so you could auto-number detail rows associated an invoice header.

The main weakness with this approach is that it creates a placeholder record to reserve the new ID when you first express your intention to add that new item (e.g. when you first enter a form with Request=Insert).  Assuming you press OK when you’re done, that placeholder record is updated with the final values for the rest of the fields.  If you cancel from the form, then the placeholder is deleted, risking a chance that another user added a record after it, so that your cancelled number is never be used.  (We all know how much auditors love gaps in the numbering.)

Another downside is that initially blank values in the rest of the record’s fields could cause an error, if the blank fields are included in other unique keys.  Working around this involves priming those fields with some value that prevents the duplicate key error.

Yet another problem is that those placeholder records are sitting in the database with mostly blank fields, which could be seen by other users in browses, reports and processes.  Tweaking all of those procedures to prevent the appearance of these placeholders is an arduous task.  And if the user’s system dies mid-insert, these remnants are left mucking up the database.

So why would Clarion do it this way, rather than determining the number when the record insert is complete?  The main reason is that Clarion users expect it to work that way:

  • Some developers expose the primary key as the human readable number, displayed on the form.  Some developers even make it editable, so the user can manually jump to a higher number.
  • Creating something like an invoice entry system with header+child, the header form often contains a browse control to enter the detail items.  Clarion’s regular browse/form updates records in the detail database immediately, and those items need to know with which header number they’re associated.

To move away from using the placeholder record, we must decide how to handle the two points above.

  • For those who want human readable numbers, there are other ways around that (see “sequences” below).
  • For header+child scenarios, the best solution is to handle it yourself with a “shell game”.  Make use of a temporary store for the detail items, during data entry.  The easiest approach is to use an in-memory database (a.k.a. IMDD) or a separate table in your real database.  It’s easy to change the existing browses and forms to point to this different table.  Then you write a small bit of code to shuttle the detail records between this temporary store and your regular database table.  You assign the row number yourself when you add children.  The consequent benefit is that partial edits don’t muck up your database, requiring that you clean them up when a user cancels.

    If you desperately don’t want to change your code, then there’s an EMULATEAUTONUMKEY option.  It tells Clarion to use SQL for the auto-numbering,  but still creates a placeholder when you start a form in Insert mode.  I feel this is a bad design decision, so I’m not going to cover it in any depth.  If you want to risk its use, then it’s covered in the help.

Auto-Numbering Alternative: UUID

It’s worth mentioning that for primary keys, you don’t strictly need an auto-numbered field.  It must merely be unique.  You can use a “UUID” (Universally Unique ID), also known as a GUID (Globally Unique ID).  The terms are often interchanged, but strictly speaking a GUID is considered to be a Microsoft variant of a UUID.  You could use a Microsoft GUID for your primary key field, or simply use a STRING field with random characters and numbers (i.e. Base 36).  Ensure that the field is long enough for the value to be unique (e.g. 24 characters is roughly equivalent to a standard GUID), and assign the random value when the field is primed.

Pros:

  • It eliminates all the complexity of implementing auto-numbered primary keys.
  • You cannot predict the value of primary keys (as they are not sequential), which adds a slight bit of security to your system.
  • It makes replication easier to implement, as you already have a “universally” unique ID.  If you don’t intend to support replication, then this may seem unimportant to you, but it doesn’t hurt to have it there from the start.

Cons:

  • SQL engines often default to using the primary key as the “Clustered Index”, which means it physically stores the records in that order.  If you’re assigning random values, then the server will be reworking the file regularly to get those records in order.  Therefore, you must explicitly tell your database to use a different key as the clustered key. (By the way, I've read that choosing the best key for clustering is one of the best techniques for efficient queries, and supposedly the primary key isn't usually the best key, so this is something you should ponder regardless.)
    • According to Bruce Johnson at Capesoft, the speed impact is small, but I’ve spoken with administrators working on large databases with many new transactions each day, and they were uneasy about using this approach.
  • Long strings take up more space than compact integers, so your database will be bigger (and consequently slower).
  • On SQL Server the primary clustered key is included for all other index entries, so if you do use a GUID with a clustered primary key (the default) then you're potentially using a lot more data storage.
  • When debugging, working with random strings isn’t as easy as sequential numbers.

Do I Really Have to Change?

The short answer is “No.”

You may be thinking that this is sounding like a lot of trouble for little benefit.  If you don’t mind all of the issues with client-side auto-numbering using placeholder records (which you’ve always lived with in your ISAM system), then you can continue to let Clarion do things as it did before (even with SQL).  It will still work, with all of the aforementioned caveats.

However, now that you’ve decided to move to SQL, you may as well try to do things the right way.  SQL provides many opportunities to increase performance, improve stability, work with non-Clarion software, etc.  There are certain expectations that should be met when you do that.

In the end, though, it’s your decision.  If you’re still interested in learning more, then please read-on.

On with the Show!

Now that you understand the pros and cons of Clarion’s auto-numbering, let’s look at how SQL fits into the picture.  Unfortunately, it’s not as easy as flicking a switch in the key definition in your dictionary.  Clarion’s database drivers do a great job of making most database access generic, but auto-numbering is done differently with each engine (and sometimes different versions of SQL flavors change how they do it with new versions).  That’s why you must approach auto-numbering as a “custom” task.

In this first article I’ll list the key areas that you must address, followed by a series of articles with the specific steps using a variety of database engines (PostgreSQL, Microsoft SQL Server, SQLite…).  You can read this initial overview, then jump to the corresponding article for your engine of choice.  I doubt that I’ll cover all possible back ends, but hopefully I'll visit enough to help the majority of Clarion users.

I will also touch upon various points when using tools like DMC and File Manager 3, both of which I use.

Here are the main areas that will require our attention:

  • Creating your Table in SQL – When working with ISAM files, you probably use the Clarion CREATE command to create the file on disk.  You don’t want to do this with SQL, as it doesn’t communicate all of the attributes that you’ll want in your SQL table.  Instead, you should use a CREATE TABLE script.  You’ll usually need additional commands to start and/or finish the task.  You could write these by hand, or make use of DMC or FM3 to do this.  I’ll be providing sample scripts with the engine specific articles, so you can fully understand how things work together.
  • Types of Auto Numbering – Depending on your backend, you may use “sequences” or “identities” (or something by yet another name) to generate the unique IDs.  Some engines support both, while others can do only one or the other. 
    • Sequence – Think of this as a counting service.  You create a sequence resource, then define the table so it uses that resource for its auto-number.  These tend to provide more flexibility and control, but are trickier to setup.  They can also be used for those “meaningful numbers” that the users want to see, like invoice IDs.  (If your backend doesn’t support sequences and you still want to use them, you can roll your own with a manual counter field and a few stored procedures.)
    • Identity – This is essentially an auto-numbered field.  It’s usually done without a separate “sequence” facility, and may not provide as much control and flexibility as you may desire.  I’m a control freak, so I generally I prefer to use sequences when the backend supports them.
  • Primary Field – In most cases your auto-numbered Primary key field will be defined as a LONG, but there are additional settings required (especially user options), which vary depending on the backend, whether you’re using FM3, etc.
  • Primary Key – You still define your key as Unique and Primary, but you turn OFF the Auto-Number switch.  As with the primary field, there are additional attributes and options that must be specified.
  • File Options – The driver string needs additional elements to tell it how to retrieve the newly assigned auto-number.  There are a few other driver string attributes that I’ll discuss as well.

I glossed over many of the particulars in the points above, as it makes more sense to include the precise details as I describe the steps for each database engine.  First up will be PostgreSQL.

I’ve found that auto-numbering is a far more complex topic that I initially expected.  I’m trying to distill all of those complexities, options and decisions into a simple set of rules.  If you feel that I missed anything crucial or misspoke, then please feel free to provide feedback.  I’ll occasionally update this article as my understanding and viewpoint evolves.