Database Primary Keys

The primary key that you define for your tables is critical to a well performing and well designed database.  The primary key to your table is very important so it’s worth spending some time on it BEFORE you start coding.

Let’s start out by describing what the primary key does.  The primary key is the unique identifier of a record for each table.  Regardless of how many rows the table has in it, the primary key is the pointer to a particular row.  This key will never change during the entire lifespan of the row.  There will never, ever, be two records with the same ID.

The primary key should never change.  This is why using a text field as a primary key that contains human readable/changeable data is such a bad idea.  I’ve worked on databases where they used Name as a primary key.  This is such a bad idea especially in cultures where women change their last name when getting married.  Even exempting female name changes most places in the world allow you to change your name legally.  Since it can change it’s a bad idea for use as a primary key.

Another example that we’ve dealt with in the past is using a Social Security Number (national id).  Another bad idea for a number of reasons.  One, data entry mistakes happen.  Two, government agency mistakes occasionally happen where there are two (or more) people with the same SSN.  Three, it’s not uncommon in certain parts of the country where undocumented workers use the same SSN.  Four, identity fraud is common.  Because this means it’s possible for two people or more people to have the same SSN it’s not a unique identifier.  Avoid using it.  It’s  a bad idea.

We recommend using an auto increment integer primary key.  Let the database do what it does best and let it pick it out for you.  That way there will never be any collisions or duplicates.  It’s what databases do very well all on their own.

I’ve worked on projects where there was no auto-increment primary key and was defined as an integer.  The developer devised a scheme to get the largest record ID and increment it by one when needed.  This was fine when a single user used the app but when multiple people were inserting data at the same time collisions occurred generating errors because primary keys must be unique.  It was ugly to fix and it all would have been avoided if they had used an auto increment primary key to begin with.

Someone asked me the other day if ActiveRecord could use a text primary key.  No, and it won’t ever.  It expects an integer primary key.  If you ARE using a text primary key.  Stop.  Add an auto-increment primary key and then add a Unique Index on the text field.

Real Studio/Xojo has some interesting things happen if you try to open a recordset that’s not returning the primary key in the query.  It won’t be able to update it in some cases.  You’ll get a generic message about not know which record you’re trying to access.  I’ve had some issues, in the past, using many-to-many tables that couldn’t be edited even though they had a unique index on fields.  So I recommend adding an auto-increment primary key event to many-to-many tables.  Then, if you have constraints on your index (like the index needing to be unique) you can add them there.  So far it’s worked for me.

For primary keys we use a standard naming convention:  tablename_ID.  A lot of people use a generic ID field but we’ve found that to be less than ideal when doing table joins.  If you have table A, B, and C all with an ID field joining them together means I have to put an alias in my SQL statement so I can retrieve those primary keys later.  By using the A_ID, B_ID, C_ID I never have to alias the field names in a query.  Plus, it has the added benefit of making the SQL easier to read because the name is explicit and the _ID tells me its a primary key.

Each database has its own internal record id.  SQLite uses rowid and you can actually query against it.  If you haven’t specified a primary key in your table it will use that rowid as the primary key.  When you specify a primary key, the specified primary key and rowID are the same.  However, if you do not specify the primary key the rowID is not guaranteed to stay the same.  If you were relying upon rowid to identify records in other tables you might be putting yourself in through a world of hurt if the user ever does a vacuum on the database.    The moral of the story is don’t rely upon the internal id – always specify your own primary key (that auto increments if you haven’t heard it enough now).

If you get into the habit of using an auto increment integer primary key you’ll avoid many problems later.  You don’t have to use our naming standard for primary keys but doing so will improve the readability of your SQL statements.  Primary keys are too important for anyone/thing but the database to figure them out.

Happy Coding!