Database Primary Keys

The primary key that you define for your tables is critical to a well performing and well designed database.  The primary key to your table is very important so it’s worth spending some time on it BEFORE you start coding.

Let’s start out by describing what the primary key does.  The primary key is the unique identifier of a record for each table.  Regardless of how many rows the table has in it, the primary key is the pointer to a particular row.  This key will never change during the entire lifespan of the row.  There will never, ever, be two records with the same ID.

The primary key should never change.  This is why using a text field as a primary key that contains human readable/changeable data is such a bad idea.  I’ve worked on databases where they used Name as a primary key.  This is such a bad idea especially in cultures where women change their last name when getting married.  Even exempting female name changes most places in the world allow you to change your name legally.  Since it can change it’s a bad idea for use as a primary key.

Another example that we’ve dealt with in the past is using a Social Security Number (national id).  Another bad idea for a number of reasons.  One, data entry mistakes happen.  Two, government agency mistakes occasionally happen where there are two (or more) people with the same SSN.  Three, it’s not uncommon in certain parts of the country where undocumented workers use the same SSN.  Four, identity fraud is common.  Because this means it’s possible for two people or more people to have the same SSN it’s not a unique identifier.  Avoid using it.  It’s  a bad idea.

We recommend using an auto increment integer primary key.  Let the database do what it does best and let it pick it out for you.  That way there will never be any collisions or duplicates.  It’s what databases do very well all on their own.

I’ve worked on projects where there was no auto-increment primary key and was defined as an integer.  The developer devised a scheme to get the largest record ID and increment it by one when needed.  This was fine when a single user used the app but when multiple people were inserting data at the same time collisions occurred generating errors because primary keys must be unique.  It was ugly to fix and it all would have been avoided if they had used an auto increment primary key to begin with.

Someone asked me the other day if ActiveRecord could use a text primary key.  No, and it won’t ever.  It expects an integer primary key.  If you ARE using a text primary key.  Stop.  Add an auto-increment primary key and then add a Unique Index on the text field.

Real Studio/Xojo has some interesting things happen if you try to open a recordset that’s not returning the primary key in the query.  It won’t be able to update it in some cases.  You’ll get a generic message about not know which record you’re trying to access.  I’ve had some issues, in the past, using many-to-many tables that couldn’t be edited even though they had a unique index on fields.  So I recommend adding an auto-increment primary key event to many-to-many tables.  Then, if you have constraints on your index (like the index needing to be unique) you can add them there.  So far it’s worked for me.

For primary keys we use a standard naming convention:  tablename_ID.  A lot of people use a generic ID field but we’ve found that to be less than ideal when doing table joins.  If you have table A, B, and C all with an ID field joining them together means I have to put an alias in my SQL statement so I can retrieve those primary keys later.  By using the A_ID, B_ID, C_ID I never have to alias the field names in a query.  Plus, it has the added benefit of making the SQL easier to read because the name is explicit and the _ID tells me its a primary key.

Each database has its own internal record id.  SQLite uses rowid and you can actually query against it.  If you haven’t specified a primary key in your table it will use that rowid as the primary key.  When you specify a primary key, the specified primary key and rowID are the same.  However, if you do not specify the primary key the rowID is not guaranteed to stay the same.  If you were relying upon rowid to identify records in other tables you might be putting yourself in through a world of hurt if the user ever does a vacuum on the database.    The moral of the story is don’t rely upon the internal id – always specify your own primary key (that auto increments if you haven’t heard it enough now).

If you get into the habit of using an auto increment integer primary key you’ll avoid many problems later.  You don’t have to use our naming standard for primary keys but doing so will improve the readability of your SQL statements.  Primary keys are too important for anyone/thing but the database to figure them out.

Happy Coding!

12 thoughts on “Database Primary Keys

  1. First I am totally self taught. My background and education is in a physical science not computer science.

    I learned a long time ago (about 25 years) that it is very bad practice to use ANY end user readable data for the primary. If people can see it and it contains meaningful data to end users, almost always sooner or latter it will need to be changed for some reason.

    One thing i keep debating about keys is if its worth doing an auto increment primary key on list tables.

    By a list table i mean one where what is unique for a record is the combination of the primary keys of two other tables… Something i often find myself creating.

  2. BTW I have gotten into the habit of Using SQL to modify tables most of the time and usually use recordsets only to read data. IF I’m going to update the recordset I always select the primary key. That avoids the primary key update issue.

  3. Karen :

    One thing i keep debating about keys is if its worth doing an auto increment primary key on list tables.

    By a list table i mean one where what is unique for a record is the combination of the primary keys of two other tables… Something i often find myself creating.

    We give EVERY table an auto increment primary key. This includes list tables (we call those TR reference tables) and many-to-many tables (we call those TX tables).

    Like you, I’m self taught in the database arena. My wife, the Data Goddess, has beat it into me that all tables should have the auto increment primary key. She’s been working with DB’s for many years in some very, very large corporations so I’ll take her advice. We get into less arguments that way. 🙂

    She’s a stickler with the name conventions. Table names should be singular not plural (person vs people, PRtax vs PRtaxes, etc). The more tables/fields you get the more important this is.

  4. By TX do you mean :
    http://en.wikipedia.org/wiki/Many-to-many_(data_model)

    The app I an writing right now uses that typed of 3rd list Table…

    Interestingly they recommend using the primary keys of the two parent tables as the primary key of the list table… Which was what I was debating with myself. BTW when i started this app I I did chose to do an autoincrement key and also make the combination of the two foreign keys unique.

    In this case why does your wife recommend using an autoincrement primary key? It would seem that a unique concatenated key makes an autoincrement key superfluous.

  5. “Real Studio/Xojo has some interesting things happen if you try to open a recordset that’s not returning the primary key in the query”

    Yikes! So any SELECT statement I execute needs to include the Primary ID? Don‚t think I read that anywhere in the documentation, so that is important info!

  6. One scenario in which I, technically, do not use an auto-increment for is a data array table. The table has two indexes as the primary key (id_permutation, id_index). The first references another table, which does use auto-incrementing (so that only sort of counts), but the latter always starts at 0 and has a row for each index of the arrays. What’s important to note is that many columns will have the same id_permuation, and many will have the same id_index, but no two will have the unique combination of the two — which is what really defines a primary key.

  7. I was reading about what you said about Active Record primary keys will never be text fields.

    Have you considered using UUIDs for keys? With apps that have local databases that sync back to a master database, sequential numeric IDs don’t work well. With numeric IDs each user will end up using the same numbers in their local databases which when sync’d to the master database will end up having records with the same IDs being pushed.

    Is there another way to deal with this? We’ve been using UUIDs in FileMaker with great success.

  8. @Hal Gumbert
    UUID’s might be text fields but we don’t consider them in the same light. Why? Because no human can reasonably read them and make sense of them.

    It would be possible to get ActiveRecord working with UUID’s. Currently we’ve limited it to Integer fields but there’s nothing technically stopping us (or anyone with the AR code).

    SQLite doesn’t have a built-in UUID field type though you can generate one yourself and use it in the Insert but then you’ve got two things to do and the possibility of making a mistake can occur and that’s a ‘bad thing’ since I advocate letting the database do the work of making the primary key.

    I believe all of the DB servers have UUID Field types so those wouldn’t be as big a deal.

    We used UUID’s when synchronizing databases and it really does eliminate some issues. In that case (with SQLite) we’ve been generating our own UUID and using that field to check the server/client databases.

    I found a good article discussing Integer Primary Key vs UUID’s at http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html

  9. @Markus
    If you are using SQLite and REALSQLDatabase the rowid is inserted automatically for you. This has led to some…interesting…results for some developers. But, it lets you query against only the fields you want to mess with in your recordset without specifically getting the primary key.

    In Xojo 2013 REALSQLDatabase has been deprecated (it’s still there so no panic) and you should use SQLiteDatabase which does NOT do anything with rowid in queries. So if you migrate to the new class this issue will happen if you don’t specifically add the primary key to your query.

  10. @Bob Keeney
    After using UUIDs for awhile, I don’t see that the human readability is much of an issue, but the auto generation could be an issue. I’m working on a new desktop, web, and iOS app and plan to use Active Record with UUIDs.

    Great article, by the way for folks new to databases.

  11. @Hal Gumbert
    Right, UUID’s are too damn big to be human readable. Though I find them much harder to use when needing to determine the sequence of something.

    Let me us know if there’s anything we can do to help with ActiveRecord and UUID. We’re more than happy to integrate whatever changes you come up with into the download.

Comments are closed.