Test Data

At XDC 2019 my session was titled Xojo Design Mistakes (the alternate but way longer title was ‘Thankfully time travel doesn’t exist or my future self might travel back and murder my younger self for the stupid coding mistakes I’ve made’).  These are things that I’ve discovered over the years, both in my own projects and in other people’s projects, that are just plain wrong or less than ideal.  This will be an ongoing series since I had well over 50 slides and left 150 out.  So when I get bored I’ll bring these topics up.

Nearly all of our consulting projects are database-driven applications.  It’s why we’ve created tools to help with these projects, like ARGen, which simplifies our interactions with the database, and BKS Shorts, which is our own reporting tool.  These tools are invaluable in getting our work done in a timely manner.

In a database application it’s typical to have a List of something.  A common example of this is a Customers list.  In that list the client typically wants the ability to Create, Read, Update, and Delete (or CRUD) a customer with varying degrees of rules behind it (like do they have permissions to add or delete a customer?).

During development we get the List form going, add the controls to be able to add a new record.  Then we create the Add/Edit form that allows us to test those capabilities.  We create a few, update a few, delete a few customers and then move on.  Maybe the client wants search capabilities so we add that to the List window and when we’ve tested it against our half dozen or so records we move on to the next task.

There is nothing wrong with this process.  It works and it’s fairly efficient as far as it goes.  However, there’s one thing we’ve skipped that’s really important but also difficult to achieve.

So far we’ve tested with *maybe* a dozen records.  What happens when the client adds 10,000, or 100,000 Customer records?  Does the list form take a long time to load?  Does the search function take a long time?  What about the Customer popup menus that you’ve scattered throughout the project – are those now slow, unwieldy, and unusable?

Unfortunately, with the way we implemented the project we don’t know how any of this works since we only have a dozen records.  So it’s really important to have adequate amounts of test data.  Creating 10,000 new customers using your new interface would take a long time.  So what can you do?

There are tools out there that will help generate data sets.  These tools allow you to create thousands, even millions of rows of realistic data.  Randomized male and female first names along with last names are a great way to generate customer names.  Many tools allow you to add random dates in a range, random IP addresses, random values from a list you provide, and so on.  The sky is the limit when it comes to what sort of data developers need.
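If you don’t have one of those tools handy, a small script can get you most of the way there.  Here’s a minimal sketch in Python using only the standard library and an in-memory SQLite database.  The table layout, the name and city lists, and the `build_test_db` and `random_customer` helpers are all made up for illustration; swap in your own schema and much bigger sample lists.

```python
import random
import sqlite3
from datetime import date, timedelta

# Hypothetical sample pools; a real tool would draw from much larger lists.
FIRST_NAMES = ["Alice", "Bob", "Carla", "David", "Elena", "Frank", "Grace", "Hector"]
LAST_NAMES = ["Anderson", "Baker", "Chen", "Diaz", "Evans", "Foster", "Garcia", "Hughes"]
CITIES = ["Kansas City", "Austin", "Denver", "Portland", "Boston"]


def random_customer(rng):
    """Return one (first, last, city, phone, created) tuple of made-up data."""
    # Random date in a range: up to five years after 2015-01-01.
    created = date(2015, 1, 1) + timedelta(days=rng.randrange(0, 365 * 5))
    phone = "555-%03d-%04d" % (rng.randrange(1000), rng.randrange(10000))
    return (
        rng.choice(FIRST_NAMES),
        rng.choice(LAST_NAMES),
        rng.choice(CITIES),
        phone,
        created.isoformat(),
    )


def build_test_db(row_count, seed=42):
    """Create an in-memory SQLite database filled with row_count fake customers."""
    rng = random.Random(seed)  # fixed seed so every test run sees the same data
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, "
        "city TEXT, phone TEXT, created TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (first_name, last_name, city, phone, created) "
        "VALUES (?, ?, ?, ?, ?)",
        (random_customer(rng) for _ in range(row_count)),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_test_db(10000)
    print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])
```

Using a fixed seed is a deliberate choice: when a list form is slow with 10,000 rows today, you want the same 10,000 rows tomorrow to see whether your fix helped.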

Now, when you do your testing you see how your application reacts with a lot of data.  I almost guarantee that it will act differently.  Do you need to switch to a data-on-demand listbox?  Do you need to put an index on a commonly searched field to speed up searching?  Do you need to implement Full Text Search in your database?  Having a huge amount of data will answer these questions for you.
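One way to answer the index question is to ask the database how it plans to run your search.  The sketch below uses SQLite from Python and assumes a made-up `customers` table with a `last_name` column; `query_plan` is a hypothetical helper, not part of any library.  It runs `EXPLAIN QUERY PLAN` on the same search before and after adding an index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)"
)
# 10,000 synthetic rows; the names are placeholders, not realistic data.
conn.executemany(
    "INSERT INTO customers (last_name, city) VALUES (?, ?)",
    (("Name%05d" % i, "City%02d" % (i % 50)) for i in range(10000)),
)

search = "SELECT id FROM customers WHERE last_name = ?"

# Without an index SQLite has to read every row (a full table scan).
plan_before = query_plan(conn, search, ("Name00042",))

conn.execute("CREATE INDEX idx_customers_last_name ON customers (last_name)")

# With the index SQLite can jump straight to the matching rows.
plan_after = query_plan(conn, search, ("Name00042",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should describe a scan of the table and the second should name the new index.  Other databases have their own equivalent of this command.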

I once worked on an accounting application in VB6 where the original database designer used an Access database and calculated each account balance on the fly by iterating through bills, checks, journal entries, etc. With a few thousand rows of data in each table this process took a second or two for all balances on a local machine. When this database was accessed over the network it took 5 to 7 seconds. When we converted our first client database it took 30 to 40 seconds for EACH account! Obviously this was not acceptable performance from an accounting application meant to be used daily by general contractors with hundreds of employees and tens of thousands of customers. The solution was to have a current balance value that was stored and then updated when a transaction occurred. We could have saved ourselves hundreds of hours of rushed development time (and much stress and heartache) if we had tested with large amounts of data much earlier in the process.
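To make the stored-balance idea concrete, here’s a rough sketch in Python with SQLite.  This is not the original VB6/Access code; the `accounts` and `transactions` tables, the column names, and the helper functions are all invented for illustration.  The point is that the balance is updated in the same database transaction that records the entry, so reading it back is a single-row lookup instead of a sum over every entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        name TEXT,
        current_balance_cents INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts(id),
        amount_cents INTEGER NOT NULL
    );
    """
)


def post_transaction(conn, account_id, amount_cents):
    """Record an entry and update the stored balance together."""
    with conn:  # commits if both statements succeed, rolls back on an error
        conn.execute(
            "INSERT INTO transactions (account_id, amount_cents) VALUES (?, ?)",
            (account_id, amount_cents),
        )
        conn.execute(
            "UPDATE accounts SET current_balance_cents = current_balance_cents + ? "
            "WHERE id = ?",
            (amount_cents, account_id),
        )


def stored_balance(conn, account_id):
    """Fast path: read the maintained balance from a single row."""
    return conn.execute(
        "SELECT current_balance_cents FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()[0]


def computed_balance(conn, account_id):
    """Slow path: add up every entry, like the on-the-fly approach did."""
    return conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchone()[0]


conn.execute("INSERT INTO accounts (id, name) VALUES (1, 'Checking')")
for amount in (150000, -42500, -1999):
    post_transaction(conn, 1, amount)

# The two should always agree; the slow path makes a handy audit check.
assert stored_balance(conn, 1) == computed_balance(conn, 1)
```

Amounts are kept as integer cents here to sidestep floating point rounding, which is my own choice for the sketch and not something from the original project.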

I mentioned adding an index to a field earlier. One word of caution on this: it’s tempting to add an index to every field you’re searching on. Don’t do this! Only add indexes to the most important fields in a table. For a customer maybe the two most important fields are phone number and name even though you also search on City and things like that. Every index is extra work for the database whenever a row is inserted or updated, so performance can take a significant hit if you index too many fields.
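If you’re curious how much an index costs in your own situation, measure it with your test data.  The sketch below (Python with SQLite; the table, columns, and the `time_inserts` helper are hypothetical) times the same bulk insert against a table with a chosen set of indexed columns.  I’m not quoting numbers because they depend entirely on the machine, the database engine, and the data.

```python
import sqlite3
import time


def time_inserts(indexed_columns, row_count=20000):
    """Insert row_count rows into a table indexed on the given columns.

    Returns (elapsed_seconds, rows_inserted).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, name TEXT, phone TEXT, city TEXT)"
    )
    # Column names come from our own code, never from user input.
    for column in indexed_columns:
        conn.execute("CREATE INDEX idx_%s ON customers (%s)" % (column, column))

    rows = [
        ("Name%06d" % i, "555-%07d" % i, "City%03d" % (i % 200))
        for i in range(row_count)
    ]

    start = time.perf_counter()
    with conn:  # one transaction around the whole batch
        conn.executemany(
            "INSERT INTO customers (name, phone, city) VALUES (?, ?, ?)", rows
        )
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for columns in ([], ["name", "phone"], ["name", "phone", "city"]):
        elapsed, count = time_inserts(columns)
        print("%d rows, indexes on %s: %.3f s" % (count, columns or "nothing", elapsed))
```

Run it a few times before drawing conclusions, since a single timing can be noisy.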

Since the tool I’ve been using to create test data is no longer being sold, I’m curious what you’d recommend.  Do you have a favorite tool?  Or is this a tool that would be of use to the community?

Happy Coding!

Looking At MySQL Again

I installed everything on Mac OS X (Leopard) using the standard Mac installer.  I didn’t have any issues.  There are two other parts of the installation package: a startup item installer that lets the db server start at startup, and a prefpane that allows you to start/stop the server from System Preferences.

MySQL has an optional package that installs the MySQL Administrator and MySQL Query Browser applications.  It’s obvious from both of these tools that they’ve spent a lot of time and effort in making them usable, and for the most part I was happy with their smoothness in Mac OS X.  They definitely don’t feel like a port of Windows apps to Mac OS X.  (Without using them in Windows I can’t tell you whether the opposite is true, however.)

The REALbasic MySQL Plugin is now available from Alacatia Labs at http://alacatialabs.com/products/realbasic-mysql-plugin/ and works with the Community and Enterprise Editions.  I had absolutely no problems connecting to my newly installed database (after adding a new db and user using the Administrator tool).

If it weren’t for the stupid licensing issues that accompany MySQL I’d recommend it for everyday use.  Alas, the licensing issues make that problematic.  From the Alacatia Labs website:

It allows access to community installations of MySQL database servers using REALbasic’s built-in database API. While we are not lawyers, our interpretation of the GPL is that it is viral, and any applications that are distributed publicly must also contain the source code of the application and plugin. If you are in doubt about how the GPL applies to you, please consult your attorney.

Emphasis added by me.
That sucks because I think MySQL has some things going for it.  I know a lot of RB developers have stopped using MySQL due to the GPL licensing rules and I can’t say that I blame them.  Oh well, I guess it’s time to look at PostgreSQL or maybe MS SQL Server.