• You are here:
  • Home »
  • Archive: December, 2016

Archive Monthly Archives: December 2016

Naming Tables

Use Singular Names

Let’s take a table that stores invoices.  A lot of people would name that table “Invoices”.

Simple enough, but what about related tables?

A table that contains invoice line items may be called “InvoiceItems”.  Let’s add a third called “InvoiceItemDetails”.

What you end up with now, is the following order:

InvoiceItemDetails
InvoiceItems
Invoices

Now your tables are in alphabetical order, but completely out of logical order.  Invoices is the parent table, so it should come first, followed by InvoiceItems, and InvoiceItemDetails.

This is why I like using the singular:

Invoice
InvoiceItem
InvoiceItemDetail

Simply removing the “s” puts everything in logical order, the way they should be.

It’s a simple thing, but it makes it easier to find what you’re looking for.

Database Basics – Backup, THEN Update

They Did What?

Don’t ever update your database without taking a backup first.

That shouldn’t need to be said, but it does.

I built a critical database system for a client, and it ran great for years.

Until Company B came along to install a new system.  I’ll leave them unnamed, although do we really need to protect the guilty?

Anyway, they needed to install THEIR database.

Usually not a problem.  You have SQL Server running, you add a database, you add another database, you even add a new instance if need be.

But what would possess you to delete the existing database, uninstall SQL Server, re-install it, set up your database, and then restore a 4-month old copy of MY database you found somewhere on the network?

It happened.  All I could say was, “Wow…”.

Backups Saved the Business

Because of the particular nature of this client, the loss of a database like this could have crippled them for months, or even put them out of business completely.

Fortunately, I had them making regular backups, and after a massive panic on their part I was able to restore everything back to normal.

They lost a day’s worth of work before they noticed something was wrong, and another day waiting to have it fixed, but that was a minor inconvenience compared to what could have happened.

There are a lot of things I could have foreseen happening to the system that would require backups, but never once did it occur to me that another database “expert” would come in and flat out delete something that didn’t belong to them.

At the very least, they could have taken a backup first.  Instead, they found an old copy and restored it, hoping nobody would notice.

If you are doing any kind of work on a database, no matter how minor, take a backup first.  You never know what could go wrong…

Repeating Calculations in SQL

Don’t Repeat Yourself (SQL Version)

Sometimes we have a calculation in a query, and we need to use the result of that calculation in multiple places.

The way I usually see that done is to repeat the calculation multiple times.  For example:

SELECT DISTINCT
    A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent) AS FirstValue,
    100 - (A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent)) AS SecondValue,
    25 * (A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent)) AS ThirdValue
FROM
    TableOne A
    INNER JOIN TableTwo B ON B.AKey = A.AKey
    INNER JOIN TableThree C ON C.BKey = B.BKey
    INNER JOIN TableFour D ON D.CKey = C.CKey

There’s one main calculation in there, which is being used for multiple columns:

A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent)

Besides being difficult to read and understand, it’s really easy to make a mistake when you have to do updates.

And this is a fairly simple, made-up example.  I’ve seen queries that used CASE statements to perform different calculations depending on various criteria, with the entire CASE statement repeated multiple times throughout the query.  Ugly stuff…

When writing code, we try to use the Don’t Repeat Yourself (DRY) Principle, but this applies to queries as well.

The Solution

What we can do is simplify the main query to get rid of the repeated code, and then wrap it with another query that uses those calculated values:

SELECT
    InnerValue, (100 - InnerValue) AS SecondValue, (25 * InnerValue) AS ThirdValue
FROM
(
    SELECT DISTINCT
        A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent) AS InnerValue
    FROM
        TableOne A
        INNER JOIN TableTwo B ON B.AKey = A.AKey
        INNER JOIN TableThree C ON C.BKey = B.BKey
        INNER JOIN TableFour D ON D.CKey = C.CKey
) AS T

The main calculation is now done in a subquery.  The outer query uses InnerValue for the SecondValue and ThirdValue columns, instead of repeating the calculation itself.

So now the calculation is written only once, and only needs to be updated in one place.  Plus, it’s much easier to read, understand, and debug.

Naming Primary Keys

Poorly Named Fields Lead to Mistakes

There are lots of things that irritate me about poorly named database fields, but one of the worst is naming the primary key field “ID”.

Naming primary keys, or any other database field, is an afterthought for many programmers, but not taking the time to use a consistent naming convention leads to confusion and errors.

Let’s take 3 tables – Product, Invoice and InvoiceItem.  And let’s assume we’re using artificial (surrogate) keys (my preference).

Product
ID
ProductName

Invoice
ID
InvoiceNumber

InvoiceItem
ID
InvoiceID
ItemID
ItemName

It’s pretty easy to guess in this example that Invoice.ID = InvoiceItem.InvoiceID.  But you still need to go into the table design to verify the relationship actually exists.

But what about that ItemID field?  It should relate to the Product.ID field, but the name is misleading.  Now we have no choice but to go into the table design to find out which field links back to the Product table.

Also consider your queries:

SELECT Invoice.ID, InvoiceItem.ID, Product.ID, Product.ProductName
FROM Invoice
INNER JOIN InvoiceItem ON InvoiceItem.InvoiceID = Invoice.ID
INNER JOIN Product ON Product.ID = InvoiceItem.ItemID

First, the relationship between the fields is a mess because there is no consistent naming.  You can’t tell just from looking whether or not the query is correct.

Then, you end up with the following dataset:

ID     ID     ID     ProductName
16     23     49     My Product

Which ID came from which table?

This is a simple example, of course, but when you’re writing a complex query with multiple tables, having an “ID” field in every table gets confusing really fast.  It gets exponentially worse when you have more tables and lots of foreign keys.

Choose a Naming Convention

A better way is to use the table name in the ID field:

Product
ProductID
ProductName

Invoice
InvoiceID
InvoiceNumber

InvoiceItem
InvoiceItemID
InvoiceID
ProductID
ItemName

The primary key is always tableID.  It’s consistent in every table, and the foreign key has the same name as the primary key.

The InvoiceID fields are now obviously related, and you don’t need to guess which fields to join on.  Same with the ProductID fields.

Take the time to give your primary key fields unique names that are consistent from one table to the next.  It avoids confusion, and makes it easier to check that your query is correct.

Naming primary keys properly is a simple way to make your database easier to use.  The programmer who takes over after you leave will appreciate it.