neatComponents - Designing your database


Previous page	Data Integration	Next page
Designing your database

"Designing a database" - sometimes called normalizing your data - sounds complex, but in fact is usually just a matter of doing three things:

Identifying fields
Eliminating duplicate fields
Eliminating duplicate data

Naturally there is some subtlety to this, and you may well find your design changes over time – both as you find 'cleaner' ways of implementing the design, and to keep up with changes in requirements. Don't worry too much about the need for changes – the system is designed with that in mind, and will work with you, enabling you to modify your structure without losing any data you have already loaded into the system.

As we have already seen, a database is made up of Tables and Fields. The first job is to identify what these should be. The second job is to see if we got it right, and if not, tweak it until it is.

This may sound luddite, but the design is best done on paper, and only when it's clear on paper, should you start configuring it in your site.

So, blank sheet of paper to hand, where do you start? First off, make a list of the types of things you want to store information about. These will hopefully turn out to be the names of the various Tables. Then, against each Table, make a list of the types of information you want to store. These will become the Fields.

The likelihood is that there will initially be some duplication in the Field names, and the trick is to try to eliminate these duplicates by moving these duplicate items off into a different Table, and linking across to them with a relationship.

This is best explained with an example:

Case Study
A vehicle hire company needs a database of their vehicles.
The initial design identifies that they manage two types of things: Cars, and Buses.
For each Car they need to know the Registration number, Model, Number of passengers.
For each Bus they need to know the Registration number, Model, Carrying capacity and Height.
Now you could create a database with two Table: "Car" and "Bus" with individual sets of fields, however it's clear that there's an overlap in the information being stored. (What's more, it likely we'll be using the information from both Tables in similar ways, so we could end up with creating duplicate Queries down the line if we're not careful).
So, to avoid the duplicate fields, we'll replace them with a single Table, "Vehicle", which contains all the fields.
We also need a field to say which type of vehicle it is (Car or Bus) since we originally knew this implicity from the Table names, but don't otherwise have it since we've combined the Table.
Vehicle:
Registration number
Type
Model
Number of passengers
Carrying Capacity
Height

We have now eliminated the duplicate fields, so it's time (still on paper) to see how it looks with some data in it.
Vehicle
Reg Model Type Passengers Capacity Height
AB123 Transit Bus 50 1.2
34DSE Focus Car 4
54ESA Focus Car 4
TR453 Transit Bus 50 1.2
GT546 Shorty Bus 52 1.3

It's clear that we are repeating ourselves. Whenever we say Transit, we know that the rest of the fields will be the same. This is bad news, since it is increasing both the work required to enter the data, increasing the possibility of error and inconsistency as the information is added, and making it tedious should the information need to be changed later on.
What we need to do is to separate out the repeating information into a separate Table. In this case, the field that defines what the rest of them will be is the Model field, so this is the one we'll promote into being its own Table:
Vehicle:
Reg
Model
Model:
Name
Type
Passengers
Capacity
Height
Note that we have still got a Model field in the Vehicle Table, but now it isn't a text field containing the name of the model, but a Recordlink field, to identify which record in the Model Table contains the rest of the information. The Model Table now has a Name field, which contains the name of the model.

Managing data