Data Integrity

I like that word - integrity. It implies reliability and trustworthiness. Data integrity, likewise, relates to keeping data consistent, accurate, and reliable.

To illustrate data integrity's importance, let's imagine for a minute that you rely on a database to manage customer relationships. To do that, you need updated, correct, contact information. When you communicate with a customer, whether through a phone call, email or letter, having accurate information about the customer's ordering patterns, preferences and previous conversations would be very useful, so keeping a database of these facts seems like a good place to start.

Now let's imagine that you have a few issues with the data in your database. You have order information, but can't connect it to a given customer with any degree of confidence. The last time you called a customer, you referenced his office in New Orleans only to learn that the business had moved - and you had sent products - to a new office in Denver. Looking over the most recent correspondence your business sent, you see to your dismay that his name, which is peppered throughout the letter, is in ALL CAPS, which looks really cheesy.

Good grief! Is there anything that could be done about this mess?

Data in one location, and one location only

The first place to start with ensuring data is reliable is with the database structure. If you record addresses, for instance, in multiple tables of your database and your customer moves, you have created an opportunity for information to get updated in one part of your data system and not another. Updating that address everywhere it resides is a also a duplication of effort, another opportunity to get disparate data, and a big time waster. Each data element should live in one place and one place only.

How do you relate an address to different applications in a database? Database relationships and querying. See the article here .

How else do you enforce data integrity?

Think Through Fields

Following the data in one location rule, a field or column in a table should contain one and only one piece of information. Use separate fields for first and last name: you will be glad you did when you need to list your data in order of last name + first name.

Think through the output you want from this data set, and plan your table and column structure carefully. If people start listing values in a single field, you have failed the data in one location rule. That's okay: just go back and create a related table to hold multiple values.

One additional note about fields: use naming conventions that are meaningful so that creating reports is as intuitive as possible. Spaces are a hassle when writing reports, underscores somewhat less so. I avoid both.

Data Types

Your initial table design is the place to start to ensure that data conforms to rules. If your database will contain sales figures, for example, you would not create a column for your sales amounts in string format. Calculating with text fields doesn't work well, for one thing, but users could enter text in numeric fields and cause problems when you try to aggregate data. Imagine your record set:

Item, Count
1, 8
2, 10
3, 6 gallons
4, 11
5, 9

Try aggregating that!

Likewise, create date fields in date/time format. You might be surprised at how often you could end up performing calculations with dates, and formatting in reports can also be an issue if you have gone the text route. Don't go the text route.

Boolean data types are handy for true/false and yes/no columns. The data type saves space in the database, helping it to perform well during data entry and querying, and can easily be transformed into more meaningful groups and text fields in reports.

Don't forget small details like specifying the number of places beyond the decimal point. You have a lot of control in the design phase.

Let's tackle that capitalization problem next.

What? That is possible? This is a dream come true! (I know you are as excited as I am about this data integrity stuff!)

Input masks

Input masks help enforce data entry rules. In MSAccess, applying a capitalization mask is simple: This Microsoft article explains all of the ways input masks can be applied to tables, forms and queries. To force capitalization of the first letter in a name and small letters for subsequent letters in a string, try >L????????????? as an input mask.

Dates are another example of a database column that benefits from an input mask. What if your sales department had to work from a document that looked like this?

Date Item Price
5-May-2015 Cheese, yellow 5.00
5/9/2014 White cheese $5.09
06/01/2013 Yellow cheese $4

It hurts just looking at it, doesn't it? Reports and forms that look this way make working from them difficult, and making sense of them a chore.

Other rules that might come in handy

Keys and indexed fields are useful to ensure that blank records cannot be created and that important data can be retrieved.

Apply additional rules as needed: you can disallow Null (empty) values in important fields. Imagine a customer database, for instance, where the first and last name of the contact was missing. How are you going to communicate with that customer?

Many other features available in the design phase of database creation help you assist data entry and keep the data meaningful. When designing user interfaces, use help text where needed, order the fields with the process in mind, and use visual cues wherever possible.

Adding a few rules will save you and your team time and headaches, and will reduce the chance of a really bad surprise. I saw a case when, after collecting data for several months, a manager found that none of it could be used: some of her staff entered values as ounces, others as pounds, and there was no avenue after-the-fact for discerning which was which. (You should be thinking through database design options here.)

Time spent preventing easily predicted problems is a good investment.

Topic: 
Databases