Anomalies and Normalization in Relational Database

What are anomalies and normalization in relational database, how they are related, solved or achieved.

The Beginning

By default, you already know why you want to learn Normalization and what you want to do with it. So I'm not going to go over that again here, let's get to the point a little faster.

The Anomalies

Before we move on to the Normal Forms we may first know about the anomalies.

  • Update Anomalies

    • You may update data you do not want to update

  • Delete Anomalies

    • You may delete data you do not want to delete

    • You may delete additional information

  • Insert Anomalies

    • You may need extra information that is not necessarily required to insert a data

Examples

1). Update Anomalies

For example, when we are trying to update the Address of Student Hank, we may finally update two rows. That's what we don't want.

2). Delete Anomalies

We still take the previous table as example.

First, if we want to delete Hank, we may finally delete two rows.

Besides, if we want to delete King, if he is the only student who take course Data Science, then we may lose this course in the table. That means we are deleting additional information.

3). Insert Anomalies

Also the same table. If we want to add a new student who didn't take any courses, how can we suppose to do that? Same to the courses, we cannot add a new course unless one student registered it, Chicken or the egg?.

That means, if we want to insert a new row, we have to know some additional information that is not necessary, which is not good.

Normalization Forms (NF)

Normalization is to resolve the anomalies problems.

0: UNF - Unnormalized Form

  1. 1NF

    • No repeating groups and with PK identified

  2. 2NF

    • 1NF and no partial dependencies

  3. 3NF

    • 2NF and no transitive dependencies

  4. (BCNF)

  5. (4NF)

Typically we do not go beyond the 3NF.

Important Concepts

About Keys

  • Super Key: All keys and key combinations that can "uniquely identify" a row.

  • Candidate Key: Minimum attribute(s) in the super key.

  • Primary Key: The most important key chosen from candidate key.

About Dependency

  • Functional Dependence: I would say that is one A can "uniquely identify" one B.

  • Partial Dependency: functional dependence in which the determinant is only part of the primary key

  • Transitive Dependency: attribute is dependent on another attribute that is not part of the primary key

Functional Dependency Examples

I'm going to bring down the same table here:

When you look at StudentNo, with StudentName and Address, you can find that for each StudentNo, we will get the unique other attributes (although there are multiple rows).

For example:

  • S10 -> Hank, Toronto

  • S11 -> Thomas, Markham

  • S12 -> King, Calgary

That is the functional dependency, StudentNo can functionally determine StudentName and Address. Written as StudentNo -> StudentName, Address.

Can we write it in the reverse order? (That is StudentName, Address -> StudentNo)

For current table, probably yes. But when we add more data into the table, if there's two Hank both live in Toronto, and they are two students (different id), then we can't say there's a functional dependency for StudentName, Address -> StudentNo.

However, when it comes to StudentNo and Course, we can find that:

  • S10

    • Java

    • Database

  • S12

    • Web

    • Data Science

So StudentNo cannot functionally determine Course.

Same for the CourseNo to other attributes than Course.

Normalization

Moving to 1NF

What's the problem with the this table?

The first thing is that there's no primary key. That is for now there's no attribute that can uniquely identify each row.

As for the repeating group, there are different meanings of the repeating group:

Here let's talk about multi-value attributes. I'm using a new table to talk about this:

As we can see, there are many multi-value attributes.

For the employee name, we may have first name, last name and middle name; for the children, one employee can have no or many children.

For the name, we can divide it like this.

But can we divide the Children and Children Birthday like this?

We cannot do this, because firstly there will be a lot of null values, then how can we deal with employees that have 5 children or more?

So we should do like this:

Let's go back to the primary key of the original table.

As no attribute can uniquely identify each row, can we use composite key as primary key?

Sure! if we use StudentNo with CourseNo, then we can uniquely identify each row.

and the dependency should be StudentNo -> StudentName, Address CourseNo -> Course {StudentNo, CourseNo} -> StudetName, Address, Course

After we add the primary key, then this table is in 1NF.

Moving to 2NF

Still the same table, can we say this table is in 2NF with a primary key {StudentNum, CourseNum}?

No, because there's partial dependencies.

As we have shown:

StudentNo -> StudentName, Address CourseNo -> Course {StudentNo, CourseNo} -> StudetName, Address, Course

CourseNum is part of the primary key {StudentNum, CourseNum}, so the partial dependency exists.

So if we do not have composite key and the table is already in 1NF, can we say this table is automatically in 2NF? I would say "yes".

Then how do we go to 2NF?

Firstly, we break them into 2 tables:

Looks better? We just removed the partial dependency to a new table.

But previously they are linked together, how do we do that?

I would say that depends on the relationship.

Between Student and Course, typically that should be a many-to-many relationship.

That is, one student can register many courses; one course can have many students.

So we should add an associate table to link them together:

In which, StudentNo and CourseNo are foreign keys.

Moving to 3NF

Is the previous table in 3NF?

Yes, because there's no transtive dependencies.

So how can we suppose to do something?

We make a little change to the previous table.

Do you think this table is in 3NF?

I would say no.

Because of the FD below:

If we take StudentNo as primary key, then:

StudentNo -> SIN, Student Name, Address

SIN -> Student Name, Address

This may not be a good example, but it can some how describe many things.

1). Can we use SIN as primary key?

Yes, but we can only choose one of them.

2). Why can't we use both StudentNo and SIN as primary key?

Because primary key is the minimum subset of super key.

If we can take {StudentNo, SIN} as primary key, why don't we take all the attributes?

That's because the properties of primary key:

First, primary key is indexed and we do not often (or cannot) change it. If we make unnecessary attributes to be the primary key, when we are going to change any of them, there will be troubles.

Second, primary key must not be null, so that's also why you always only want a "small enough" primary key.

So let's go back to the point.

There's a transitive dependency exists:

StudentNo -> SIN, SIN -> Student Name, Address

So how do we change to 3NF?

We do it in the same way: to break it up.

We move all attributes in the transitive dependency to a new table.

That's it.

Last updated