Database Normalization -

August 31, 2024

Beginner, Databases, Microsoft SQL Server

Database normalization

If you are new to databases or have years of experience, it is key to grasp normalization basics to build well-organized and smooth-running databases. In this post we will focus only on that.

Introduction

Normalization is a method of organizing data to reduce redundancy, with the goal of creating a more efficient, reliable, and simple database. The main goals of the normalization is to organize data, remove data redundancy, ensure data integrity, ensure data consistency and optimize the database for storage and retrieval of the data.

There are six normal forms:

First normal form (1NF)
Second normal form (2NF)
Third Normal Form (3NF)
Boyce-Codd normal form (BCNF)
Fourth normal form (4NF)
Fifth normal form (5NF)

In most of the cases we are ok with the database being in the 3rd normal form. Higher level of the database normalization can lead to the slower query performance in databases where data retrieval is very important. That is one of the reasons why analytical databases are highly denormalized. Going beyond 3th normal form can also increase complexity of the database which can be bad for the onboarding of the new members and juniors as well as a bigger challenge for the current developers and database administrators.

In this post and related examples I will focus only on the first three normal forms. In future I will cover some use cases where other normal forms could be used.

1NF requires that each value in a column is atomic and that all rows have the same number of fields. This means that we should not store for example city, country and address in a single column because that complicates tasks like sorting records by city. If we want to store mentioned data in a single column then a relational database like MSSQL is not a suitable option and we can use a NOSQL columnar database like Casandra.

2NF requires 1NF to be valid, all rows are identified separately and completely dependent on the primary key and there are no repeating columns. Example for that would be a table which contains course id and name columns and instructor id and name columns. In this case the instructorName column would not depend on courseId which would violate 2NF and we would need to split the table into Course table, Instructor table and junction table if the instructor can teach multiple courses.

3NF requires 2NF to be satisfied and there are no transitive dependencies which means that no non-key attribute is dependent on another non-key attribute. This means that all non-required key columns must be independent of each other. Example of this would be if we add to table from 2NF example additional column Department name and department description which is recurring and which needs to be moved to separate table.

You can read more about Normalization on this links:

Freecodecamp – database normalization

Microsoft database normalization documentation

Javatpoint dbms normalization

Example customers

Unnormalized Data

Customer ID	First Name	Last Name	Email	Phone Number	Address	City
1	John	Doe	johndoe@gmail.com	555-1234	123 Main St	Anytown
2	Jane	Smith	janesmith@gmail.com	555-5678	456 Elm St	Anytown
3	Bob	Johnson	bobjohnson@gmail.com	555-8901	789 Oak St	Othertown
4	Sue	Lee	suelee@gmail.com	555-2345	321 Maple St	Othertown

In this table:

The City column has repeated values like “Anytown” and “Othertown”
The Address column is tied directly to each customer leading to some redundancy if multiple customers live in the same city or even at the same address

First Normal Form (1NF)

To achieve 1NF, we need to ensure that each column contains atomic (indivisible) values, and each record is unique. The table is already in 1NF because each field contains a single value per record and there are no repeating columns.

Second Normal Form (2NF)

Since the table is 1NF, there are no partial dependencies and all non-key attributes depend on the primary key, the table is already in 2NF.

Third Normal Form (3NF)

To achieve 3NF we will create two new tables:

Addresses table where we are going to move addresses and reference the city they belong to
Cities table where we will move city names
(Note) in this example customer belongs to only single address. We are avoiding junction table in this example and it will be covered in next examples. In this example this case would be covered by another table called “junction table” which would contain CustomerID and AddressID giving us many to many relationship between those tables

Normalized tables

Customers Table:

Customer ID	First Name	Last Name	Email	Phone Number	Address ID
1	John	Doe	johndoe@gmail.com	555-1234	1
2	Jane	Smith	janesmith@gmail.com	555-5678	2
3	Bob	Johnson	bobjohnson@gmail.com	555-8901	3
4	Sue	Lee	suelee@gmail.com	555-2345	4

Addresses Table:

Address ID	Address	City ID
1	123 Main St	1
2	456 Elm St	1
3	789 Oak St	2
4	321 Maple St	2

Cities Table:

City ID	City Name
1	Anytown
2	Othertown

This structure eliminates redundancy by storing city names only once in the cities table and addresses only once in the addresses table. Each customer now references their address using the Address ID.

Summary

Customers table contains customer-specific data, including the first name, last name, email, phone number, and a reference to the address
Addresses table stores address details and references the city using City ID
Cities table stores city names

Example Employee timecards

Unnormalized Data

Employee ID	Employee Name	Pay Rate	Week Ending	Tuesday	Wednesday	Thursday	Friday	Saturday	Sunday	Total Hours
101	John Smith	$15/hr	26/2/2022	8	8	8	8	0	0	40
102	Jane Doe	$20/hr	26/2/2022	8	8	8	8	0	0	40
103	John Smith	$15/hr	5/3/2022	8	8	8	0	0	0	32

In this table:

Employee details (like name and pay rate) are repeated for each timecard
Days of the week are columns rather than rows, which leads to a less flexible design

First Normal Form (1NF)

To achieve 1NF, we ensure that each column contains atomic values and there are no repeating groups of data. In this case days of week are not considered repeating groups because they each represent a distinct day of the week. Just because by definition it is not repeating groups it does not mean it is good. Example of repeating columns would be if we had for example Phone1, Phone2, Phone3 columns. The table is in 1NF.

Second Normal Form (2NF)

The table is 1NF but we have partial dependencies so it does not satisfy 2NF.

Third Normal Form (3NF)

Even if the table were to be brought to 2NF, the total hours column would still be problematic because it can be derived from the sum of the daily hours (Tuesday, Wednesday, etc.). This creates a transitive dependency where total hours depends on the daily hours which depends on the primary key.

Normalized tables

Employees Table:

Employee ID	Employee Name	Pay Rate
101	John Smith	$15/hr
102	Jane Doe	$20/hr
103	Joe Brown	$18/hr

Timecards Table:

Timecard ID	Employee ID	Week Ending
1	101	26/2/2022
2	102	26/2/2022
3	103	26/2/2022
4	101	5/3/2022

Days of Week Table:

Day of Week ID	Name
1	Monday
2	Tuesday
3	Wednesday
4	Thursday
5	Friday
6	Saturday
7	Sunday

Timecard Entries Table:

Timecard ID	Day of Week ID	Hours
1	1	8
1	2	8
1	3	8
1	4	8
1	5	8
2	1	8
2	2	8
2	3	8
2	4	8
2	5	8
3	1	8
3	2	8
3	3	8
3	4	8
4	2	8
4	3	8
4	4	8
4	5	8

Summary

To bring this table to 2NF and 3NF:

2NF removes partial dependencies by splitting the table into separate tables such as an Employees table for Employee Name and Pay Rate and a Timecards table for Week Ending and the daily hours
3NF removes transitive dependencies by eliminating the Total Hours column relying instead on calculating totals dynamically based on the daily hours stored in the Timecards table
Employees table contains employee-specific data including the name and pay rate
Timecards table contains timecard details linking each timecard to an employee and a specific week
Days of week table stores day names and their corresponding IDs
Timecard entries table records the number of hours worked for each day of the week for each timecard

Example countries

Unnormalized Data

Country Name	Capital City	Population	Continent	Area (sq km)	Language
USA	Washington D.C.	331,449,281	North America	9,147,593	English
Canada	Ottawa	38,048,738	North America	9,984,670	English, French
China	Beijing	1,402,193,353	Asia	9,596,961	Mandarin
Australia	Canberra	25,762,395	Oceania	7,692,024	English
Brazil	Brasília	213,445,417	South America	8,515,767	Portuguese
Mexico	Mexico City	130,262,216	North America	1,964,375	Spanish
Russia	Moscow	145,912,025	Europe	17,098,242	Russian

First Normal Form (1NF)

In this example in Language column we have multiple values and because of that table is not in 1NF

We can transform it to 1NF by splitting English,French into separate rows:

Country Name	Capital City	Population	Continent	Area (sq km)	Language
USA	Washington D.C.	331,449,281	North America	9,147,593	English
Canada	Ottawa	38,048,738	North America	9,984,670	English
Canada	Ottawa	38,048,738	North America	9,984,670	French
China	Beijing	1,402,193,353	Asia	9,596,961	Mandarin
Australia	Canberra	25,762,395	Oceania	7,692,024	English
Brazil	Brasília	213,445,417	South America	8,515,767	Portuguese
Mexico	Mexico City	130,262,216	North America	1,964,375	Spanish
Russia	Moscow	145,912,025	Europe	17,098,242	Russian

Second Normal Form (2NF)

To move table to 2NF we need to create primary key on table and move the language column into separate table:

Countries Table:

Country ID	Country Name	Capital City	Population	Continent	Area (sq km)
1	USA	Washington D.C.	331,449,281	North America	9,147,593
2	Canada	Ottawa	38,048,738	North America	9,984,670
3	China	Beijing	1,402,193,353	Asia	9,596,961
4	Australia	Canberra	25,762,395	Oceania	7,692,024
5	Brazil	Brasília	213,445,417	South America	8,515,767
6	Mexico	Mexico City	130,262,216	North America	1,964,375
7	Russia	Moscow	145,912,025	Europe	17,098,242

Country_Languages Table:

Country ID	Language
1	English
2	English
2	French
3	Mandarin
4	English
5	Portuguese
6	Spanish
7	Russian

Third Normal Form (3NF)

We have now table in 2NF. To improve it even further we are going to pull continents to new table and do some changes on current tables:

Countries Table:

Country ID	Country Name	Capital City	Population	Area (sq km)	Continent ID
1	USA	Washington D.C.	331,449,281	9,147,593	1
2	Canada	Ottawa	38,048,738	9,984,670	1
3	China	Beijing	1,402,193,353	9,596,961	2
4	Australia	Canberra	25,762,395	7,692,024	3
5	Brazil	Brasília	213,445,417	8,515,767	4
6	Mexico	Mexico City	130,262,216	1,964,375	1
7	Russia	Moscow	145,912,025	17,098,242	5

Continents Table:

Continent ID	Continent Name
1	North America
2	Asia
3	Oceania
4	South America
5	Europe

Languages Table:

Language ID	Language Name
1	English
2	French
3	Mandarin
4	Portuguese
5	Spanish
6	Russian

Country_Languages Table:

Country ID	Language ID
1	1
2	1
2	2
3	3
4	1
5	4
6	5
7	6

Summary

By applying normalization, we’ve split the original table into multiple related tables:

Countries table contains country-specific data and references to the continent
Continents table stores continent names
Languages table stores language names
Country_Languages table acts as a junction table to establish a many-to-many relationship between countries and languages

Example songs

Unnormalized Data

Artist Name	Genre	Album Title	Release Date	Song Title	Duration	Playlist Title	Description
Taylor Swift	Pop	Fearless	11/11/2008	Love Story	3:56	Party Mix	Upbeat dance songs
Taylor Swift	Pop	Fearless	11/11/2008	You Belong With Me	3:51	Party Mix	Upbeat dance songs
Taylor Swift	Pop	Red	22/10/2012	Shake It Off	3:39	Party Mix	Upbeat dance songs
Bruno Mars	R&B	24K Magic	18/11/2016	24K Magic	3:47	Party Mix, Workout	Upbeat dance songs
Coldplay	Alternative	A Head Full of Dreams	4/12/2015	Viva la Vida	4:02	Party Mix, Workout	Upbeat dance songs
Taylor Swift	Pop	Fearless	11/11/2008	Love Story	3:56	Acoustic	Calming acoustic songs
Coldplay	Alternative	A Head Full of Dreams	4/12/2015	Viva la Vida	4:02	Acoustic	Calming acoustic songs
Taylor Swift	Pop	Red	22/10/2012	Love Story	3:56	Workout	High energy workout songs
Taylor Swift	Pop	Red	22/10/2012	Shake It Off	3:39	Workout	High energy workout songs

In this table:

There is issue with atomicity in Playlist Title column where we have multiple values
There are repeated values for Artist Name, Genre, Album Title, Release Date, and Playlist Title
There is a lot of redundancy that can be removed by normalization

First Normal Form (1NF)

The current table has an issue with atomicity. The Playlist Title column has multiple values (Party Mix, Workout). This violates the requirement that each cell should have a single value.

Normalization to 1NF

We need to split the rows where Playlist Title contains multiple values into separate rows.

Artist Name	Genre	Album Title	Release Date	Song Title	Duration	Playlist Title	Description
Taylor Swift	Pop	Fearless	11/11/2008	Love Story	3:56	Party Mix	Upbeat dance songs
Taylor Swift	Pop	Fearless	11/11/2008	You Belong With Me	3:51	Party Mix	Upbeat dance songs
Taylor Swift	Pop	Red	22/10/2012	Shake It Off	3:39	Party Mix	Upbeat dance songs
Bruno Mars	R&B	24K Magic	18/11/2016	24K Magic	3:47	Party Mix	Upbeat dance songs
Bruno Mars	R&B	24K Magic	18/11/2016	24K Magic	3:47	Workout	Upbeat dance songs
Coldplay	Alternative	A Head Full of Dreams	4/12/2015	Viva la Vida	4:02	Party Mix	Upbeat dance songs
Coldplay	Alternative	A Head Full of Dreams	4/12/2015	Viva la Vida	4:02	Workout	Upbeat dance songs
Taylor Swift	Pop	Fearless	11/11/2008	Love Story	3:56	Acoustic	Calming acoustic songs
Coldplay	Alternative	A Head Full of Dreams	4/12/2015	Viva la Vida	4:02	Acoustic	Calming acoustic songs
Taylor Swift	Pop	Red	22/10/2012	Love Story	3:56	Workout	High energy workout songs
Taylor Swift	Pop	Red	22/10/2012	Shake It Off	3:39	Workout	High energy workout songs

Second Normal Form (2NF)

Genre, Release Date, and Duration are dependent only on part of the primary key (Artist Name, Album Title, Song Title), but not on Playlist Title. This violates 2NF.

Normalization to 2NF

We create separate tables to remove partial dependencies.

Artists Table:

Artist ID	Artist Name	Genre
101	Taylor Swift	Pop
102	Bruno Mars	R&B
103	Coldplay	Alternative

Albums Table:

Album ID	Album Title	Release Date	Artist ID
1	Fearless	11/11/2008	101
2	Red	22/10/2012	101
3	24K Magic	18/11/2016	102
4	A Head Full of Dreams	4/12/2015	103

Songs Table:

Song ID	Song Title	Artist ID	Album ID	Duration
1	Love Story	101	1	3:56
2	You Belong With Me	101	1	3:51
3	Shake It Off	101	2	3:39
4	24K Magic	102	3	3:47
5	Viva la Vida	103	4	4:02

Playlists Table:

Playlist ID	Playlist Title	Description
1	Party Mix	Upbeat dance songs
2	Acoustic	Calming acoustic songs
3	Workout	High energy workout songs

Playlist-Song Table:

Playlist ID	Song ID
1	1
1	2
1	3
1	4
1	5
2	1
2	5
3	3
3	4

Third Normal Form (3NF)

The tables are already in 3NF with the above changes.

Summary

The provided tables (Artists, Albums, Songs, Playlists, Playlist-Song Association) are in Third Normal Form (3NF). This ensures that:

Each table has a primary key
Non-key attributes are fully functionally dependent on the entire primary key
There are no transitive dependencies

Published by

Denis

Database Normalization

Database normalization

Introduction

Example customers

Unnormalized Data

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Normalized tables

Summary

Example Employee timecards

Unnormalized Data

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Normalized tables

Summary

Example countries

Unnormalized Data

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Summary

Example songs

Unnormalized Data

First Normal Form (1NF)

Normalization to 1NF

Second Normal Form (2NF)

Normalization to 2NF

Third Normal Form (3NF)

Summary

By Denis

You Missed

Microsoft Applied Skills

Benchmarking Query Performance with .NET 8

Short-Circuiting with .NET C# example

ASP.NET Core web app that consumes an API

Quick contact info

Database Normalization

Database normalization

Introduction

Example customers

Unnormalized Data

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Normalized tables

Summary

Example Employee timecards

Unnormalized Data

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Normalized tables

Summary

Example countries

Unnormalized Data

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Summary

Example songs

Unnormalized Data

First Normal Form (1NF)

Normalization to 1NF

Second Normal Form (2NF)

Normalization to 2NF

Third Normal Form (3NF)

Summary

Check related posts

By Denis

Related Post

Benchmarking Query Performance with .NET 8

Short-Circuiting with .NET C# example

You Missed

Microsoft Applied Skills

Benchmarking Query Performance with .NET 8

Short-Circuiting with .NET C# example

ASP.NET Core web app that consumes an API