Tue. Sep 2nd, 2025

Database normalization

If you are new to databases or have years of experience, it is key to grasp normalization basics to build well-organized and smooth-running databases. In this post we will focus only on that.

Introduction

Normalization is a method of organizing data to reduce redundancy, with the goal of creating a more efficient, reliable, and simple database. The main goals of the normalization is to organize data, remove data redundancy, ensure data integrity, ensure data consistency and optimize the database for storage and retrieval of the data.

There are six normal forms:

  • First normal form (1NF)
  • Second normal form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd normal form (BCNF)
  • Fourth normal form (4NF)
  • Fifth normal form (5NF)

In most of the cases we are ok with the database being in the 3rd normal form. Higher level of the database normalization can lead to the slower query performance in databases where data retrieval is very important. That is one of the reasons why analytical databases are highly denormalized. Going beyond 3th normal form can also increase complexity of the database which can be bad for the onboarding of the new members and juniors as well as a bigger challenge for the current developers and database administrators.

In this post and related examples I will focus only on the first three normal forms. In future I will cover some use cases where other normal forms could be used.

1NF requires that each value in a column is atomic and that all rows have the same number of fields. This means that we should not store for example city, country and address in a single column because that complicates tasks like sorting records by city. If we want to store mentioned data in a single column then a relational database like MSSQL is not a suitable option and we can use a NOSQL columnar database like Casandra.

2NF requires 1NF to be valid, all rows are identified separately and completely dependent on the primary key and there are no repeating columns. Example for that would be a table which contains course id and name columns and instructor id and name columns. In this case the instructorName column would not depend on courseId which would violate 2NF and we would need to split the table into Course table, Instructor table and junction table if the instructor can teach multiple courses.

3NF requires 2NF to be satisfied and there are no transitive dependencies which means that no non-key attribute is dependent on another non-key attribute. This means that all non-required key columns must be independent of each other. Example of this would be if we add to table from 2NF example additional column Department name and department description which is recurring and which needs to be moved to separate table.

You can read more about Normalization on this links:

Example customers

Unnormalized Data

Customer IDFirst NameLast NameEmailPhone NumberAddressCity
1JohnDoejohndoe@gmail.com555-1234123 Main StAnytown
2JaneSmithjanesmith@gmail.com555-5678456 Elm StAnytown
3BobJohnsonbobjohnson@gmail.com555-8901789 Oak StOthertown
4SueLeesuelee@gmail.com555-2345321 Maple StOthertown

In this table:

  • The City column has repeated values like “Anytown” and “Othertown”
  • The Address column is tied directly to each customer leading to some redundancy if multiple customers live in the same city or even at the same address

First Normal Form (1NF)

To achieve 1NF, we need to ensure that each column contains atomic (indivisible) values, and each record is unique. The table is already in 1NF because each field contains a single value per record and there are no repeating columns.

Second Normal Form (2NF)

Since the table is 1NF, there are no partial dependencies and all non-key attributes depend on the primary key, the table is already in 2NF.

Third Normal Form (3NF)

To achieve 3NF we will create two new tables:

  • Addresses table where we are going to move addresses and reference the city they belong to
  • Cities table where we will move city names
  • (Note) in this example customer belongs to only single address. We are avoiding junction table in this example and it will be covered in next examples. In this example this case would be covered by another table called “junction table” which would contain CustomerID and AddressID giving us many to many relationship between those tables

Normalized tables

Customers Table:

Customer IDFirst NameLast NameEmailPhone NumberAddress ID
1JohnDoejohndoe@gmail.com555-12341
2JaneSmithjanesmith@gmail.com555-56782
3BobJohnsonbobjohnson@gmail.com555-89013
4SueLeesuelee@gmail.com555-23454

Addresses Table:

Address IDAddressCity ID
1123 Main St1
2456 Elm St1
3789 Oak St2
4321 Maple St2

Cities Table:

City IDCity Name
1Anytown
2Othertown

This structure eliminates redundancy by storing city names only once in the cities table and addresses only once in the addresses table. Each customer now references their address using the Address ID.

Summary

  • Customers table contains customer-specific data, including the first name, last name, email, phone number, and a reference to the address
  • Addresses table stores address details and references the city using City ID
  • Cities table stores city names

Example Employee timecards

Unnormalized Data

Employee IDEmployee NamePay RateWeek EndingTuesdayWednesdayThursdayFridaySaturdaySundayTotal Hours
101John Smith$15/hr26/2/202288880040
102Jane Doe$20/hr26/2/202288880040
103John Smith$15/hr5/3/202288800032

In this table:

  • Employee details (like name and pay rate) are repeated for each timecard
  • Days of the week are columns rather than rows, which leads to a less flexible design

First Normal Form (1NF)

To achieve 1NF, we ensure that each column contains atomic values and there are no repeating groups of data. In this case days of week are not considered repeating groups because they each represent a distinct day of the week. Just because by definition it is not repeating groups it does not mean it is good. Example of repeating columns would be if we had for example Phone1, Phone2, Phone3 columns. The table is in 1NF.

Second Normal Form (2NF)

The table is 1NF but we have partial dependencies so it does not satisfy 2NF.

Third Normal Form (3NF)

Even if the table were to be brought to 2NF, the total hours column would still be problematic because it can be derived from the sum of the daily hours (Tuesday, Wednesday, etc.). This creates a transitive dependency where total hours depends on the daily hours which depends on the primary key.

Normalized tables

Employees Table:

Employee IDEmployee NamePay Rate
101John Smith$15/hr
102Jane Doe$20/hr
103Joe Brown$18/hr

Timecards Table:

Timecard IDEmployee IDWeek Ending
110126/2/2022
210226/2/2022
310326/2/2022
41015/3/2022

Days of Week Table:

Day of Week IDName
1Monday
2Tuesday
3Wednesday
4Thursday
5Friday
6Saturday
7Sunday

Timecard Entries Table:

Timecard IDDay of Week IDHours
118
128
138
148
158
218
228
238
248
258
318
328
338
348
428
438
448
458

Summary

To bring this table to 2NF and 3NF:

  • 2NF removes partial dependencies by splitting the table into separate tables such as an Employees table for Employee Name and Pay Rate and a Timecards table for Week Ending and the daily hours
  • 3NF removes transitive dependencies by eliminating the Total Hours column relying instead on calculating totals dynamically based on the daily hours stored in the Timecards table
  • Employees table contains employee-specific data including the name and pay rate
  • Timecards table contains timecard details linking each timecard to an employee and a specific week
  • Days of week table stores day names and their corresponding IDs
  • Timecard entries table records the number of hours worked for each day of the week for each timecard

Example countries

Unnormalized Data

Country NameCapital CityPopulationContinentArea (sq km)Language
USAWashington D.C.331,449,281North America9,147,593English
CanadaOttawa38,048,738North America9,984,670English, French
ChinaBeijing1,402,193,353Asia9,596,961Mandarin
AustraliaCanberra25,762,395Oceania7,692,024English
BrazilBrasília213,445,417South America8,515,767Portuguese
MexicoMexico City130,262,216North America1,964,375Spanish
RussiaMoscow145,912,025Europe17,098,242Russian

First Normal Form (1NF)

In this example in Language column we have multiple values and because of that table is not in 1NF

We can transform it to 1NF by splitting English,French into separate rows:

Country NameCapital CityPopulationContinentArea (sq km)Language
USAWashington D.C.331,449,281North America9,147,593English
CanadaOttawa38,048,738North America9,984,670English
CanadaOttawa38,048,738North America9,984,670French
ChinaBeijing1,402,193,353Asia9,596,961Mandarin
AustraliaCanberra25,762,395Oceania7,692,024English
BrazilBrasília213,445,417South America8,515,767Portuguese
MexicoMexico City130,262,216North America1,964,375Spanish
RussiaMoscow145,912,025Europe17,098,242Russian

Second Normal Form (2NF)

To move table to 2NF we need to create primary key on table and move the language column into separate table:

Countries Table:

Country IDCountry NameCapital CityPopulationContinentArea (sq km)
1USAWashington D.C.331,449,281North America9,147,593
2CanadaOttawa38,048,738North America9,984,670
3ChinaBeijing1,402,193,353Asia9,596,961
4AustraliaCanberra25,762,395Oceania7,692,024
5BrazilBrasília213,445,417South America8,515,767
6MexicoMexico City130,262,216North America1,964,375
7RussiaMoscow145,912,025Europe17,098,242

Country_Languages Table:

Country IDLanguage
1English
2English
2French
3Mandarin
4English
5Portuguese
6Spanish
7Russian

Third Normal Form (3NF)

We have now table in 2NF. To improve it even further we are going to pull continents to new table and do some changes on current tables:

Countries Table:

Country IDCountry NameCapital CityPopulationArea (sq km)Continent ID
1USAWashington D.C.331,449,2819,147,5931
2CanadaOttawa38,048,7389,984,6701
3ChinaBeijing1,402,193,3539,596,9612
4AustraliaCanberra25,762,3957,692,0243
5BrazilBrasília213,445,4178,515,7674
6MexicoMexico City130,262,2161,964,3751
7RussiaMoscow145,912,02517,098,2425

Continents Table:

Continent IDContinent Name
1North America
2Asia
3Oceania
4South America
5Europe

Languages Table:

Language IDLanguage Name
1English
2French
3Mandarin
4Portuguese
5Spanish
6Russian

Country_Languages Table:

Country IDLanguage ID
11
21
22
33
41
54
65
76

Summary

By applying normalization, we’ve split the original table into multiple related tables:

  • Countries table contains country-specific data and references to the continent
  • Continents table stores continent names
  • Languages table stores language names
  • Country_Languages table acts as a junction table to establish a many-to-many relationship between countries and languages

Example songs

Unnormalized Data

Artist NameGenreAlbum TitleRelease DateSong TitleDurationPlaylist TitleDescription
Taylor SwiftPopFearless11/11/2008Love Story3:56Party MixUpbeat dance songs
Taylor SwiftPopFearless11/11/2008You Belong With Me3:51Party MixUpbeat dance songs
Taylor SwiftPopRed22/10/2012Shake It Off3:39Party MixUpbeat dance songs
Bruno MarsR&B24K Magic18/11/201624K Magic3:47Party Mix, WorkoutUpbeat dance songs
ColdplayAlternativeA Head Full of Dreams4/12/2015Viva la Vida4:02Party Mix, WorkoutUpbeat dance songs
Taylor SwiftPopFearless11/11/2008Love Story3:56AcousticCalming acoustic songs
ColdplayAlternativeA Head Full of Dreams4/12/2015Viva la Vida4:02AcousticCalming acoustic songs
Taylor SwiftPopRed22/10/2012Love Story3:56WorkoutHigh energy workout songs
Taylor SwiftPopRed22/10/2012Shake It Off3:39WorkoutHigh energy workout songs

In this table:

  • There is issue with atomicity in Playlist Title column where we have multiple values
  • There are repeated values for Artist Name, Genre, Album Title, Release Date, and Playlist Title
  • There is a lot of redundancy that can be removed by normalization

First Normal Form (1NF)

The current table has an issue with atomicity. The Playlist Title column has multiple values (Party Mix, Workout). This violates the requirement that each cell should have a single value.

Normalization to 1NF

We need to split the rows where Playlist Title contains multiple values into separate rows.

Artist NameGenreAlbum TitleRelease DateSong TitleDurationPlaylist TitleDescription
Taylor SwiftPopFearless11/11/2008Love Story3:56Party MixUpbeat dance songs
Taylor SwiftPopFearless11/11/2008You Belong With Me3:51Party MixUpbeat dance songs
Taylor SwiftPopRed22/10/2012Shake It Off3:39Party MixUpbeat dance songs
Bruno MarsR&B24K Magic18/11/201624K Magic3:47Party MixUpbeat dance songs
Bruno MarsR&B24K Magic18/11/201624K Magic3:47WorkoutUpbeat dance songs
ColdplayAlternativeA Head Full of Dreams4/12/2015Viva la Vida4:02Party MixUpbeat dance songs
ColdplayAlternativeA Head Full of Dreams4/12/2015Viva la Vida4:02WorkoutUpbeat dance songs
Taylor SwiftPopFearless11/11/2008Love Story3:56AcousticCalming acoustic songs
ColdplayAlternativeA Head Full of Dreams4/12/2015Viva la Vida4:02AcousticCalming acoustic songs
Taylor SwiftPopRed22/10/2012Love Story3:56WorkoutHigh energy workout songs
Taylor SwiftPopRed22/10/2012Shake It Off3:39WorkoutHigh energy workout songs

Second Normal Form (2NF)

Genre, Release Date, and Duration are dependent only on part of the primary key (Artist Name, Album Title, Song Title), but not on Playlist Title. This violates 2NF.

Normalization to 2NF

We create separate tables to remove partial dependencies.

Artists Table:

Artist IDArtist NameGenre
101Taylor SwiftPop
102Bruno MarsR&B
103ColdplayAlternative

Albums Table:

Album IDAlbum TitleRelease DateArtist ID
1Fearless11/11/2008101
2Red22/10/2012101
324K Magic18/11/2016102
4A Head Full of Dreams4/12/2015103

Songs Table:

Song IDSong TitleArtist IDAlbum IDDuration
1Love Story10113:56
2You Belong With Me10113:51
3Shake It Off10123:39
424K Magic10233:47
5Viva la Vida10344:02

Playlists Table:

Playlist IDPlaylist TitleDescription
1Party MixUpbeat dance songs
2AcousticCalming acoustic songs
3WorkoutHigh energy workout songs

Playlist-Song Table:

Playlist IDSong ID
11
12
13
14
15
21
25
33
34

Third Normal Form (3NF)

The tables are already in 3NF with the above changes.

Summary

The provided tables (Artists, Albums, Songs, Playlists, Playlist-Song Association) are in Third Normal Form (3NF). This ensures that:

  • Each table has a primary key
  • Non-key attributes are fully functionally dependent on the entire primary key
  • There are no transitive dependencies

By Denis