Sharing notes from my ongoing learning journey — what I build, break and understand along the way.
Database Normalization in SQL Explained: 1NF, 2NF, 3NF with Examples
Understanding Database Normalization in SQL (1NF, 2NF, 3NF)
If you’re working with relational databases like MySQL and other modern SQL database systems, you’ve probably heard about normalization…
But what exactly does it mean — and why does it matter so much for clean, efficient SQL design?
Let’s break it down step by step — from the 1st Normal Form (1NF) to the 3rd Normal Form (3NF) — with fresh examples and SQL code you can actually use.
What Is Database Normalization?
Database normalization is the process of structuring a relational database to reduce data redundancy and improve data integrity.
In simple terms:
“Keep every piece of information in one logical place — no unnecessary repetition.”
Goals of normalization:
- Eliminate redundant data (no repeated customer names, cities, etc.)
- Ensure data consistency across tables
- Simplify maintenance and updates
- Improve query reliability and performance over time
Why Normalization Is So Important in SQL
When you skip normalization, problems pile up fast:
1. Data duplication
Repeating the same customer or product info across rows wastes disk space and slows down indexing, joins, and backups.
2. Update anomalies
If you store the same data in multiple places, every change becomes risky — update one copy and forget the others, and your data becomes inconsistent.
3. Insert / Delete issues
Without normalization:
- You might have to enter irrelevant fields just to add one record (insert anomaly)
- You could accidentally delete related data when removing a single row (delete anomaly)
4. Poor scalability
A flat, unstructured database might work for 100 records — but not for 10 million.
5. Balanced performance
A well-normalized design is the foundation of good performance. Later, you can selectively denormalize for reporting or caching — but start clean.
Key Concepts
| Term | Meaning |
|---|---|
| Table | Collection of related data organized in rows and columns |
| Row (Record) | One instance — e.g., a single customer or order |
| Column (Field) | One attribute — e.g., name, price, or email |
| Primary Key | Unique identifier for each row |
| Foreign Key | A reference to another table’s primary key |
First Normal Form (1NF)
Rule: Each column must contain atomic (single) values. No lists, no commas, no multiple values in one cell.
Bad Example
UserID | FullName | Email | Phones
------ | ------------- | ----------------- | ----------------------------
1 | Alice Brown | alice@email.com | 555-1234, 555-5678
2 | Bob Johnson | bob@email.com | 555-9999
Here, the Phones column breaks the rule — it holds multiple values.
This violates the 1st Normal Form (as also shown in your original notes: “Each field must contain only one value”).
Fix: Split into a separate table
CREATE TABLE Users (
UserID INT PRIMARY KEY,
FullName VARCHAR(100),
Email VARCHAR(100)
);
CREATE TABLE UserPhones (
PhoneID INT PRIMARY KEY,
UserID INT,
Phone VARCHAR(20),
FOREIGN KEY (UserID) REFERENCES Users(UserID)
);
Now each phone number is a separate record.
Queries like “Find all phone numbers of Alice” become simple joins instead of string searches.
SELECT u.FullName, p.Phone
FROM Users u
JOIN UserPhones p ON u.UserID = p.UserID;
Result: Data is atomic, searchable, and consistent — fully compliant with 1NF.
Second Normal Form (2NF)
Rule:
The table must first satisfy 1NF.
If the table has a composite primary key (two or more columns), then no non-key column should depend on only part of that key.
In other words: every non-key field must depend on the entire primary key, not just part of it.
Bad Example
StudentID | StudentName | CourseID | CourseName | Grade
--------- | ------------ | -------- | ------------------ | -----
1 | Alice | 101 | Database Systems | 90
1 | Alice | 102 | Web Programming | 85
2 | Bob | 101 | Database Systems | 88
Here, (StudentID, CourseID) is the composite key.
But StudentName depends only on StudentID, and CourseName depends only on CourseID.
This violates 2NF.
Fix: Break into three tables
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100)
);
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100)
);
CREATE TABLE Enrollments (
StudentID INT,
CourseID INT,
Grade INT,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
Result:
- Each non-key field depends on the entire composite key.
- Data redundancy is eliminated.
- Updates are easy and safe.
Third Normal Form (3NF)
Rule:
The table must be in 2NF, and no non-key column should depend on another non-key column — i.e., no transitive dependency.
Bad Example
CustomerID | Name | ZipCode | City | Country
----------- | ----------- | -------- | ---------- | ---------
1 | Alice Brown | 10001 | New York | USA
2 | Bob Smith | 94105 | San Francisco | USA
City and Country depend on ZipCode, not directly on CustomerID.
So there’s a transitive dependency:CustomerID → ZipCode → City, Country.
Fix: Split address data into its own table
CREATE TABLE ZipCodes (
ZipCode VARCHAR(10) PRIMARY KEY,
City VARCHAR(50),
Country VARCHAR(50)
);
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
ZipCode VARCHAR(10),
FOREIGN KEY (ZipCode) REFERENCES ZipCodes(ZipCode)
);
Result:
- Each field now depends directly on the key, not indirectly through another column.
- The schema is now in 3rd Normal Form — the most common target for production systems.
Practical Example: E-Commerce Schema
Let’s apply normalization to a real scenario — an online store.
Step 1 – Start with raw data (not normalized)
OrderID | CustomerName | ProductName | Category | Price | Quantity | Date
Problems:
- Customer and product names repeat everywhere.
- Category name repeats for each product.
Step 2 – Normalize progressively
1NF: Make all fields atomic
Split phone numbers, category lists, etc. into separate records.
2NF: Handle composite keys
For example, the OrderDetails table uses (OrderID, ProductID) as its primary key — all non-key columns depend on both.
3NF: Remove transitive dependencies
Move CategoryName to a Categories table.
Move address details to separate lookup tables if needed.
Step 3 – Final SQL Schema
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
FullName VARCHAR(100),
Email VARCHAR(100)
);
CREATE TABLE Categories (
CategoryID INT PRIMARY KEY,
CategoryName VARCHAR(100)
);
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
Price DECIMAL(10,2),
CategoryID INT,
FOREIGN KEY (CategoryID) REFERENCES Categories(CategoryID)
);
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATETIME,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
CREATE TABLE OrderDetails (
OrderID INT,
ProductID INT,
Quantity INT,
UnitPrice DECIMAL(10,2),
PRIMARY KEY (OrderID, ProductID),
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);
Each table focuses on one entity — clean, consistent, and easy to maintain.
Normalization vs. Performance
Normalization improves data quality and maintainability — but in analytical or reporting systems, sometimes you intentionally denormalize to reduce joins.
OLTP systems (like e-commerce, CRM, etc.):
→ Normalize for data integrity.
OLAP / reporting systems:
→ Partially denormalize (e.g., star or snowflake schema) for speed.
Best practice:
Normalize first, then selectively denormalize only if performance truly demands it.
Summary of Normal Forms
| Normal Form | Main Rule | Solves |
|---|---|---|
| 1NF | Each field holds only one value | Repeating groups, multi-valued fields |
| 2NF | Non-key columns depend on the full composite key | Partial dependencies |
| 3NF | Non-key columns don’t depend on other non-key columns | Transitive dependencies |
In Short
Database normalization is not just an academic concept — it’s what keeps your SQL systems scalable, consistent, and easy to work with.
Start normalized, ensure data quality and flexibility, and later, if you must, denormalize for performance.
That’s the modern, pragmatic approach to relational database design.
