In SQL, cardinality refers to the uniqueness of data in a specific column of a table. A table would be said to have less cardinality if it has more duplicated data in a column. So, more the cardinality less the data duplication (in a column) of SQL database table. In databases, the term data cardinality is used. It is related to the performance of a query. It uncovers how many unique values are in an attribute.
A cardinality estimation framework is used which includes base statistics that estimate the foreseen number of rows that would return from the primary table. The base statistics consists of three components:
Table Metadata
Column Density
Column Distribution Histogram
Histograms created either manually or automatically are used for cardinality estimation CE in SQL. Logical rewrites and constraint information can also be used by SQL server to calculate cardinality.
For instance, suppose we have a table with a "Gender" column which has just two potential values of "Men" and "Women". At that point, that "Gender" column would have a cardinality of 2 on the grounds that there are just two one of a kind values that might show up in that column – Men and Women. A column with zero cardinality means that column does not hold any unique value.
Suppose you have a table named Employees
with the following data:
EmployeeID | FirstName | LastName | Department |
---|---|---|---|
1 | Alice | Smith | HR |
2 | Bob | Johnson | IT |
3 | Carol | Williams | IT |
4 | Dave | Brown | HR |
The cardinality of the Department
column is 2, as there are two distinct departments: HR and IT.
Consider the following SQL query on the Employees
table:
SELECT Department FROM Employees;
The result set contains 4 rows, so the cardinality of the result set is 4.
Suppose you have another table named Projects
with the following data:
ProjectID | ProjectName | Department |
---|---|---|
1 | ProjectA | HR |
2 | ProjectB | IT |
3 | ProjectC | IT |
When you perform an INNER JOIN between the Employees
and Projects
tables based on the Department
column, the cardinality of the result set depends on the matching rows:
SELECT E.EmployeeID, E.FirstName, E.LastName, P.ProjectID, P.ProjectName
FROM Employees E
INNER JOIN Projects P ON E.Department = P.Department;
The result set contains 5 rows, so the cardinality of the result set is 5.
Cardinality is essential in understanding database performance and optimization, as high cardinality columns are better suited for indexing and can result in more efficient query execution plans.