What do you understand by Column Density in SQL Server?

Density in SQL Server:
Density, when used to describe the data in a column, is a measure of how often duplicate values occur in that column. Another way to think of density is as a measure of the uniqueness of the data in a column: high density –> less unique data. Density values range from 0 to 1.0.  There are different (but equivalent) ways to think of density.

Density = 1/[# of distinct values in a column]
Density = Avg. number of duplicates for a given value / Total row count

Consider a Customers table with a CountryID column and a RegionID column. Suppose that 100 different CountryID values are present in the table, and 1000 different RegionIDs are present. The density of the CountryID column would be higher than the density of the RegionID column because a typical country is less unique than a typical Region. Note that this is an overall characterization of a range of values. There may be some regions that occur more frequently than some countries — a U.S.-based company might have more customers in California than in Estonia, for example.  But CountryID still has higher density than RegionID simply because the average country is represented in more rows than the average region.

Density of the CountryID column:

Density = 1/[# of distinct values in the column]
Density = 1/100 = 0.01

Note that the density of 0.01, or 1%, corresponds to the percentage of the total rows that would be returned by a query for a single value.  In other words, an average country makes up 1% of the table.  if there were 10,000 rows in the table, there were be 100 rows (1% of 10,000) per country, on average.

Density of the RegionID column:

Density = 1/[# of distinct values in a column]
Density = 1/1000 = 0.001

The density of the CountryID column (0.01) is higher than the density of the RegionID column (0.001) because the CountryID column is less unique.

Click here for read complete msdn post

Comments

Archive

Contact Form

Send