UNION operator is not eliminating duplicate data. When ? and Why ?


Normally, UNION operator will elimate the duplicate data from the result sets.

But, In some situation, It'll not eliminate the duplicate data.

When ?
Incase, If we have used any Case-Sensitive collations on columns, database, server.

Why ?
The UNION operation depend on the collation what we have used in columns, Database, Server.

Scenario...
/*One table created, column with case-sensitive(CS) collation*/
If OBJECT_ID('Tb_Sample1') Is Null
Create Table Tb_Sample1
(
Id Int Identity(1,1),
Column1 Varchar(10) Collate SQL_Latin1_General_CP1_CS_AS,
Column2 Varchar(10)
)
Go


/*Inserting Duplicate data, But the case differs*/
Insert Tb_Sample1(Column1, Column2) Values('SQL','Server')
Insert Tb_Sample1(Column1, Column2) Values('MS','Access')
Insert Tb_Sample1(Column1, Column2) Values('Adobe','Reader')
Insert Tb_Sample1(Column1, Column2) Values('sql','Server')
Go


/*Fetching data using UNION operator*/
Select Column1 , Column2 From Tb_Sample1 where Column1 in('SQL', 'MS','sql')
Union
Select Column1 , Column2 From Tb_Sample1 where Column1 in('Adobe')


/*Expected result was...*/
Column1Column2
AdobeReader
MSAccess
sqlServer


/*But, The actual result is...*/
Column1Column2
AdobeReader
MSAccess
SQLServer
sqlServer

The UNION operator compares the data depend on Collation what we have used in Columns, Database, Server. So, It could not be considered as a duplicate and eliminate it.

How to fix it ?
- The collation has to be changed to Case-Insensitive as given below
Alter Table Tb_Sample1 Alter Column Column1 Varchar(10) Collate SQL_Latin1_General_CP1_CI_AS
Go


or

- Use COLLATE statement and force the case-insensitive collation when using the UNION operator as given below
Select Column1 Collate SQL_Latin1_General_CP1_CI_AS as Column1, Column2 From Tb_Sample1 where Column1 in('SQL', 'MS','sql')
UNION
Select Column1 , Column2 From Tb_Sample1 where Column1 in('Adobe')


Important:
- UNION operator automatically applies DISTINCT SORT operator internally.
- So, The duplicate data will be eliminated and SORTED from the final result set. But, UNION ALL will NOT work depend on Collation and not do any data Sorting.

No comments:

Post a Comment