Collation in SQL
Posted by Pradeep Iyer on Thursday, June 11th, 2009Collation defines how data is sorted and compared in SQL. It is a set of rules that defines case-sensitivity, accent sensitivity, width-sensitivity etc. The most common use of collation is in case-sensitive search using SQL query. Let’s see how it proves to be useful with case-sensitivity.
Normally, it is impossible to perform a case-sensitive search in SQL. For example, see the following queries:
SELECT * FROM myTable WHERE myCompany=’QBurst’
SELECT * FROM myTable WHERE myCompany=’qburst’
Both queries will produce the same result when they are executed. But there are lots of cases where we want to differentiate the result based on search string case.
The necessity for this type of search has greatly increased, so Microsoft introduced a feature known as “Collation”. With collation, we can redefine the search scenario at the database level (rewriting the default collation of SQL server set during its installation). For the previous example, if we want to get results for “QBurst” only if we search with string “QBurst” and with no other case combination, we should write the query like:
SELECT * FROM myTable WHERE myCompany=’QBurst’ COLLATE SQL_Latin1_General_Cp437_CS_AS
Here the only thing you may not be familiar with will be “SQL_Latin1_General_Cp437_CS_AS “. This string is called Collation Name. Microsoft defines different combinations of collation names, with the combinations produced by using various supporting languages for accent-sensitivity and also with case-sensitive/case-insensitive options. You can have a detailed look on more collation names here:
http://msdn.microsoft.com/en-us/library/ms144250(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/ms180175.aspx
The collation can also be applied in the parent levels such as when creating databases. The statement, “CREATE DATABASE myCollationDatabase COLLATE Latin1_General_Cp437_CS_AS”, makes the whole database arranged according to the collation specified.
The same can also be extended to applying different collations for different columns in a table. The code below shows an example:
CREATE TABLE myCollationTable
{
userName VARCHAR(20) COLLATE SQL_Latin1_General_Cp437_CI_AI,
userOccupation VARCHAR(30) COLLATE SQL_Latin1_General_Cp1_CI_AI,
userCompany VARCHAR(30) COLLATE SQL_Latin1_General_Cp1_CS_AS
}
Collation can be very efficient if used in a proper way. No major limitations have been reported yet for this feature. Now, you can try it for yourself.
Happy coding!
Tags: case-sensitivity, collation, sql