I've spent a lot of time this evening trying to find guidance about which choice of collation to apply in my SQL Server 2008 R2 installation, but almost everything online basically says "choose what is right for you." Extremely unhelpful.
My context is new application development. I am not worrying about backward compatibility with a prior version of SQL Server (viz. <= 2005). I am very interested in storing data representing languages from around the globe - not just Latin based. What very little help I've found online suggests I should avoid all "SQL_" collations. This narrows my choice to using either a binary or "not binary" collation based on the Windows locale.
If I use binary, I gather I should use "BIN2." So this is my question. How do I determine whether I should use BIN2 or just "Latin1_General_100_XX_XX_XX"? My spider-sense tells me that BIN2 will provide collation that is "less accurate," but more generic for all languages (and fast!). I also suspect the binary collation is case sensitive, accent sensitive, and kana-sensitive (yes?). In contrast, I suspect the non-binary collation would work best for Latin-based languages.
The documentation doesn't support my claims above, I'm making educated guesses. But this is the problem! Why is the online documentation so thin that the choice is left to guesswork? Even the book "SQL Server 2008 Internals" discussed the variety of choices, without explaining why and when binary collation would be chosen (compared with non-binary windows collation). Criminy!!!