Each engineer defining a new string column decides: Do I use
nvarchar or do I use varchar?
Since I discovered
nvarchar, I’ve always use
nvarchar. My thought is, why use a datatype that may not support a text value, and you won’t likely discover an incompatibility value until it’s in production.
I hear the argument about space, but space is cheap and not worth worrying about. I know what you’re thinking, the cost doesn’t matter when the hard drive is full, and I agree.
Starting with Sql Server 2008 R2 data compression is applied to
nvarchar(max) is excluded) fields. Depending on the data the effectiveness of the compression varies, but with English, there is a 50% compression, which puts it on par with the
varchar’s space needs (1).
Something else to consider is most programming languages support UTF-16 as the string type. So each time a
varchar is loaded from the database, it’s converted to UTF-16 (
This StackOverflow answer sums up
An nvarchar column can store any Unicode data. A varchar column is restricted to an 8-bit codepage. Some people think that varchar should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatabilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.
All modern operating systems and development platforms use Unicode internally. By using nvarchar rather than varchar, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.
If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you’re having to maintain, even while enjoying some of the benefits of full Unicode storage. (2)
My conclusion is the only time the data is a varchar is when it’s at rest.