I recently noticed that the character count of a text message I was drafting on my iPhone suddenly changed from “x/160” to “x/70” (here’s how to display a character count in Messages, if you didn’t already know).
It basically boils down to this: An SMS may contain up to 140 bytes (= 1120 bits) of data. UK mobile networks use the GSM standard. The basic GSM character set is encoded using 7 bits per character, which allows for a text message to consist of 1120/7 = 160 characters.
It is only possible to represent 128 different characters with 7 bits. This suffices to capture all common English characters. A few additional special characters (mostly punctuation) can be specified using the basic character set “extension”, which requires 14 bits for every character in the extended set.
However, support for the vast majority of foreign language characters comes in the form of 16-bit UTF-16 alphabet. If you have a mix of English and foreign language characters in your text message, the entire message must be sent in UTF-16, which reduces the number of available characters to 1120/16 = 70 characters. This explains the phenomenon I was experiencing.
I know what you’re thinking: this sucks for those who text in languages other than English. Thankfully, the GSM standard has a solution called “national language shift tables”. In this scheme, several 7-bit character sets are recognised, each corresponding to the most commonly used characters of a particular language. The first four bytes of the text message indicate the specific character set to use, and the rest of the message (136 bytes, or 1088 bits) can be used for the actual content of the message, allowing for a respectable compromise of 1088/7 ≅ 155 characters.
Using characters that belong to multiple shift tables in the same text triggers a fallback to UTF-16, but the idea is to capture the vast majority of text communication.
If this has piqued your interest, Wikipedia has a fairly comprehensive article about GSM 03.38.