SMS Encoding Explained

TalkBox SMS messages are encoded using either GSM-7 or UCS2. An encoding is like an alphabet: it’s a way to display a digitised message as letters and symbols that can be read on screen.

GSM-7 is the original SMS encoding used for old-school, text-only SMS. If you were sending text messages in the nineties with a chunky Nokia, you were using GSM-7. It consists of letters, numbers and a few special characters like $ and @; there are just 128 characters available in total. It’s called GSM-7 because each character is encoded with 7 bits. This means that a 140-byte SMS message (yes, they’re only 140 bytes long) can carry 160 characters.

UCS2 encoding supports the same characters as GSM-7 and about 65 thousand more. It can display characters from pretty much every language in the world, plus a lot more, like emojis 🎉. UCS2 characters each require a whopping 16 bits, which means that a 140-byte SMS message can carry just 70 characters.
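The arithmetic behind the 160 and 70 character limits can be sketched in a few lines of Python. This is only an illustration of the sums described above; the names SMS_PAYLOAD_BYTES and max_characters are made up for this example and are not part of TalkBox.

```python
# Illustrative only: these names are hypothetical, not a TalkBox API.
SMS_PAYLOAD_BYTES = 140  # size of one SMS message, as described above


def max_characters(bits_per_character: int) -> int:
    """Return how many fixed-width characters fit in one SMS message."""
    total_bits = SMS_PAYLOAD_BYTES * 8  # 140 bytes = 1120 bits
    return total_bits // bits_per_character


print("GSM-7:", max_characters(7))   # 7 bits per character
print("UCS2:", max_characters(16))   # 16 bits per character
```

The same 1120 bits are available either way; the only thing that changes is how many bits each character uses up.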

TalkBox will automatically send SMS messages using the most efficient encoding possible. This means it uses GSM-7 unless you’ve included any non-GSM-7 characters, in which case it will use UCS2.
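If you’d like to see how a check like this could work, here is a minimal sketch in Python. It is not TalkBox’s actual code: the function name pick_encoding is hypothetical, and the sketch assumes the standard GSM-7 basic character set plus its small extension table (characters such as € and [ ]) both count as GSM-7.

```python
# A sketch only: pick_encoding is a hypothetical helper, not a TalkBox API.
# Standard GSM-7 basic character set (the escape code itself is left out).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Assumption: extension-table characters are also treated as GSM-7.
GSM7_EXTENSION = "^{}\\[~]|€\f"
GSM7_CHARS = set(GSM7_BASIC + GSM7_EXTENSION)


def pick_encoding(message: str) -> str:
    """Return "GSM-7" if every character fits, otherwise "UCS2"."""
    if all(ch in GSM7_CHARS for ch in message):
        return "GSM-7"
    return "UCS2"


print(pick_encoding("Pizza night, 2 for $20!"))
print(pick_encoding("Pizza night 🍕"))
```

A single character outside the set is enough to switch the whole message to UCS2, which is why one emoji or smart quote matters so much.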

The SMS editor in TalkBox displays a notification bar whenever it detects a non-GSM-7 character and switches to UCS2. When this happens, you’ll notice the number of characters remaining drops. This is important because even fairly short UCS2-encoded messages can easily become long messages and increase your SMS spend.

Note the difference adding a pizza slice to your SMS can make. Here’s the SMS editor with a simple GSM-7 message.

Here’s the same message with a single emoji. It’s now using UCS2 encoding and has become a long message with two parts.
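To see why one emoji can push a message into two parts, here is a rough Python sketch of how parts are usually counted. It relies on the standard limits for split (concatenated) SMS: once a message is split, each part carries a small header, leaving room for 153 GSM-7 characters or 67 UCS2 characters per part. The function names are hypothetical, and TalkBox’s own counting may differ in edge cases.

```python
import math

# Sketch only: hypothetical helpers, not TalkBox's actual counting code.
# (limit for a single message, limit per part once the message is split)
LIMITS = {"GSM-7": (160, 153), "UCS2": (70, 67)}
# GSM-7 extension characters take up two character slots each.
GSM7_EXTENSION = set("^{}\\[~]|€\f")


def message_units(message: str, encoding: str) -> int:
    """Count the character slots a message uses in the given encoding."""
    if encoding == "UCS2":
        # Each 16-bit unit is one slot; most emojis need two units.
        return len(message.encode("utf-16-be")) // 2
    return sum(2 if ch in GSM7_EXTENSION else 1 for ch in message)


def count_parts(message: str, encoding: str) -> int:
    """Estimate how many SMS parts the message will be sent as."""
    units = message_units(message, encoding)
    single_limit, per_part_limit = LIMITS[encoding]
    if units <= single_limit:
        return 1
    return math.ceil(units / per_part_limit)


text = "a" * 100  # stand-in for a 100 character GSM-7 message
print(count_parts(text, "GSM-7"))
print(count_parts(text + "🍕", "UCS2"))
```

In this example the 100 character message fits in one GSM-7 part, but with the pizza slice added it needs 102 UCS2 slots, which is more than the 70 that fit in a single message.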

If you compose an SMS message outside of TalkBox and paste it into the TalkBox SMS editor, you might accidentally include a non-GSM-7 character. Many text editors like Word support non-GSM-7 characters, and some of these characters don’t look very interesting, so you might not realise you’ve composed a UCS2 SMS. TalkBox will always identify UCS2 content in the SMS editor and let you know, but some characters to watch out for are:

  • Smart quotes. Some editors will change "foo" to “foo”. These angled quotes are not part of GSM-7, so they will make the message UCS2 without making the content noticeably better. The same thing can happen with single quotes.
  • Long dashes. If you type -- into some editors, it will be changed to a single, slightly longer dash. Again, this will switch the encoding to UCS2 without improving your message.
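If you prepare message text in your own tools before pasting it in, a small clean-up step can swap these characters back to their plain equivalents. The Python sketch below is just one way to do it; the function name plain_punctuation is hypothetical and the list of replacements only covers the quotes and dashes mentioned above.

```python
# Sketch only: plain_punctuation is a hypothetical helper, not part of TalkBox.
REPLACEMENTS = {
    "\u201c": '"',  # left smart double quote
    "\u201d": '"',  # right smart double quote
    "\u2018": "'",  # left smart single quote
    "\u2019": "'",  # right smart single quote (also used as an apostrophe)
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}
_TABLE = str.maketrans(REPLACEMENTS)


def plain_punctuation(message: str) -> str:
    """Replace smart quotes and long dashes with GSM-7 friendly characters."""
    return message.translate(_TABLE)


print(plain_punctuation("\u201cfoo\u201d \u2014 it\u2019s on us"))
```

Other characters, such as emojis or ellipsis symbols, are left untouched, so it is still worth checking the notification bar in the SMS editor.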

It’s safer to type your message directly into TalkBox rather than using an editor designed for writing documents or notes. If you really want to know what’s going on under the hood you can type your message into this message segment calculator.

Updated on October 16, 2023
