Localisation and Internationalisation (L10N and I18N) are two concepts that often get lumped together as simply “localisation” (and even I’m guilty of that), but they are in fact two distinct things, and UE4 handles them in different ways.
The localisation system in UE4 is entirely home-grown and centered around our ‘text’ type, whereas our internationalisation support makes use of the International Components for Unicode (ICU) library. While they are separate, in UE4 you cannot have localisation at runtime without the appropriate internationalisation support.
What is text?
Text in UE4 can be thought of as the primitive component for localisation. It is in essence a very specialised string, represented by the FText type in C++, and should be used whenever you have user-facing text that needs to be localised. If you take nothing else from this, remember that you use FText for all user-facing text.
FText is implemented as a TSharedRef to an ITextData; this makes FText values very cheap to copy (unlike FString), and the FTextSnapshot utility provides an efficient way to detect if a cached FText value has changed. This type is used to great effect in Slate to avoid the costly process of recreating a text layout until the user-facing text (which may be bound to a delegate) actually changes.
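To illustrate why sharing the underlying data makes copies cheap, and how a snapshot can detect changes without comparing strings, here is a minimal standard C++ sketch. It is an analogy rather than UE4 code: Text, TextData, and TextSnapshot are hypothetical stand-ins (with std::shared_ptr playing the role of TSharedRef), and the revision counter is an assumption about one way such a check could work, not a description of how FTextSnapshot is actually implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for ITextData: the payload shared by every copy.
struct TextData {
    std::string display;
    int revision;  // assumption: bumped whenever the display string is rebuilt
};

// Hypothetical stand-in for FText: copying only copies a shared pointer,
// never the string itself.
class Text {
public:
    explicit Text(std::string s)
        : data_(std::make_shared<TextData>(TextData{std::move(s), 0})) {}

    const std::string& str() const { return data_->display; }
    int revision() const { return data_->revision; }
    std::shared_ptr<const TextData> shared() const { return data_; }

    // Simulates a culture switch rebuilding the shared display string;
    // every copy of this Text observes the new value.
    void rebuild(std::string s) {
        data_->display = std::move(s);
        ++data_->revision;
    }

private:
    std::shared_ptr<TextData> data_;
};

// Hypothetical stand-in for FTextSnapshot: remembers which data a text
// pointed at, and at which revision, so detecting a change is a pointer and
// integer comparison rather than a string comparison.
class TextSnapshot {
public:
    explicit TextSnapshot(const Text& t) : data_(t.shared()), revision_(t.revision()) {}

    bool identicalTo(const Text& t) const {
        return data_ == t.shared() && revision_ == t.revision();
    }

    void update(const Text& t) {
        data_ = t.shared();
        revision_ = t.revision();
    }

private:
    std::shared_ptr<const TextData> data_;  // holding a reference prevents address reuse
    int revision_;
};
```

Roughly speaking, a widget would keep a snapshot of its bound text and only rebuild its text layout when identicalTo returns false.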
The data held within an FText varies depending upon how the FText was created. This variance is handled by the internal “text history” (FTextHistory, a slightly odd name that has nothing to do with translation history, but instead tells the text where it came from and how it can be rebuilt if needed). Text histories support the culture-correct rebuilding of text, and form the key component in allowing live culture switching, sending FText over the network, and the creation of culture-invariant sources.
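A minimal sketch of the idea behind text histories, assuming a hypothetical AsNumberHistory that records the number a text was built from so the display string can be regenerated when the culture changes. None of these types are UE4's FTextHistory classes, and the Culture struct is reduced to a single decimal separator purely for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily reduced culture: only what this demo needs.
struct Culture {
    std::string name;
    char decimalSeparator;
};

// Hypothetical history interface: remembers where a text came from so it
// can be rebuilt for a different culture.
struct TextHistory {
    virtual ~TextHistory() = default;
    virtual std::string build(const Culture& culture) const = 0;
};

// History for text created by formatting a number with two decimal places.
// The value is stored in hundredths to avoid floating-point rounding.
struct AsNumberHistory : TextHistory {
    explicit AsNumberHistory(std::uint64_t h) : hundredths(h) {}

    std::string build(const Culture& culture) const override {
        const std::uint64_t frac = hundredths % 100;
        std::string out = std::to_string(hundredths / 100);
        out += culture.decimalSeparator;
        if (frac < 10) out += '0';
        out += std::to_string(frac);
        return out;
    }

    std::uint64_t hundredths;
};

// Hypothetical text that keeps its history next to the cached display string.
class Text {
public:
    Text(std::shared_ptr<const TextHistory> history, const Culture& culture)
        : history_(std::move(history)), display_(history_->build(culture)) {}

    // Live culture switching: regenerate the display string from its source.
    void onCultureChanged(const Culture& culture) { display_ = history_->build(culture); }

    const std::string& str() const { return display_; }

private:
    std::shared_ptr<const TextHistory> history_;  // declared first, so initialised first
    std::string display_;
};
```

The point is that the display string is only a cache; because the text remembers that it came from the number 1234.05, it can be rebuilt as "1234,05" for a French culture instead of being stuck with "1234.05".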
While FText takes a lot of the pain of localisation away from you, there are still some ‘gotchas’ with text formatting that you should be aware of.
- When injecting a number that affects the sentence, handle these variances via “Plural Forms” rather than branching in code.
- This allows the sentence to be correctly translated for languages that don’t share the plural rules of your source language.
- When injecting a personal noun, make sure you include an argument for the gender of the person.
- This is important for languages with grammatical gender as it allows your translators to switch their translation based on the gender (see “Gender Forms”).
- Avoid injecting non-personal nouns (or be prepared to do a lot of work to make them localisable).
- Unlike personal nouns (which have a fixed gender between languages), non-personal nouns may have different genders in different languages. This makes the format pattern string impossible to accurately localise without a lot of per-culture meta-data (of which gender is only one part).
- This per-culture meta-data would be used to build up the correct formatting pattern to use when formatting, and would be custom for your particular problem domain. I’m not aware of any text formatting system that handles this out-of-the-box, so you’d have to implement this part yourself.
- Ideally you should prefer to provide full sentences rather than inject non-personal nouns (even if it’s redundant in your source language), as this will ensure that you get accurate translations out-of-the-box.
- Avoid concatenating partial sentences.
- This has much the same pitfalls as injecting non-personal nouns. Each part of the sentence could be localised, but the combination of them may be incorrect.
- As before, you should prefer to provide full sentences to ensure that you get accurate translations out-of-the-box.
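To show why plural forms beat branching in code, here is a small sketch using simplified CLDR cardinal plural rules for English and Polish (non-negative integers only). A source-language branch that picks between "file" and "files" cannot express Polish, which needs three forms here; selecting a form through a per-language rule can. The type and function names are hypothetical; in UE4 the plural rules themselves come from ICU during text formatting.

```cpp
#include <cassert>
#include <map>
#include <string>

// Plural categories (a subset of the CLDR ones).
enum class PluralForm { One, Few, Many, Other };

// Simplified CLDR cardinal rule for English, integers only.
PluralForm englishPlural(unsigned n) {
    return n == 1 ? PluralForm::One : PluralForm::Other;
}

// Simplified CLDR cardinal rule for Polish, integers only:
// 1 -> one; ends in 2-4 (but not 12-14) -> few; everything else -> many.
PluralForm polishPlural(unsigned n) {
    if (n == 1) return PluralForm::One;
    const unsigned mod10 = n % 10;
    const unsigned mod100 = n % 100;
    if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) return PluralForm::Few;
    return PluralForm::Many;
}

// Each translation supplies only the forms its language needs.
using PluralTable = std::map<PluralForm, std::string>;

// Hypothetical formatter: the language's rule chooses the form, not the code.
std::string formatCount(unsigned n, PluralForm (*rule)(unsigned), const PluralTable& forms) {
    return std::to_string(n) + " " + forms.at(rule(n));
}
```

With English forms {one: "file", other: "files"} and Polish forms {one: "plik", few: "pliki", many: "plików"}, the same call site produces "5 files" and "5 plików", and "22 pliki" for Polish, without any language-specific branching in the calling code.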
There are also a couple of ‘gotchas’ that you should be aware of with how UE4 stores strings:
- FText ultimately stores its display string as an FString.
- FString in UE4 is an array of TCHAR.
- TCHAR is, by default, wchar_t.
- wchar_t is not a fixed size between platforms.
- On Microsoft platforms TCHAR is 2 bytes (for UTF-16).
- On all other currently supported platforms TCHAR is 4 bytes (for UTF-32).
- On Microsoft platforms FString assumes that a TCHAR always contains a complete character, and that strings can be split at any TCHAR.
- There are two minor exceptions to this: ICU (which always stores and processes strings as UTF-16), and complex text shaping (which uses ICU internally to iterate the string).
- This means that the UTF-16 support on Microsoft platforms is essentially only UCS-2, since we can’t guarantee that characters outside the Basic Multilingual Plane (BMP) will be handled correctly; therefore you should limit your text to characters within the BMP.
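The surrogate-pair problem can be demonstrated with std::u16string, used here as a portable stand-in for a 2-byte TCHAR (since the size of wchar_t varies between platforms). This sketch checks whether a string stays within the BMP, and whether splitting at a given index would cut a surrogate pair in half; the helper names are hypothetical and not part of UE4.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-16 code units in this range encode characters outside the BMP.
bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// A string with no surrogates uses only BMP characters, so every code unit
// is a complete character and the UCS-2 assumption holds.
bool isBmpOnly(const std::u16string& s) {
    for (char16_t c : s) {
        if (isHighSurrogate(c) || isLowSurrogate(c)) return false;
    }
    return true;
}

// Splitting at `index` is only safe if it does not land between a high
// surrogate and the low surrogate that completes it.
bool isSafeSplitPoint(const std::u16string& s, std::size_t index) {
    if (index == 0 || index >= s.size()) return true;
    return !(isHighSurrogate(s[index - 1]) && isLowSurrogate(s[index]));
}
```

For example, u"a\U0001F600b" is four code units long ('a', a high surrogate, a low surrogate, 'b'), so naively splitting it at index 2 would leave two halves that are each invalid UTF-16.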
What is ICU?
ICU is a mature and robust internationalisation library (and is probably the de facto internationalisation library for C/C++).
UE4 uses it to deal with anything involving culture specific data or processing, including the following:
- Obtaining the current culture for the platform/OS.
- Handling the prioritised fallback of cultures.
- Handling the culture-correct formatting of numbers† (including percentages and currency), and dates and times (including timezone data).
- Handling the culture-correct plurality of numbers (during text formatting).
- Handling Unicode compliant transformation of text (e.g., ToUpper, ToLower).
- Handling Unicode compliant comparison and collation of text.
- Handling Unicode compliant boundary analysis (characters, words, and line-breaks).
- Handling Unicode compliant bi-directional (BiDi) text detection.
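As a rough illustration of what prioritised culture fallback means, this sketch builds a fallback chain by stripping subtags from the right and then appending a default culture. It is a deliberate simplification: ICU's real logic also consults CLDR parent-locale data, so its chains can differ from this naive truncation, and fallbackChain is a hypothetical helper rather than anything UE4 or ICU exposes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds a simplified culture fallback chain by removing
// the right-most subtag until only the language remains, then appends a
// default culture if it is not already present.
std::vector<std::string> fallbackChain(const std::string& culture,
                                       const std::string& defaultCulture) {
    std::vector<std::string> chain;
    std::string current = culture;
    while (!current.empty()) {
        chain.push_back(current);
        const std::string::size_type sep = current.find_last_of("-_");
        if (sep == std::string::npos) break;
        current.erase(sep);  // e.g. "zh-Hans-CN" -> "zh-Hans"
    }
    if (std::find(chain.begin(), chain.end(), defaultCulture) == chain.end()) {
        chain.push_back(defaultCulture);
    }
    return chain;
}
```

Under these assumptions, "zh-Hans-CN" with a default of "en" yields zh-Hans-CN, zh-Hans, zh, en, and lookups would try each entry in turn until data is found.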
† We originally used icu::DecimalFormat for this, but it was far too slow for our needs, so instead we just extract the per-culture number formatting rules from ICU and pass them to our own FastDecimalFormat functions. A similar thing happened to icu::MessageFormat, which was replaced by our own implementation.
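The approach in the footnote, extracting the rules once and then applying them with a fast custom formatter, can be sketched as follows. NumberFormattingRules, formatInteger, and formatHundredths are hypothetical names; this is not the FastDecimalFormat API, and the rules struct only supports a single grouping size (some cultures, such as en-IN, use a different secondary grouping size).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical subset of per-culture number formatting rules, of the kind
// that could be extracted from ICU once per culture and then reused.
struct NumberFormattingRules {
    std::string groupingSeparator;  // strings, since some cultures use multi-byte separators
    std::string decimalSeparator;
    unsigned groupingSize;          // digits per group, e.g. 3
};

// Formats a non-negative integer, inserting the grouping separator every
// groupingSize digits counting from the right.
std::string formatInteger(std::uint64_t value, const NumberFormattingRules& rules) {
    const std::string digits = std::to_string(value);
    const std::size_t n = digits.size();
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        out += digits[i];
        const std::size_t remaining = n - i - 1;
        if (rules.groupingSize > 0 && remaining > 0 && remaining % rules.groupingSize == 0) {
            out += rules.groupingSeparator;
        }
    }
    return out;
}

// Formats a value given in hundredths with exactly two fractional digits.
std::string formatHundredths(std::uint64_t hundredths, const NumberFormattingRules& rules) {
    const std::uint64_t frac = hundredths % 100;
    std::string out = formatInteger(hundredths / 100, rules) + rules.decimalSeparator;
    if (frac < 10) out += '0';
    out += std::to_string(frac);
    return out;
}
```

With English-style rules {",", ".", 3} the value 123456789 hundredths formats as "1,234,567.89", and with German-style rules {".", ",", 3} as "1.234.567,89". In a real system the rules would be populated from ICU per culture rather than hard-coded, which keeps the per-call work to simple string building.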
The culture specific data that ICU needs to function is stored outside of ICU itself, and UE4 provides some coarse sets that you can use to minimise your project size:
- English (~1.77MB)
- EFIGS - English, French, Italian, German, and Spanish (~2.38MB)
- EFIGSCJK - English, French, Italian, German, Spanish, Chinese, Japanese, and Korean (~5.99MB)
- CJK - Chinese, Japanese, and Korean (~5.16MB)
- All (~15.3MB)
Which one of these you pick depends on what languages you need to localise your game for, and that is the topic we will cover in the next post.