Thursday, 14 November 2013

Internet Explorer 11: Two Steps Forward and Two Steps Back

For years I have bemoaned the incomplete and broken implementation of script-specific font configuration in Internet Explorer. The ability to manually configure what font to use for what Unicode script is a killer feature for me, and something that in my opinion should make Internet Explorer vastly superior to Chrome, which does not allow the user to choose what font to use by default for particular Unicode scripts (in the absence of a font being explicitly specified by the page being read). For multilingual users, especially those who work with more obscure scripts and languages, I find that Internet Explorer generally provides a much better experience, with fewer annoying little boxes for unsupported characters. (I have had bad experiences with Firefox in the past, but reinstalled it for this blog post and was pleasantly surprised by its multiscript support, which is much better than I remember.)


Tag Cloud for the BabelStone Blog as viewed with Internet Explorer 10


Tag Cloud for the BabelStone Blog as viewed with Chrome 30


Tag Cloud for the BabelStone Blog as viewed with Firefox 25


IE6 through IE10 support font configuration for 37 languages or scripts. (This is a little better than Firefox 25, which allows font configuration for 32 languages or regions.)


Configurable Languages in IE10 under Windows 7
| Language/Script | Scripts | Unicode Version | Fonts listed |
|---|---|---|---|
| Arabic | | 1.0 | (various) |
| Armenian | | 1.0 | Arial Unicode MS, Sylfaen, Tahoma |
| Bengali | | 1.0 | Arial Unicode MS, Shonar Bangla, Vrinda |
| Braille | | 3.0 | Segoe UI Symbol |
| Canadian Syllabic | | 3.0 | Euphemia |
| Cherokee | | 3.0 | Plantagenet Cherokee |
| Chinese Simplified | Han | 1.0 | (various) |
| Chinese Traditional | Han and Bopomofo | 1.0 | (various) |
| Cyrillic | | 1.0 | (various) |
| Devanagari | | 1.0 | Aparajita, Arial Unicode MS, Kokila, Mangal, Utsaah |
| Ethiopic | | 3.0 | Nyala |
| Georgian | | 1.0 | Arial Unicode MS, Sylfaen |
| Greek | | 1.0 | (various) |
| Gujarati | | 1.0 | Arial Unicode MS, Shruti |
| Gurmukhi | | 1.0 | Arial Unicode MS, Raavi |
| Hebrew | | 1.0 | (various) |
| Japanese | Han and Hiragana/Katakana | 1.0 | (various) |
| Kannada | | 1.0 | Arial Unicode MS, Tunga |
| Khmer | | 3.0 | DaunPenh, Khmer UI, MoolBoran |
| Korean | Han and Hangul | 1.0 | (various) |
| Lao | | 1.0 | Arial Unicode MS, DokChampa, Lao UI |
| Latin based | | 1.0 | (various) |
| Malayalam | | 1.0 | Arial Unicode MS, Kartika |
| Mongolian | | 3.0 | (none) |
| Myanmar | | 3.0 | (none) |
| Ogham | | 3.0 | Segoe UI Symbol |
| Oriya | | 1.0 | Arial Unicode MS, Kalinga |
| Runic | | 3.0 | Segoe UI Symbol |
| Sinhala | | 3.0 | Iskoola Pota |
| Syriac | | 3.0 | Estrangelo Edessa |
| Tamil | | 1.0 | Arial Unicode MS, Latha, Vijaya |
| Telugu | | 1.0 | Arial Unicode MS, Gautami, Vani |
| Thaana | | 3.0 | MV Boli |
| Thai | | 1.0 | (various) |
| Tibetan | | 2.0 | Arial Unicode MS, Microsoft Himalaya |
| User Defined | | | (various) |
| Yi | | 3.0 | Microsoft Yi Baiti |

As you can see, this list does not include any languages with Unicode scripts introduced later than Unicode version 3.0, which was released in September 1999, but it does include all Unicode scripts available in Unicode 3.0 (Bopomofo is presumably subsumed within Chinese Traditional). When IE6 was released in August 2001 this list was pretty much up to date, and only lacked three scripts added in Unicode 3.1 (Deseret, Gothic and Old Italic), which had been released in March 2001, after IE6 had gone beta.

This was a great start, and suggested that IE was going to provide cutting-edge support for Unicode scripts as they were encoded. However, it seems that no-one took ownership of this feature, and it was left to languish for the next twelve years. When IE10 was released in August 2012, thirteen years after Unicode 3.0, it still only allowed font configuration for the original list of 37 languages.

At the same time as no-one was updating the font configuration feature for the 62 new scripts that were added to Unicode between 3.1 and 6.1 (released in January 2012), no-one was fixing any bugs in the font configuration feature. As discussed in Michael Kaplan's blog post, The importance of Tagalog to Burmese, aka "Of course I'd lie to you, I'm a font!" (18 April 2008), the main bugs in the feature are due to the way that IE populates the list of fonts for each language. It lists those fonts that: a) have the appropriate Unicode Subset Bitfield bit set; and b) also have a mapping to a sample Unicode character for the script. Unfortunately, in the case of Myanmar (Burmese), the sample character used is U+1700 ᜀ TAGALOG LETTER A, which is a character from the historic Philippine script Tagalog (Baybayin) which was encoded in Unicode 3.2. The reason for this mistake is that the list of Unicode 3.0 sample characters used by IE was based on draft code charts, and the Myanmar script was relocated from its original proposed location starting at U+1700 to a new location starting at U+1000 when it was actually encoded. This means that no Myanmar font will show up on the list of Myanmar fonts unless it redundantly includes a mapping to the Tagalog character at U+1700. In the case of Mongolian, no sample character is listed at all, which is even worse than the situation for Myanmar, as no font ever passes the test for supporting Mongolian. So although Microsoft has shipped a Mongolian font ("Mongolian Baiti") since Windows Vista, this font does not show up on the list of Mongolian fonts for IE10 and earlier.
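The two-part test described above, and the way a wrong or missing sample character breaks it, can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post, not IE's actual code: the `Font` class, the font names and the `fonts_listed_for` function are all hypothetical, script names stand in for the real Unicode Subset Bitfield bits, and the "fixed" sample characters are my own assumption about what a corrected table might contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Font:
    name: str                    # hypothetical font name
    declared_scripts: frozenset  # stands in for the Unicode Subset Bitfield bits
    cmap: frozenset              # code points the font maps to a glyph


# Sample characters as described in the post: Myanmar is tested with
# U+1700 (TAGALOG LETTER A) and Mongolian has no sample character at all.
BUGGY_SAMPLES = {"Myanmar": 0x1700, "Mongolian": None}

# An assumed correction, for illustration only:
# U+1000 MYANMAR LETTER KA and U+1820 MONGOLIAN LETTER A.
FIXED_SAMPLES = {"Myanmar": 0x1000, "Mongolian": 0x1820}


def fonts_listed_for(script, fonts, samples):
    """Return the names of fonts that pass both tests for a script."""
    sample = samples.get(script)
    if sample is None:
        # No sample character: no font can ever pass test (b).
        return []
    return [
        f.name
        for f in fonts
        if script in f.declared_scripts  # test (a): script bit declared
        and sample in f.cmap             # test (b): sample character mapped
    ]


myanmar_block = frozenset(range(0x1000, 0x10A0))
mongolian_block = frozenset(range(0x1800, 0x18B0))

# Three made-up fonts: one plain Myanmar font, one that also maps U+1700
# (as the post says Padauk does), and one Mongolian font.
FONTS = [
    Font("Plain Myanmar Font", frozenset({"Myanmar"}), myanmar_block),
    Font("Myanmar Font With U+1700", frozenset({"Myanmar"}),
         myanmar_block | {0x1700}),
    Font("Some Mongolian Font", frozenset({"Mongolian"}), mongolian_block),
]

for script in ("Myanmar", "Mongolian"):
    print(script, "buggy:", fonts_listed_for(script, FONTS, BUGGY_SAMPLES))
    print(script, "fixed:", fonts_listed_for(script, FONTS, FIXED_SAMPLES))
```

With the buggy sample table only the font that carries the redundant U+1700 mapping is listed for Myanmar, and nothing is ever listed for Mongolian.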


Mongolian Font Configuration Dialog in IE10 under Windows 7

No fonts listed even though Windows 7 ships with the "Mongolian Baiti" font.


Myanmar Font Configuration Dialog in IE10 under Windows 7

"Noto Sans Tagalog" is listed although it does not cover Myanmar; and Martin Hosken's Padauk fonts for Myanmar are listed only because they deliberately include a dotted circle glyph mapped to U+1700.


When Internet Explorer 11 installed itself on my laptop recently, the first thing I did was check the font configuration setting, as I did with IE7, IE8, IE9 and IE10 when they first appeared. As no changes had been made for IE7 through IE10, I was not expecting anything new from IE11. Imagine my surprise, then, when I opened the font configuration dialog and discovered that the list of languages had been expanded from 37 to 55. That seems like one big step forward!


Configurable Languages in IE11 under Windows 7
| Language/Script | Unicode Version | Fonts listed |
|---|---|---|
| Arabic | 1.0 | (various) |
| Armenian | 1.0 | Arial Unicode MS, Sylfaen, Tahoma |
| Bengali | 1.0 | Arial Unicode MS, Shonar Bangla, Vrinda |
| Bopomofo | 1.0 | Microsoft JhengHei |
| Braille | 3.0 | Segoe UI Symbol |
| Buginese | 4.1 | Leelawadee UI |
| Canadian Syllabic | 3.0 | Euphemia |
| Cherokee | 3.0 | Plantagenet Cherokee |
| Chinese Simplified | 1.0 | (various) |
| Chinese Traditional | 1.0 | (various) |
| Coptic | 4.1 | Segoe UI Symbol |
| Cyrillic | 1.0 | (various) |
| Deseret | 3.1 | Segoe UI Symbol |
| Devanagari | 1.0 | Aparajita, Arial Unicode MS, Kokila, Mangal, Utsaah |
| Ethiopic | 3.0 | Nyala |
| Georgian | 1.0 | Arial Unicode MS, Sylfaen |
| Glagolitic | 4.1 | Segoe UI Symbol |
| Gothic | 3.1 | Segoe UI Symbol |
| Greek | 1.0 | (various) |
| Gujarati | 1.0 | Arial Unicode MS, Shruti |
| Gurmukhi | 1.0 | Arial Unicode MS, Raavi |
| Hebrew | 1.0 | (various) |
| Japanese | 1.0 | (various) |
| Javanese | 5.2 | Javanese Text* |
| Kannada | 1.0 | Arial Unicode MS, Tunga |
| Khmer | 3.0 | DaunPenh, Khmer UI, MoolBoran |
| Korean | 1.0 | (various) |
| Lao | 1.0 | Arial Unicode MS, DokChampa, Lao UI |
| Latin based | 1.0 | (various) |
| Malayalam | 1.0 | Arial Unicode MS, Kartika |
| Mongolian | 3.0 | Mongolian Baiti |
| Myanmar | 3.0 | Myanmar Text* |
| New Tai Lue | 4.1 | Microsoft New Tai Lue |
| N'Ko | 5.0 | Ebrima |
| Ogham | 3.0 | Segoe UI Symbol |
| Ol Chiki | 5.1 | Nirmala UI |
| Old Italic | 3.1 | Segoe UI Symbol |
| Old Turkic | 5.2 | Segoe UI Symbol |
| Oriya | 1.0 | Arial Unicode MS, Kalinga |
| Osmanya | 4.0 | Ebrima |
| Phags-pa | 5.0 | Microsoft PhagsPa |
| Runic | 3.0 | Segoe UI Symbol |
| Sinhala | 3.0 | Iskoola Pota |
| Sora Sompeng | 6.1 | Nirmala UI* |
| Syriac | 3.0 | Estrangelo Edessa |
| Tai Le | 4.0 | Microsoft Tai Le |
| Tamil | 1.0 | Arial Unicode MS, Latha, Vijaya |
| Telugu | 1.0 | Arial Unicode MS, Gautami, Vani |
| Thaana | 3.0 | MV Boli |
| Thai | 1.0 | (various) |
| Tibetan | 2.0 | Arial Unicode MS, Microsoft Himalaya |
| Tifinagh | 4.1 | Ebrima |
| User Defined | | (various) |
| Vai | 5.1 | Ebrima |
| Yi | 3.0 | Microsoft Yi Baiti |

* Listed in IE11 under Windows 7 although not actually installed on Windows 7.


This is an impressive list, but a little odd. The list does not include all scripts added since Unicode 3.0, but only a selection of scripts added in Unicode versions 3.1 (March 2001), 4.0 (April 2003), 4.1 (March 2005), 5.0 (July 2006), 5.1 (April 2008), 5.2 (October 2009), and 6.1 (January 2012). In fact the list excludes some 47 scripts added to Unicode between 3.2 and 6.1:


  • Avestan (Unicode 5.2)
  • Balinese (Unicode 5.0)
  • Bamum (Unicode 5.2)
  • Batak (Unicode 6.0)
  • Brahmi (Unicode 6.0)
  • Buhid (Unicode 3.2)
  • Carian (Unicode 5.1)
  • Chakma (Unicode 6.1)
  • Cham (Unicode 5.1)
  • Cuneiform (Unicode 5.0)
  • Cypriot (Unicode 4.0)
  • Egyptian Hieroglyphs (Unicode 5.2)
  • Hanunoo (Unicode 3.2)
  • Imperial Aramaic (Unicode 5.2)
  • Inscriptional Pahlavi (Unicode 5.2)
  • Inscriptional Parthian (Unicode 5.2)
  • Kaithi (Unicode 5.2)
  • Kayah Li (Unicode 5.1)
  • Kharoshthi (Unicode 4.1)
  • Lepcha (Unicode 5.1)
  • Limbu (Unicode 4.0)
  • Linear B (Unicode 4.0)
  • Lisu (Unicode 5.2)
  • Lycian (Unicode 5.1)
  • Lydian (Unicode 5.1)
  • Mandaic (Unicode 6.0)
  • Meetei Mayek (Unicode 5.2)
  • Meroitic Cursive (Unicode 6.1)
  • Meroitic Hieroglyphs (Unicode 6.1)
  • Miao (Unicode 6.1)
  • Old Persian (Unicode 4.1)
  • Old South Arabian (Unicode 5.2)
  • Old Turkic (Unicode 5.2)
  • Phoenician (Unicode 5.0)
  • Rejang (Unicode 5.1)
  • Samaritan (Unicode 5.2)
  • Saurashtra (Unicode 5.1)
  • Sharada (Unicode 6.1)
  • Shavian (Unicode 4.0)
  • Sundanese (Unicode 5.1)
  • Syloti Nagri (Unicode 4.1)
  • Tagalog (Unicode 3.2)
  • Tagbanwa (Unicode 3.2)
  • Tai Tham (Unicode 5.2)
  • Tai Viet (Unicode 5.2)
  • Takri (Unicode 6.1)
  • Ugaritic (Unicode 4.0)

Why exclude these 47 scripts? Well, the answer is that they are all scripts that Microsoft does not currently support at the font level. So it seems that the Microsoft thinking is that users should only be allowed to configure what font to use for what script if Microsoft provides a font for that script. If Microsoft does not currently provide a font for a particular script, but you have third-party fonts installed that cover that script, then hard luck. I have to say that this is a very disappointing attitude, and makes it very frustrating for users like myself who are immensely grateful to Microsoft for supporting minor scripts such as Mongolian, Phags-pa, Tibetan, Yi, etc. but who also wish to use scripts for which Microsoft does not yet provide support.

What about the Myanmar and Mongolian bugs? Finally fixed (or at least, so it seems) – another step forward!


Mongolian Font Configuration Dialog in IE11 under Windows 7

"Mongolian Baiti" font is finally listed.


Myanmar Font Configuration Dialog in IE11 under Windows 7

Microsoft's "Myanmar Text" font is listed, but so is "Noto Sans Tagalog"!


Hmm, something's not right here.

Firstly, the Myanmar configuration lists the "Myanmar Text" font, but the sample just shows boxes. Wait a minute, I don't have the "Myanmar Text" font installed on my Windows 7 laptop, because that font only ships with Windows 8 and later. And for that matter, I don't have the "Nirmala UI" font listed for Sora Sompeng or the "Javanese Text" font listed for Javanese either.

Secondly, the Myanmar configuration still lists the "Noto Sans Tagalog" font even though that font does not have a single Myanmar character in it. A little experiment shows that when U+1700 is removed from the Padauk font it is no longer listed under Myanmar in IE11. So it seems like the Myanmar bug has not been fixed at all, but the dialog has simply been hard-coded to statically include the "Myanmar Text" font in addition to fonts that are dynamically (but still incorrectly) enumerated.

Thirdly, although the Mongolian dialog now lists Microsoft's "Mongolian Baiti" font, it does not list any of the several other third-party Unicode Mongolian fonts installed on my system. I suspect that the Mongolian bug has not been fixed at all, but the dialog has simply been hard-coded to show the "Mongolian Baiti" font. I have a sinking feeling about this. Let's take a look at Phags-pa, as I recently and belatedly updated my Phags-pa fonts to work under Windows 7+. Will they be listed?


Phags-pa Font Configuration Dialog in IE11 under Windows 7

Microsoft's "Microsoft PhagsPa" font is listed, but not my "BabelStone Phags-pa Book" or "BabelStone Phags-pa Tibetan" fonts.


As I thought, only Microsoft's Phags-pa font is listed. My Phags-pa fonts are not listed even though they set the appropriate Unicode Subset Bitfield bit and cover all Phags-pa characters. However, my "BabelStone Phags-pa Book" font is listed under Latin based and User Defined, so it is not getting entirely ignored by IE11, only ignored for the specific script that it is designed for use with.

After a little investigation, it becomes clear that none of the eighteen new IE11 font configuration dialogs (for Bopomofo, Buginese, Coptic, Deseret, Glagolitic, Gothic, Javanese, New Tai Lue, N'Ko, Ol Chiki, Old Italic, Old Turkic, Osmanya, Phags-pa, Sora Sompeng, Tai Le, Tifinagh, Vai) list any installed third-party fonts that cover the particular script. Furthermore, all eighteen dialogs only list a single font, even in the case of Bopomofo which is covered by more than ten Microsoft fonts in Windows 7, so no choice of font is possible. The inescapable conclusion is that the eighteen new font configuration dialogs in IE11 (and also the dialog for Mongolian) simply list a single hard-coded Microsoft font for each script (even if the listed font is not installed on the system), giving the user absolutely no choice whatsoever over font configuration for these scripts. In other words, the IE11 changes to font configuration are a facade thinly disguising a fake implementation. Who in Microsoft, I wonder, decided that a fake implementation that gives the user no choice (not even Hobson's choice as you cannot not select the proffered Microsoft font) was in any way better than not having the font configuration dialogs for these scripts?
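To make the difference concrete, here is a rough Python model of the two behaviours. It is a guess based purely on what the dialogs show, not on any knowledge of IE11's code: the function names and the `INSTALLED` dictionary are made up for illustration, the script coverage given for each font is a simplification, and the hard-coded table only repeats a few entries from the IE11 list above.

```python
# A few of the single fonts that IE11 shows, copied from the IE11 list.
HARD_CODED_FONT = {
    "Javanese": "Javanese Text",
    "Mongolian": "Mongolian Baiti",
    "Phags-pa": "Microsoft PhagsPa",
    "Sora Sompeng": "Nirmala UI",
}

# Hypothetical snapshot of installed fonts on a Windows 7 machine,
# mapping each font name to the scripts it covers.
INSTALLED = {
    "Microsoft PhagsPa": {"Phags-pa"},
    "BabelStone Phags-pa Book": {"Phags-pa", "Latin based"},
    "BabelStone Phags-pa Tibetan": {"Phags-pa"},
    "Mongolian Baiti": {"Mongolian"},
}


def observed_listing(script, installed):
    """What the new IE11 dialogs appear to do: ignore the installed
    fonts entirely and return one fixed name per script."""
    return [HARD_CODED_FONT[script]]


def hoped_for_listing(script, installed):
    """What a real implementation would do: list every installed
    font that covers the script."""
    return sorted(name for name, scripts in installed.items()
                  if script in scripts)


for script in ("Phags-pa", "Javanese"):
    print(script, "observed: ", observed_listing(script, INSTALLED))
    print(script, "hoped for:", hoped_for_listing(script, INSTALLED))
```

In this model the observed listing offers a Javanese font that is not installed, while hiding two installed Phags-pa fonts.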

So what initially looked like two steps forward turns out to have been an illusion, a cheap conjurer's trick, and in fact IE11 is not one iota better than IE6 was at allowing the user to configure what fonts to use for what scripts. Twelve years on and zero progress.



Postscript A

Does font configuration even work for scripts that have more than one font listed? Not always, at least not for Tibetan. The Tibetan configuration dialog allows you to choose between the "Arial Unicode MS" font (which has glyphs for Tibetan characters but has no shaping behaviour, so combining vowel signs are rendered as spacing marks) and the "Microsoft Himalaya" font (which fully supports Tibetan shaping behaviour), but if you choose "Arial Unicode MS" (not a good choice, but if you offer the user a choice they should be free to make a bad choice) then a web page with unstyled Tibetan text will still be rendered with "Microsoft Himalaya".


Tibetan Font Configuration Dialog in IE11 under Windows 7

"Arial Unicode MS" font is listed, but the font in the preview can't be "Arial Unicode MS" as it does not do joined-up Tibetan.


In fact, if you install a good third-party Unicode Tibetan font such as Chris Fynn's Jomolhari, it will be listed in the Tibetan font configuration dialog, but if you select it you will still only ever see "Microsoft Himalaya" used to render unstyled Tibetan text on web pages. So what's the point?



Postscript B

In the Phags-pa font configuration dialog shown above the sample Phags-pa text is ꡏꡡꡋꡂꡡꡙ ꡢꡠꡙꡠ mongol qele, which is a brave but flawed attempt to render Mongolian ᠮᠣᠨᠭᠭᠣᠯ ᠬᠡᠯᠡ mongɣol kele "Mongolian language" in the Phags-pa script. It is wrong on several counts:

  1. In the Phags-pa script a space is always used to separate syllables not words, so there should be three spaces not one;
  2. Mongolian ng should be represented by the single Phags-pa letter nga;
  3. Mongolian ɣ is normally represented using the Phags-pa letter qa;
  4. Mongolian k is normally represented using the Phags-pa letter kha;
  5. Mongolian e would probably be represented using the Phags-pa letter ee here (Phags-pa script has two flavours of e, and although the Phags-pa spelling of Mongolian kele is not attested, by analogy with other Phags-pa Mongolian words ee would be expected).

It is unfortunate that this flawed spelling was chosen as someone at Microsoft asked me for the autonym for Mongolian in Phags-pa script in 2011, and I suggested ꡏꡡꡃ ꡢꡡꡙ ꡁꡦ ꡙꡦ mong qol khė lė which I believe to be much more authentic. Mind you, as the Phags-pa script was specifically devised to be used for writing multiple languages, and during the Yuan dynasty was used for writing Chinese at least as much as for writing Mongolian, as well as for Sanskrit, Tibetan and Uyghur, in my opinion choosing "Mongolian language" as the sample text for Phags-pa is not quite right anyway.
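The difference between the two spellings is easy to see by dumping the characters. The following Python sketch prints the Unicode name of each letter and counts the space-separated groups in each string. The helper name `describe` is mine, and the names printed depend on the Unicode version that your Python's `unicodedata` module supports.

```python
import unicodedata

# The two strings are copied from the text above.
DIALOG_SAMPLE = "ꡏꡡꡋꡂꡡꡙ ꡢꡠꡙꡠ"   # sample text shown in the IE11 dialog
SUGGESTED = "ꡏꡡꡃ ꡢꡡꡙ ꡁꡦ ꡙꡦ"        # spelling suggested in this post


def describe(text):
    """Print the character names in each space-separated group and
    return the number of groups."""
    groups = text.split()
    for group in groups:
        # Fall back to the code point if the character has no name
        # in this Python's Unicode database.
        names = [unicodedata.name(ch, "U+%04X" % ord(ch)) for ch in group]
        print(" + ".join(names))
    return len(groups)


print("groups:", describe(DIALOG_SAMPLE))
print("groups:", describe(SUGGESTED))
```

The group counts reflect the first point in the list above: the dialog text is spaced by word, whereas the suggested spelling is spaced by syllable.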


1 comment:

Mike Menzel said...

If I recall, if you hit 'advanced settings' in the font settings, you are prompted to download a plugin. This plugin allows you to specify per-script fonts.