Early beta versions of Internet Explorer 7 did not allow labels that mix ASCII and non-ASCII characters in an Internationalized Domain Name (IDN) URL. The good news is that Microsoft is adding this support in the official IE7 release. Program manager Tariq Sharif says, “Microsoft has listened to developers feedback during Beta 2 and we are changing the principles of IDN to accommodate the way customers want to use international characters on the web.” This means users will no longer be barred from using mixed-script URLs, as was the case through IE7 Beta 2. Don’t blame Microsoft for the earlier restriction; it was imposed for security reasons.
IE7 Beta 2 did not allow intermixing of scripts within a given label of a URL (a label is a segment of a domain name, delimited by dots; www.microsoft.com contains three labels: “www”, “microsoft” and “com”). Nor did it allow a label to mix non-ASCII scripts with ASCII. Microsoft explains the reasoning as follows: “this step was mainly taken to protect users against homograph-spoofing attacks. Consider the scenario where a user commonly browses sites with Cyrillic URLs. If the user gets a phishing email to visit www.paypal.com where one of the ‘a’s is in ASCII and the other is in Cyrillic, the user might believe they are visiting the real paypal which uses all ASCII characters in their domain name.”
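The homograph problem described above can be made concrete with a small Python sketch. The helper names `labels` and `char_names` are hypothetical and used only for illustration; the spoofed label is built by swapping the first Latin “a” for the Cyrillic letter U+0430, which renders almost identically in most fonts.

```python
import unicodedata

def labels(hostname):
    """Split a host name into its dot-delimited labels."""
    return hostname.split(".")

def char_names(label):
    """Return the Unicode character name of every character in a label."""
    return [unicodedata.name(ch) for ch in label]

real = "paypal"            # all ASCII
spoof = "p\u0430ypal"      # second character is CYRILLIC SMALL LETTER A

print(labels("www.microsoft.com"))   # the three labels of the host name
print(real == spoof)                 # the strings differ despite looking alike
print(char_names(real)[1])
print(char_names(spoof)[1])
```

Comparing the character names rather than the rendered glyphs is what exposes the substitution; to the eye the two labels are indistinguishable.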
Does this mean the official IE7 release will expose us to the threat it was earlier trying to shield us from? No. To protect against such spoofs, IE7 will detect unsafe mixing of characters within a label and display the URL in its Punycode (ASCII-encoded) form instead of Unicode, so the user is not misled.
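To see why the Punycode form is hard to mistake for the real name, here is a minimal sketch using Python's built-in `idna` codec (an IDNA 2003 implementation from the standard library, not IE's own code). The function name `to_ascii_label` is an assumption made for this example.

```python
def to_ascii_label(label):
    """Convert one label to its ASCII-compatible form.

    Pure-ASCII labels pass through unchanged; labels containing
    non-ASCII characters come back with the "xn--" prefix.
    """
    return label.encode("idna").decode("ascii")

real = "paypal"
spoof = "p\u0430ypal"   # Cyrillic U+0430 in place of the first Latin "a"

print(to_ascii_label(real))    # unchanged, still readable
print(to_ascii_label(spoof))   # an "xn--" string that no longer resembles the brand
```

Decoding works in the other direction with `bytes.decode("idna")`, which is how a browser can choose between showing the Unicode form and the raw encoded form of the same label.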
According to Tariq Sharif, the IE team worked with experts from the Windows Globalization team to investigate which scripts can be mixed safely with ASCII characters. In the Release Candidate build (post-Beta 3), IE will permit mixing of ASCII with certain scripts and will display the URL in Unicode. However, IE will still not allow intermixing of the allowed scripts (listed below) within a label if they belong to different languages, even when the user has added the languages containing those scripts to their Accept Languages.
Consider, for example, a URL in which one label mixes ASCII characters with characters from a Korean script. IE will now display this URL in Unicode for a user who has added Korean language support, since the non-ASCII script belongs to the Korean language set and is now on the allowed list of scripts. However, IE will show the raw Punycode encoding for a user who has not added Korean language support.
Here is a list of scripts that IE will permit to mix with ASCII:
• Beng (Bengali),
• Bugi (Buginese),
• Deva (Devanagari),
• Ethi (Ethiopic),
• Gujr (Gujarati),
• Guru (Gurmukhi),
• Hang (Hangul),
• Hani (Han),
• Hebr (Hebrew),
• Hira (Hiragana),
• Kana (Katakana),
• Mlym (Malayalam),
• Orya (Oriya),
• Sinh (Sinhala),
• Syrc (Syriac),
• Taml (Tamil),
• Telu (Telugu),
• Thaa (Thaana),
• Thai (Thai),
• Tibt (Tibetan)
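The rules described in this article can be summarised in a short Python sketch. This is only an illustration of the policy as stated here, not IE's actual implementation. Several parts are assumptions: script detection is approximated from the first word of each character's Unicode name, the `LANGUAGE_SCRIPTS` table mapping languages to scripts is hypothetical, and the function names are invented for the example.

```python
import unicodedata

# Scripts the article lists as safe to mix with ASCII (ISO 15924 codes).
MIXABLE_WITH_ASCII = {
    "Beng", "Bugi", "Deva", "Ethi", "Gujr", "Guru", "Hang", "Hani", "Hebr",
    "Hira", "Kana", "Mlym", "Orya", "Sinh", "Syrc", "Taml", "Telu", "Thaa",
    "Thai", "Tibt",
}

# Assumption: crude script detection keyed on the first word of the
# Unicode character name. Only a few scripts are covered here.
NAME_PREFIX_TO_SCRIPT = {
    "HANGUL": "Hang", "CJK": "Hani", "HIRAGANA": "Hira", "KATAKANA": "Kana",
    "CYRILLIC": "Cyrl", "DEVANAGARI": "Deva", "HEBREW": "Hebr", "THAI": "Thai",
}

# Hypothetical table: which scripts each Accept Languages entry covers.
LANGUAGE_SCRIPTS = {
    "ko": {"Hang", "Hani"},
    "ja": {"Hani", "Hira", "Kana"},
    "ru": {"Cyrl"},
}

def scripts_in(label):
    """Return the set of scripts used in a label ("ASCII" for ASCII chars)."""
    found = set()
    for ch in label:
        if ord(ch) < 128:
            found.add("ASCII")
        else:
            prefix = unicodedata.name(ch, "UNKNOWN").split()[0]
            found.add(NAME_PREFIX_TO_SCRIPT.get(prefix, "Unknown"))
    return found

def shows_unicode(label, accept_languages):
    """True if the label would be shown in Unicode, False for Punycode."""
    scripts = scripts_in(label)
    non_ascii = scripts - {"ASCII"}
    if not non_ascii:
        return True                      # plain ASCII label
    if "ASCII" in scripts and not non_ascii <= MIXABLE_WITH_ASCII:
        return False                     # e.g. Cyrillic mixed with ASCII
    # All non-ASCII scripts must belong to ONE of the user's languages.
    return any(non_ascii <= LANGUAGE_SCRIPTS.get(lang, set())
               for lang in accept_languages)

print(shows_unicode("shop\ud55c\uad6d", ["ko"]))   # ASCII + Hangul, Korean user
print(shows_unicode("shop\ud55c\uad6d", ["en"]))   # same label, no Korean support
print(shows_unicode("p\u0430ypal", ["ru"]))        # ASCII + Cyrillic spoof
```

Requiring every non-ASCII script in a label to fit within a single language is what keeps, say, Hangul and Hiragana from being combined in one label even when the user has added both Korean and Japanese.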