Punycode. Intro and risks.
RFC 3490
tl;dr domain names can now include chars beyond ASCII. Legit examples:
mason@日本.com
wildwezyr@fahrvergnügen.net
punycode
tl;dr things are not what they appear
On the lower layers of communication there are no hànzì chars or other chars beyond ASCII. PUNYCODEing happens.
e.g 日本.com
is converted to xn--wgv71a.com
more on punycode: https://en.wikipedia.org/wiki/Punycode https://tools.ietf.org/html/rfc3492
web converter can be found here: https://www.punycoder.com/
risks:
demo: https://www.xn–80ak6aa92e.com/ will be converted to for user in web browser (till security update) as https://www.аррӏе.com
[there is “fake” char in place of l
] what more with green padlock standing for valid certificate!!! STRANGE
webdeveloper issues:
Be aware that html5
form field email
will pass all chars submited to your backend being already punycoded!! So be aware when implementing custom filtering mechanism to not search for raw stuff like mason@日本.com
but rather mason@xn--wgv71a.com
python libs for conversion https://pypi.python.org/pypi/idna