A domain name homograph spoof involves the use of a domain name that visually resembles another, presumably well-trafficked, domain name, such as paypa1.com for paypal.com.

The vulnerability exists with the use of only one alphabet, but the potential for misuse multiplies with the use of Internationalized Domain Names (IDNs).

As this VNUNet article reports:

Steve Dyer, director of UKIF, told Computeractive there were real concerns about misuse of this by criminals. “The Russian ‘A’ looks just the same as the English ‘A’ although it means something different. A criminal could register a domain name using a mixture of ASCII and Unicode that is indistinguishable to the ordinary surfer from the genuine site.

“To prove a point, the website PayPal was created using a mixture of the European and Russian alphabet. People were directed to a fake site and phishers can steal personal details. This site was handed over to PayPal but shows how dangerous this could become”, he said.

The proof of concept attack was well-publicized, leading some to suggest that software such as browsers should reduce their support for IDNs until these issues are resolved.

CENTR, the Council for European TLDs, has characterized such calls as over-reactions.

Additionally, ICANN has released a “Statement on IDN Homograph Attacks.” It points out (1) that the problem existed before ICANN existed, (2) that the ‘global Internet community’ is working on the problem, and (3) that ICANN was opening a ‘global comment forum’ on the issue.

Both CENTR and ICANN emphasize the point that homograph spoofing existed before IDNs. However, the cross-character-set IDNs of the type discussed in the VNU article add a new facet that didn’t exist prior to IDNs.
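To make the cross-character-set case concrete, here is a minimal Python sketch. It builds a name that swaps the Latin ‘a’ for the Cyrillic one, flags names that mix scripts, and shows the ASCII form an IDN takes on the wire. The helper names (scripts_used, is_mixed_script, wire_form) are my own, and guessing a letter's script from the first word of its Unicode character name is a rough assumption for illustration, not how a browser or registry actually does it.

```python
import unicodedata

LATIN_NAME = "paypal"
MIXED_NAME = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A (U+0430)


def scripts_used(label):
    """Approximate each letter's script by the first word of its Unicode name.

    This is a heuristic chosen for the sketch; digits and hyphens are skipped.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label):
    """True when the letters in the label come from more than one script."""
    return len(scripts_used(label)) > 1


def wire_form(domain):
    """ASCII form of the domain, using Python's built-in 'idna' codec."""
    return domain.encode("idna")


if __name__ == "__main__":
    for name in (LATIN_NAME, MIXED_NAME):
        print(repr(name), sorted(scripts_used(name)), wire_form(name + ".com"))
```

The two names are different strings even though they may render identically on screen, and the mixed one only reveals itself in its encoded form, which begins with xn--.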

I will phrase this as a question because I don’t have the tech background to make this an assertion:

If I’m dealing with one character set, there is a finite number of domain names such as paypa1 that look like paypal, which means I can probably track down the whois info.

If there are N1 character sets that contain a character that looks like a ‘p’, N2 character sets that contain a character that looks like an ‘a’, and so on for each of the six letters, then wouldn’t there be N1 x N2 x . . . x N6 combinations that create a name that looks like paypal?

Doesn’t that make the name virtually untraceable through whois?

Is that how these IDNs work?
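The arithmetic behind the question can be sketched in a few lines of Python. The LOOKALIKES table below is a tiny, hypothetical sample (a handful of Cyrillic letters plus the digit 1 from the paypa1 example), not a real list of confusable characters, and the function names are invented for this illustration.

```python
from itertools import product
from math import prod

# Hypothetical, deliberately tiny look-alike table; a real one would be far larger.
LOOKALIKES = {
    "p": ["p", "\u0440"],  # Latin p, Cyrillic er
    "a": ["a", "\u0430"],  # Latin a, Cyrillic a
    "y": ["y", "\u0443"],  # Latin y, Cyrillic u
    "l": ["l", "1"],       # Latin l, digit one (as in paypa1)
}


def lookalike_count(name):
    """Multiply the number of choices available at each position."""
    return prod(len(LOOKALIKES.get(ch, [ch])) for ch in name)


def lookalike_names(name):
    """Enumerate every spelling the table allows, including the original."""
    pools = [LOOKALIKES.get(ch, [ch]) for ch in name]
    return ["".join(chars) for chars in product(*pools)]


if __name__ == "__main__":
    print(lookalike_count("paypal"))
    print(len(lookalike_names("paypal")))
```

With only two choices per letter, this sample table already allows 2 x 2 x 2 x 2 x 2 x 2 = 64 spellings of a six-letter name, the genuine one included. The total stays finite, but it multiplies with every look-alike added at any position.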

A list of links with plenty of background and discussion via LexText here.