(12) United States Patent          (10) Patent No.:      US 9,218,334 B2
Mugali, Jr. et al.                 (45) Date of Patent:  Dec. 22, 2015
(54) PRONOUNCEABLE DOMAIN NAMES

(71) Applicant: VeriSign, Inc., Reston, VA (US)
(72) Inventors: Aditya Anand Mugali, Jr., Reston, VA (US); Andrew W. Simpson, Sterling, VA (US); Scott King Walker, Purcellville, VA (US)
(73) Assignee: VERISIGN, INC., Reston, VA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 518 days.
(21) Appl. No.: 13/672,226
(22) Filed: Nov. 8, 2012
(65) Prior Publication Data
US 2013/0117013 A1    May 9, 2013
Related U.S. Application Data
(60) Provisional application No. 61/557,248, filed on Nov. 8, 2011.
(51) Int. Cl.
G06F 17/27    (2006.01)
G06F 17/28    (2006.01)
H04L 29/12    (2006.01)
(52) U.S. Cl.
CPC .......... G06F 17/274 (2013.01); G06F 17/27 (2013.01); G06F 17/28 (2013.01); G06F 17/2881 (2013.01); H04L 61/3025 (2013.01)
(58) Field of Classification Search
CPC .................................................. G06F 17/274
USPC ....................................... 704/…; 713/188
See application file for complete search history.
(56) References Cited
U.S. PATENT DOCUMENTS
5,890,448 A          12/1998  Ganesan
7,……… B1              6/2006  … et al.
7,912,716 B2          3/2011  McCuller
8,260,914 B1 *        9/2012  Ranjan ...................... 709/224
2003/0182110 A1 *     9/2003  Deng
2008/……… A1 *         9/2008  Owen
2009/……… A1 *         2/2009  Phi…
2009/0125308 A1 *     5/2009  …
2009/……… A1 *        10/2009  Kanevsky et al.
2009/……… A1 *        12/2009  Wu et al.
OTHER PUBLICATIONS
Heather Crawford et al., "Kwyjibo: automatic domain name generation," Software: Practice and Experience, vol. 38, 2008, pp. 1561-1567.
Mark Goadrich, "Generating Pronounceable Nonsense Words: Nifty Assignment," Journal of Computing Sciences in Colleges, vol. 26, Issue 5, May 2011, pp. 36-37.
"Name.com Introduces Domain Name NXD Scores for .Net, .CC and .TV," Name.com Blog, Aug. 2011. Retrieved from the Internet: <http://www.name.com/blog/general-development/2011/08/name-com-introduces-domain-name-nxd-scores-…>.
Extended European Search Report dated Mar. 11, 2013, European Application No. 12191872.6, filed Nov. 8, 2012, pp. 1-10.
* cited by examiner
Primary Examiner — Shaun Roberts
(74) Attorney, Agent, or Firm — MH2 Technology Law Group, LLP
(57) ABSTRACT
Embodiments of the present teachings relate to systems and methods for generating pronounceable domain names. The method includes providing a list of character strings; filtering the list of character strings through a first filter based on a phonetic model to produce a first filtered list of character strings; filtering the list of character strings through a second filter based on a character order model to produce a second filtered list of character strings; and generating, by a processor, a list of pronounceable domain names based on the first filtered list of character strings and the second filtered list of character strings.
27 Claims, 4 Drawing Sheets
U.S. Patent    Dec. 22, 2015    Sheet 1 of 4    US 9,218,334 B2

FIG. 1 (flowchart of method 100): 110 Prepare Source Data; 120 Build Phonetic Model; 130 Build Character Order Model; 140 Generate Pronounceable Domain Names.

FIG. 3 (flowchart of method 300): 310 Build ARFF File; 320 Populate Attributes for Pronounceable Source Words; 330 Populate Attributes for non-Pronounceable Source Words; 340 Generate Character Order Model.
aU.S. Patent Dee. 22, 2015 Sheet 2 of 4 US 9,218,334 B2
FIG. 2
220 ~_{ Greate Data Entios 230-—~_[ Greate Data Entries for
‘or Pronounceabie | “non-Pronaunceable
Source Words |___Souree Words
ae fo
222~_{ Calculate n-gram 232~_ Calculate n-gram
Frequencies Frequencies
224-~[” Generate Doubie | 234~[ Generate Double
Metephone | Metapho:
—_
226~[ Attibute Class = ¥ 236~
250~| Generate Phonetic
Model |US 9,218,334 B2
[Sheet 3 of 4] FIG. 4 (block diagram of an exemplary computing system 400): processor, memory, display adapter, keyboard, mouse, removable storage unit.
[Sheet 4 of 4] FIG. 5 (exemplary user interface 500): a displayed list of generated pronounceable domain name candidates.
PRONOUNCEABLE DOMAIN NAMES.
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims priority from U.S. Provisional Patent Application Ser. No. 61/557,248, filed Nov. 8, 2011, which is hereby incorporated by reference in its entirety.
FIELD
This disclosure relates generally to systems and methods for building phonetic and character order models used to identify pronounceable domain names, and generating pronounceable domain names using the phonetic and character order models.
DESCRIPTION OF THE RELATED ART
Short domain names are popular because they can be catchy and easier to remember than longer domain names. However, when seeking to register a new domain name, many people argue that the .COM zone is full, and that there are no good domain names left. This is far from the truth. Although many domain names that contain dictionary words may be registered, statistically there remain a large number of unregistered and available domain names of different character lengths, many of which may be pronounceable. While some of these unregistered domain names may be nonsensical strings of characters, many of them may be easy to use and potentially could be popular and successful domain names if someone were to put marketing and a brand name behind them. One example of this is Google.com. While the word google was not previously a word, it has now become a word.
One goal of short domain names may be to be memorable and communicable. That is, when viewing the domain name, individuals should be able to pronounce the domain name, remember the domain name, and, when telling the domain name to others, the other individuals should be able to remember and easily find the domain name. For example, if fifteen people were to look at a domain name (that is not a dictionary word) and ten to fifteen of those people were able to pronounce the domain name the same way and tell it to their friends who could then easily find and visit the domain, the domain name may be a good choice, despite not being a real word.
More specifically, 5 and 6 character domain names are very popular. However, a very large percentage of 5 and 6 character domain names in the .COM zone have already been registered. As discussed above, however, there are many unregistered 5 and 6 character domain names that do not have any specific meaning in the English Latin script, yet may be pronounceable.
It is accordingly an object of the disclosure to build models that may be used to identify pronounceable domain names. It is another object of the disclosure to generate pronounceable domain names using the pronounceable domain name models. The systems and methods according to embodiments of the disclosure may be used to generate pronounceable company names or websites based on input words relevant to the particular business.
These objects may be achieved by using a dictionary set to learn what words are pronounceable. A combination of Bayesian networks to learn the composition of phonetics in pronounceable words and decision trees to learn the order of characters in a pronounceable word may be used, along with n-gram scoring heuristics, to build models that can predict if a given word is pronounceable based on the learned models.
SUMMARY
In accordance with the disclosure, systems and methods for building phonetic and character order models used to identify pronounceable domain names, and generating pronounceable domain names using the phonetic and character order models, are provided.
In one embodiment, a method of generating pronounceable domain names may include: (1) building a phonetic model representing phonetic characteristics of pronounceable words; (2) building a character order model representing character order characteristics of pronounceable words; and (3) generating a list of pronounceable domain names by generating a list of character strings, and passing the list of character strings through a first filter based on the phonetic model and a second filter based on the character order model.
In implementations, a computer-implemented method of generating pronounceable domain names is disclosed. The computer-implemented method can comprise providing a list of character strings; filtering the list of character strings through a first filter based on a phonetic model to produce a first filtered list of character strings; filtering the list of character strings through a second filter based on a character order model to produce a second filtered list of character strings; and generating, by a processor, a list of pronounceable domain names based on the first filtered list of character strings and the second filtered list of character strings.
In implementations, the computer-implemented method can further comprise building the phonetic model representing phonetic characteristics of pronounceable words.
In implementations, the computer-implemented method can further comprise building the character order model representing character order characteristics of pronounceable words.
In implementations, in the building of the phonetic model, the method can further comprise preparing a library of source words comprising pronounceable words and non-pronounceable words; and providing the library of source words to a learning model algorithm to train the learning model algorithm to determine characteristics of pronounceable and characteristics of non-pronounceable words.
In implementations, the learning model algorithm can comprise a Bayesian network.
In implementations, in the preparing the library, the method can comprise building an attribute relationship file format (ARFF) based on the library of source words; and associating one or more attributes of pronounceable words and non-pronounceable words with the ARFF.
In implementations, at least one attribute of the one or more attributes of pronounceable words and non-pronounceable words can comprise an n-gram score.
In implementations, the computer-implemented method can further comprise calculating a first n-gram score for the pronounceable words in the library; and calculating a second n-gram score for the non-pronounceable words in the library.
In implementations, an attribute of the one or more attributes of pronounceable words and non-pronounceable words can comprise a phonetic representation.
In implementations, the computer-implemented method can further comprise determining the phonetic representation for each pronounceable word and each non-pronounceable word in the library.
In implementations, the phonetic representation can comprise a Metaphone representation or a Double Metaphone representation.
In implementations, the building the character order model can further comprise preparing a library of source words comprising pronounceable and non-pronounceable words; and associating attributes of pronounceable source words and attributes of non-pronounceable source words with words in the library of source words.
In implementations, the phonetic model can be operable to determine a probability that an input character string of the list of character strings is pronounceable.
In implementations, the character order model can be operable to determine a probability that an input character string of the list of character strings is pronounceable.
In implementations, the first filtered list of character strings can be provided to the second filter to produce a second filtered list of character strings.
In implementations, the second filtered list of character strings can be provided to the first filter to produce a first filtered list of character strings.
In implementations, the computer-implemented method can further comprise determining a first probability, using the phonetic model, that an input character string of the list of character strings is pronounceable; determining a second probability, using the character order model, that the input character string is pronounceable; determining a combined probability, based on the first probability and the second probability, that the input character string is pronounceable; comparing the combined probability with a pronounceability threshold to determine whether the input character string is likely to be pronounceable; and providing the input character string in the list of pronounceable domain names.
In implementations, the computer-implemented method can further comprise determining if a pronounceable domain name in the list of pronounceable domain names is registered; and providing a list of alternative suggestions of unregistered pronounceable domain names from the list of pronounceable domain names if the pronounceable domain name is determined to be registered or is unregistered.
In implementations, the computer-implemented method can further comprise ordering the list of alternative suggestions of unregistered pronounceable domain names based on one or more of the following: a primitive distance between the pronounceable domain name and the alternative suggestions of unregistered pronounceable domain names, a degree of similarity between the pronounceable domain name and the alternative suggestions of unregistered pronounceable domain names, an amount of traffic the pronounceable domain name is receiving, or combinations thereof.
In implementations, a device is disclosed that can comprise at least one processor; and a non-transitory computer readable medium comprising instructions that cause the at least one processor to perform a method comprising: providing a list of character strings; filtering the list of character strings through a first filter based on a phonetic model to produce a first filtered list of character strings; filtering the list of character strings through a second filter based on a character order model to produce a second filtered list of character strings; and generating, by a processor, a list of pronounceable domain names based on the first filtered list of character strings and the second filtered list of character strings.
In implementations, a computer readable storage medium is disclosed that can comprise instructions that cause one or more processors to perform a method comprising: providing a list of character strings; filtering the list of character strings through a first filter based on a phonetic model to produce a first filtered list of character strings; filtering the list of character strings through a second filter based on a character order model to produce a second filtered list of character strings; and generating, by a processor, a list of pronounceable domain names based on the first filtered list of character strings and the second filtered list of character strings.
Additional objects and advantages of the embodiments of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the embodiments. The objects and advantages of the embodiments will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
Various features of the embodiments can be more fully appreciated, as the same become better understood with reference to the following detailed description of the embodiments when considered in connection with the accompanying figures, in which:
FIG. 1 illustrates an exemplary flowchart for a method according to embodiments of the disclosure.
FIG. 2 illustrates an exemplary flowchart for a method of creating a phonetic model according to embodiments of the disclosure.
FIG. 3 illustrates an exemplary flowchart for a method of creating a character order model according to embodiments of the disclosure.
FIG. 4 is a block diagram of an exemplary computing system, according to various embodiments.
FIG. 5 illustrates an exemplary user interface according to embodiments of the disclosure.
DETAILED DESCRIPTION
For simplicity and illustrative purposes, the principles of the present teachings are described by referring mainly to exemplary embodiments thereof. However, one of ordinary skill in the art would readily recognize that the same principles are equally applicable to, and can be implemented in, all types of information and systems, and that any such variations do not depart from the true spirit and scope of the present teachings. Moreover, in the following detailed description, references are made to the accompanying figures, which illustrate specific exemplary embodiments. Electrical, mechanical, logical and structural changes may be made to the exemplary embodiments without departing from the spirit and scope of the present teachings. The following detailed description is, therefore, not to be taken in a limiting sense and the scope of the present teachings is defined by the appended claims and their equivalents.
FIG. 1 illustrates an exemplary flowchart for a method 100 for building phonetic and character order models used to identify pronounceable domain names, and generating pronounceable domain names using the phonetic and character order models. While FIG. 1 illustrates various processes that can be performed, one skilled in the art will realize that any of the processes and stages of the processes can be performed by other components of a computing device. Likewise, one skilled in the art will realize that the illustrated stages of the processes are exemplary and that any of the illustrated stages can be removed, additional stages can be added, and the order of the illustrated stages can be changed.
The method may be carried out for domain names of a specific length (e.g., 5 characters), or for domain names of multiple lengths (e.g., all domain names having 3-10 characters). Throughout this disclosure, embodiments will be described for generating pronounceable domain names having a length of 5 characters. However, the exemplary embodiments having 5 character domain names are not intended to limit the scope of this disclosure, but are provided as an illustration of only a subset of possible embodiments of the disclosure.
As shown in FIG. 1, source data is gathered and prepared in 110. The source data may include both pronounceable and non-pronounceable words. Throughout this disclosure, the term "words" is used in accordance with its ordinary usage, and also to generally denote character strings, whether or not the character strings form "words" in the ordinary sense.
The pronounceable words in the source data may be taken from an English language dictionary. For purposes of building the models, it is assumed that all dictionary words are pronounceable. Throughout this disclosure, the domain names are discussed with respect to English words and pronounceability. However, the systems and methods disclosed could also be used in connection with domain names and pronounceability of other languages. For example, to generate pronounceable domain names in Spanish, a Spanish language dictionary could be used in place of an English language dictionary. Other sources of pronounceable words could also be used for the source data, or a subset of dictionary words could be used.
The non-pronounceable words in the source data may be generated using a random string generator. For example, if 5 character domain names are being generated, a random string generator may generate random 5 character strings as non-pronounceable words for the source data. Alternatively, random character strings of varying lengths may be generated. For purposes of building the models, it is assumed that the randomly generated strings are non-pronounceable. In other embodiments, the randomly generated strings may be cross-checked against a list of dictionary words to remove any known pronounceable words from the randomly generated strings.
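The random-string preparation and dictionary cross-check described above can be sketched as follows. This is an illustrative sketch only; the alphabet, counts, seed, and sample dictionary are assumptions, not details from the disclosure:

```python
import random
import string

def generate_non_pronounceable(count, length, dictionary, seed=None):
    """Generate `count` random lowercase strings of `length` characters,
    cross-checked against `dictionary` so that no known pronounceable
    word slips into the non-pronounceable source data."""
    rng = random.Random(seed)
    known = {w.lower() for w in dictionary}
    out = set()
    while len(out) < count:
        candidate = "".join(rng.choice(string.ascii_lowercase)
                            for _ in range(length))
        if candidate not in known:  # drop any dictionary word
            out.add(candidate)
    return sorted(out)

# Illustrative 5-character source data with a toy dictionary
sample_dictionary = {"apple", "daily", "flash"}
negatives = generate_non_pronounceable(10, 5, sample_dictionary, seed=42)
```

In practice the dictionary side of the source data would be a full English word list of the same approximate size, so that the two classes are equally weighted as described above.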
The source data includes both pronounceable and non-pronounceable words so that the learning models have learning information about characteristics of both pronounceable and non-pronounceable words. In one embodiment, the number of pronounceable words in the source data is substantially equal to the number of non-pronounceable words in order to equally weight the source data. Although FIG. 1 illustrates only one set of source data being used for both the phonetic model and the character order model, in other embodiments, separate source data sets may be provided for the separate models.
In 120, a phonetic model is built using the source data prepared and obtained in 110. More specific details regarding building the phonetic model according to an embodiment are described below with reference to FIG. 2.
In 130, a character order model is built using the source data prepared and obtained in 110. More specific details regarding building the character order model according to an embodiment are described below with reference to FIG. 3.
In 140, pronounceable domain names are generated using the phonetic model and the character order model. Generating the pronounceable domain names may include creating a list of n-character input words (where n is the desired domain name length), and filtering the list of input words through the phonetic and character order models. The list of n-character input words may be limited to a single n, for example, only 5 character input words, or may include several different lengths. In some embodiments, the input words are intended to be used as domain names, so only the characters 0-9, a-z, and hyphen are included as possible characters. However, in other embodiments where the input words may be intended for use in other applications, other restrictions (or no restrictions) may be placed on the character set from which the n-character input words are formed.
Additionally, in one embodiment, the list of n-character input words may include all possible n-character words. In another embodiment, the list of n-character input words may include an appropriate subset of all possible n-character words.
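Enumerating every possible n-character input word over the domain-name character set can be sketched with a lazy generator; the full 5-character space over 37 characters has 37**5 (about 69 million) entries, so a caller would stream it or take a subset. Names below are illustrative:

```python
from itertools import islice, product

# Characters permitted in the input words: 0-9, a-z, and hyphen
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def candidate_words(n):
    """Lazily yield every possible n-character input word over CHARSET,
    in lexicographic order of CHARSET."""
    for chars in product(CHARSET, repeat=n):
        yield "".join(chars)

first = list(islice(candidate_words(2), 3))  # first few 2-character words
total_2 = 37 ** 2                            # size of the 2-character space
```

A generator keeps memory flat even for the multi-million-entry 5-character space; only the words that survive the filters below ever need to be materialized.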
For example, each of the models may return a probability that the input word is pronounceable, represented by an output from 0.0 to 1.0. The list of input words may first be filtered by the phonetic model. Then, the input to the character order model may be limited to only those input words with a probability value from the phonetic model that exceeds a predetermined threshold, for example, 90% or 0.9. The filtered input words may then be filtered by the character order model. After passing the filtered input words through the character order model, the final list of pronounceable domain names may be limited to only the input words with a probability value from the character order model that exceeds a predetermined threshold, for example, 95% or 0.95. The thresholds may be set at other probability values as desired or determined to return optimum results.
In other embodiments, the order of the filtering may be reversed, with the input words first being filtered by the character order model, then filtered by the phonetic model. In other embodiments, all of the input words may be filtered by both models, and the final list of pronounceable domain names may be determined based on a combination of the two probability values returned by the models. For example, in one embodiment, only input words that received a predetermined probability (e.g., at least 90% or 0.9) in both models may be included in the final list of pronounceable words or domain names. In another embodiment, the calculated probabilities may be combined or multiplied, then compared to a threshold to determine whether each of the input words is sufficiently likely to be pronounceable so as to be included in the final list of pronounceable domain names.
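The two filtering strategies described above (sequential thresholds of 0.9 and 0.95, or a multiplied combined probability against a single threshold) can be sketched as follows. The scoring functions and example strings are toy stand-ins, not the actual trained models:

```python
def filter_pipeline(words, phonetic_prob, order_prob,
                    phonetic_threshold=0.9, order_threshold=0.95):
    """Sequential filtering: keep words whose phonetic probability exceeds
    the first threshold, then keep those whose character-order probability
    exceeds the second threshold."""
    first_pass = [w for w in words if phonetic_prob(w) > phonetic_threshold]
    return [w for w in first_pass if order_prob(w) > order_threshold]

def combined_filter(words, phonetic_prob, order_prob, threshold=0.9):
    """Alternative: multiply the two probabilities and compare the combined
    value against one pronounceability threshold."""
    return [w for w in words if phonetic_prob(w) * order_prob(w) > threshold]

# Toy stand-in scores (phonetic, character order) for illustration only
scores = {"simta": (0.97, 0.98), "xqzwk": (0.12, 0.05), "blorn": (0.95, 0.80)}
phonetic = lambda w: scores[w][0]
order = lambda w: scores[w][1]
kept = filter_pipeline(scores, phonetic, order)
```

Note that "blorn" survives the phonetic filter but not the 0.95 character-order threshold, illustrating why the two filters are applied in sequence rather than merged into one.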
It should be noted that, while FIG. 1 illustrates the phonetic model and character order models being built in parallel, in other embodiments, the models may be built consecutively or in any other order.
FIG. 2 illustrates an exemplary flowchart for a method 200 of creating a phonetic model according to embodiments of the disclosure. Although the embodiment illustrated in FIG. 2 and discussed below specifically uses the open source WEKA program and a Bayesian network to learn, develop and generate the phonetic model, other programs and learning models may be used in other embodiments. While FIG. 2 illustrates various processes that can be performed, one skilled in the art will realize that any of the processes and stages of the processes can be performed by other components of a computing device. Likewise, one skilled in the art will realize that the illustrated stages of the processes are exemplary and that any of the illustrated stages can be removed, additional stages can be added, and the order of the illustrated stages can be changed.
As shown in FIG. 2, an attribute relationship file format (ARFF) file is built using source data in 210. The source data may be, for example, source data such as is prepared in 110 above. In one embodiment, the number of pronounceable words in the source data is substantially equal to the number of non-pronounceable words, in order to equally weight the source data. The ARFF is a text file format used by WEKA to store data in a database. The ARFF file contains attributes about the source data. The attributes for the source data that contains pronounceable words may be generated separately from the attributes for the source data that contains non-pronounceable words.
In 220, data entries for pronounceable source words may be created. Similarly, in 230, data entries for non-pronounceable source words may be created.
In 222, n-grams over the set of pronounceable source words may be calculated. For example, if the domain names that are ultimately to be generated are 5 character strings, 2-grams, 3-grams, and 4-grams may be calculated over the set of pronounceable source words. An n-gram's value is the total number of occurrences of the n-gram in the set of source words. As a brief example, if the source includes only the words "attack, daily, data, fail, film," then the n-gram score for "ta" is 2, for "ai" is 2, for "il" is 3, for "at" is 2, and for "da" is 2. Other n-grams in this dataset, such as "ata," occur only once, for a score of 1. The n-gram scores form attributes of the ARFF file for each pronounceable source word.
In the above brief example, the 2-gram attribute of "daily" would be the sum of the 2-gram scores for the 2-grams within the word daily: da, ai, il, and ly, which are 2, 2, 3, and 1, for a total 2-gram attribute score of 8. The 3-gram attribute of "daily" would be the sum of the 3-gram scores for the 3-grams within the word daily: dai, ail, and ily, which are 1, 2, and 1, for a total 3-gram attribute score of 4. The process for calculating the n-gram attribute for each value continues in this manner.
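The worked example above can be reproduced directly. A minimal sketch, assuming n-grams are counted with overlap within each word:

```python
from collections import Counter

def ngram_scores(words, n):
    """Count every overlapping n-gram across the whole set of source words."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def ngram_attribute(word, scores, n):
    """Sum the set-wide scores of the n-grams appearing within one word."""
    return sum(scores[word[i:i + n]] for i in range(len(word) - n + 1))

source = ["attack", "daily", "data", "fail", "film"]
two = ngram_scores(source, 2)
three = ngram_scores(source, 3)
# "daily": da+ai+il+ly = 2+2+3+1 = 8; dai+ail+ily = 1+2+1 = 4
daily_2gram = ngram_attribute("daily", two, 2)
daily_3gram = ngram_attribute("daily", three, 3)
```

Running this reproduces the attribute scores given in the text: 8 for the 2-gram attribute of "daily" and 4 for its 3-gram attribute.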
In 232, n-grams over the set of non-pronounceable source words may be calculated, and n-gram attributes generated, in a manner similar to that described above with reference to 222. In other embodiments, the calculated n-gram frequencies from the pronounceable source words may be used to generate the n-gram attributes for the non-pronounceable source words, without calculating n-gram frequencies over the non-pronounceable source words.
In 224, a Double Metaphone attribute is defined for each of the pronounceable source words. The Double Metaphone attribute is based on the Double Metaphone representation of the source words. The Double Metaphone is a standard phonetic representation of words. In other embodiments, other phonetic representations of the source words may be used, such as a Metaphone representation or any future version or variant of such. Each character in the Double Metaphone representation of the source words may form an attribute.
In 234, a Double Metaphone attribute is defined for each of the non-pronounceable source words, similar to the method described in 224 with respect to the pronounceable source words.
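Double Metaphone itself is a lengthy rule-based algorithm; the sketch below instead derives a deliberately crude phonetic key (first letter kept, later vowels dropped, doubled consonants collapsed) purely to illustrate turning a word's phonetic representation into per-character attributes as described above. It is not the real Metaphone or Double Metaphone encoding:

```python
VOWELS = set("aeiou")

def crude_phonetic_key(word):
    """Toy phonetic key (NOT Double Metaphone): keep the first letter,
    drop later vowels, and collapse immediately repeated characters."""
    word = word.lower()
    key = word[0]
    for ch in word[1:]:
        if ch in VOWELS:
            continue
        if ch != key[-1]:
            key += ch
    return key.upper()

def phonetic_attributes(word, width=4):
    """One attribute per key character, padded so every data row has the
    same number of columns, mirroring how each character of the phonetic
    representation becomes an ARFF attribute."""
    key = crude_phonetic_key(word)[:width]
    return list(key.ljust(width, "_"))

attrs = phonetic_attributes("attack")  # ['A', 'T', 'C', 'K']
```

A production system would substitute a faithful Double Metaphone implementation for `crude_phonetic_key`; only the attribute-per-character framing carries over.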
In 226, a class attribute is defined for each of the pronounceable source words. The class attribute identifies whether the source word is pronounceable. Thus, each of the pronounceable source words receives a class attribute "Y", or other affirmative representation. Similarly, in 236, each of the non-pronounceable source words receives a class attribute "N", or other negative representation.
In 240, the attributes of the pronounceable source words and the attributes of the non-pronounceable source words are combined into the ARFF file. In other embodiments, the pronounceable and non-pronounceable source words are not separately processed (as illustrated in FIG. 2), but are instead processed together, in which case there may be no need to combine the attributes into the ARFF file, because they will already be present in the ARFF file.
In 250, a phonetic model is generated based on the attributes of the pronounceable source words and non-pronounceable source words included in the ARFF file. In one embodiment, this is accomplished using the Bayesian Network Algorithm in WEKA. The resulting Bayesian network model, or phonetic model, can be used to predict a probability that a given input string will be pronounceable. While a Bayesian network has been described as used to generate the phonetic model, other appropriate machine learning models may be used in other embodiments.
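WEKA's Bayesian network learner is a Java tool; as a much-simplified Python stand-in (naive Bayes over character 2-grams with Laplace smoothing, not the model actually described), the probability-of-pronounceability idea can be sketched as:

```python
import math
from collections import Counter

def bigrams(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]

class NaiveBayesPronounceability:
    """Simplified stand-in for the WEKA Bayesian network model: naive
    Bayes over character 2-grams with add-one (Laplace) smoothing."""

    def fit(self, labeled_words):
        self.class_counts = Counter()
        self.gram_counts = {"Y": Counter(), "N": Counter()}
        self.vocab = set()
        for word, cls in labeled_words:
            self.class_counts[cls] += 1
            for g in bigrams(word):
                self.gram_counts[cls][g] += 1
                self.vocab.add(g)
        return self

    def prob_pronounceable(self, word):
        """Return a normalized probability that `word` belongs to class Y."""
        total = sum(self.class_counts.values())
        logp = {}
        for cls in ("Y", "N"):
            lp = math.log(self.class_counts[cls] / total)
            denom = sum(self.gram_counts[cls].values()) + len(self.vocab)
            for g in bigrams(word):
                lp += math.log((self.gram_counts[cls][g] + 1) / denom)
            logp[cls] = lp
        m = max(logp.values())
        expd = {c: math.exp(p - m) for c, p in logp.items()}
        return expd["Y"] / (expd["Y"] + expd["N"])

train = [("daily", "Y"), ("atlas", "Y"), ("fail", "Y"),
         ("xqzkv", "N"), ("qqwxz", "N"), ("zzkkq", "N")]
model = NaiveBayesPronounceability().fit(train)
p_good = model.prob_pronounceable("dailo")
p_bad = model.prob_pronounceable("xqzkq")
```

With this toy training set, a dictionary-like string such as "dailo" scores well above 0.5 while a consonant jumble such as "xqzkq" scores well below it, mirroring how the trained phonetic model acts as the first filter.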
FIG. 3 illustrates an exemplary flowchart for a method 300 of creating a character order model according to embodiments of the disclosure. The character order model may determine the probability that an input word is pronounceable based on information learned by and stored in the model relating to the chain of order of letters in pronounceable words. While FIG. 3 illustrates various processes that can be performed, one skilled in the art will realize that any of the processes and stages of the processes can be performed by other components of a computing device. Likewise, one skilled in the art will realize that the illustrated stages of the processes are exemplary and that any of the illustrated stages can be removed, additional stages can be added, and the order of the illustrated stages can be changed.
As shown in FIG. 3, an ARFF file is built using source data in 310. The source data may be, for example, source data such as is prepared in 110 above, and/or source data that was used in the method 200 described above. In one embodiment, the number of pronounceable words in the source data is substantially equal to the number of non-pronounceable words, in order to equally weight the source data.
In 320, the ARFF file is populated with attributes of the pronounceable source words. The attributes of the ARFF file for the character order model include the characters of the source words, and a class attribute that identifies whether the word is pronounceable. Thus, in 320, the ARFF file is populated with the character attributes of the pronounceable source words, and a class attribute "Y".
In 330, the ARFF file is populated with attributes of the non-pronounceable source words. The attributes of the non-pronounceable source words are populated in a similar manner as described above with respect to the pronounceable source words, except that the class attribute for the non-pronounceable source words is "N".
In 340, the character order model is generated based on the attributes of the pronounceable and non-pronounceable source words stored in the ARFF file. This may be accomplished using the J48 decision tree algorithm in WEKA. The resulting character order model can be used to predict a probability that a given input string will be pronounceable. While the J48 decision tree algorithm has been described as used to generate the character order model, other appropriate machine learning models may be used in other embodiments.
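J48 is WEKA's C4.5 decision tree implementation. As a deliberately simpler stand-in for learning character order, the sketch below estimates first-order character transition probabilities from pronounceable words and scores new strings by the geometric mean of their smoothed transitions; this is a different technique than a decision tree, used here only to make the idea of a character-order score concrete:

```python
import math
from collections import Counter

def train_transitions(pronounceable_words):
    """Estimate P(next char | current char) counts from pronounceable words."""
    pair_counts = Counter()
    char_counts = Counter()
    for w in pronounceable_words:
        for a, b in zip(w, w[1:]):
            pair_counts[(a, b)] += 1
            char_counts[a] += 1
    return pair_counts, char_counts

def order_probability(word, pair_counts, char_counts, alphabet_size=26):
    """Geometric mean of add-one-smoothed transition probabilities:
    a crude per-word character-order score in (0, 1)."""
    logp, steps = 0.0, 0
    for a, b in zip(word, word[1:]):
        p = (pair_counts[(a, b)] + 1) / (char_counts[a] + alphabet_size)
        logp += math.log(p)
        steps += 1
    return math.exp(logp / steps) if steps else 0.0

pairs, chars = train_transitions(["daily", "atlas", "fail", "tail", "sail"])
good = order_probability("daila", pairs, chars)
bad = order_probability("qzxkv", pairs, chars)
```

Strings whose letter-to-letter chains resemble the training words score higher than strings built from transitions never seen in pronounceable words, which is the property the J48-based model exploits through its per-position character attributes.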
After building the phonetic and character order models based on source data, new input words may be processed using the phonetic and character order models to determine the probability that the new input words are pronounceable. If it is determined with a sufficient degree of certainty that the new input words are pronounceable, the new input words may be output to a user, or otherwise stored on a storage device. This may be useful, for example, in suggesting domain names to users seeking to register a domain name. For example, if a user requests a domain name that is already registered, the systems and methods described in the disclosure may be used
‘osugyestaltemative, pronounceable domain names based ai
the originally requested domain name. The suggestions may
be bosed on the requested domsin name, ora user may enter
several relevant key words, and the suggestions may be based
uher than a specific requested domi
For example, a primitive distance may be used to correlate the search terms with the possible suggestions from the pronounceable domain names system. That is, any term that is within a certain distance of the search term being passed to the system may be returned, and the system could perform further iterations to improve matches between the search terms and suggestions for even more relevant pronounceable domain name suggestions.
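A sketch of such a distance filter, using Levenshtein edit distance as the "primitive distance", might look like the following; the patent does not name a specific metric, so this choice is an assumption:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def within_distance(term, candidates, max_dist=2):
    """Return candidates whose edit distance to the search term is small."""
    return [c for c in candidates if edit_distance(term, c) <= max_dist]

print(within_distance("world", ["worle", "wrold", "zzzzzz"]))
```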
The output pronounceable words may be prioritized. The prioritized output words may be stored or displayed in the prioritized order. The output words may be prioritized, for example, based on the primitive distance between the output words and the input request. In other embodiments, the output words may be prioritized based on a degree of similarity of the output word with the input by the user. In another embodiment, the output words may be prioritized based on the amount of traffic the word or string is receiving. If the output word or string is not a registered domain name, NXD traffic information may be used to determine traffic volume for the non-existing domain.
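Prioritizing by degree of similarity to the user's input can be sketched with Python's standard-library difflib; the choice of SequenceMatcher.ratio() as the similarity measure is an assumption for illustration, not a metric named in the disclosure:

```python
from difflib import SequenceMatcher

def prioritize_by_similarity(request, outputs):
    """Sort candidate words so those most similar to the user's input come first."""
    return sorted(outputs,
                  key=lambda w: SequenceMatcher(None, request, w).ratio(),
                  reverse=True)

print(prioritize_by_similarity("world", ["green", "worle", "hello"]))
```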
By way of a non-limiting example, consider a learning task for filter 1: create 2-, 3-, and 4-gram scores from a dictionary. As an example, take a three-entry dictionary consisting of the following entries: foo, food, and world. The 2-grams are fo, oo, od, wo, or, rl, ld. The 3-grams are foo, ood, wor, orl, rld. The 4-grams are food, worl, orld. The scores across n-grams are: fo:2; oo:2; od:1; wo:1; or:1; rl:1; ld:1; foo:2; ood:1; wor:1; orl:1; rld:1; food:1; worl:1; orld:1. Double metaphone representations are then created of these words as shown in Table 1.
TABLE 1
[table illegible in source: the double metaphone representation of each dictionary word (foo, food, world)]
The same is done for randomly generated words that are not pronounceable, building the "learning" part of the ARFF for filter 1.
TABLE 2
[table illegible in source: randomly generated non-pronounceable words and their double metaphone representations]
The ARFF can be represented as shown below:

@attribute m1 {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,NONE}
[remaining attribute and data lines illegible in source]
The ARFF is then run through a Bayesian network to learn the model for filter 1.
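The n-gram scoring for filter 1 described above can be sketched as follows; the counts match the foo/food/world example:

```python
from collections import Counter

def ngram_scores(words, sizes=(2, 3, 4)):
    """Count how often each n-gram of the given sizes occurs across the dictionary."""
    scores = Counter()
    for word in words:
        for n in sizes:
            for i in range(len(word) - n + 1):
                scores[word[i:i + n]] += 1
    return scores

scores = ngram_scores(["foo", "food", "world"])
# e.g. scores["fo"] == 2, scores["foo"] == 2, scores["orld"] == 1
```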
An ARFF is then created for filter 2. This ARFF has to be created specific to a use case, so consider the filters built for 5-letter words: instead of foo, food, and world, assume that we used hello, world, and green. The ARFF can be represented as shown below:
@attribute m1 {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
@attribute m2 {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
@attribute m3 {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
@attribute m4 {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
@attribute m5 {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
@data
[data lines illegible in source]
This ARFF is then run through the J48 decision tree algorithm to build the model for filter 2.
The processing steps can include the following: (1) create a set of n-character names to be filtered; (2) pass the names through filter 1, which generates a probability for a given word to be pronounceable; (3) filter names that are below a given threshold; and (4) follow the same steps for filter 2.
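The four processing steps can be sketched as a two-stage pipeline; the scorer functions below are hypothetical stand-ins for the trained filter 1 and filter 2 models, which the patent builds in WEKA:

```python
def run_pipeline(names, filter1, filter2, threshold1=0.5, threshold2=0.5):
    """Apply two probability filters in sequence, dropping names below each threshold."""
    survivors = [n for n in names if filter1(n) >= threshold1]   # steps 2-3
    return [n for n in survivors if filter2(n) >= threshold2]    # step 4

# Hypothetical stand-in scorers based on vowel content, for illustration only.
has_vowel = lambda w: 1.0 if any(c in "aeiou" for c in w) else 0.0
vowel_ratio = lambda w: sum(c in "aeiou" for c in w) / len(w)

print(run_pipeline(["hello", "xzqvw", "green"], has_vowel, vowel_ratio, 0.5, 0.3))
```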
The methods described throughout the description of embodiments of the disclosure may be implemented or performed by a system that includes a processor and a memory. An exemplary system for generating pronounceable domain names may include a processor, storage, a memory, and input/output (I/O) devices. The system may be implemented in various ways. For example, the system may be embodied in a general purpose computer, a server, a mainframe computer, or any combination of these components. The system may be standalone, or it may be part of a subsystem, which may, in turn, be part of a larger system. Further, the components of the system may be separated, or integrated into a single system.

The processor may include one or more known processing devices, such as a microprocessor from the Pentium™ or Xeon™ family manufactured by Intel™, the Turion™ family manufactured by AMD™, or any of various processors manufactured by Sun Microsystems. The memory may include one or more storage devices configured to store information used by the processor to perform certain functions related to disclosed embodiments. The storage may include a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of computer-readable medium used as a storage device.

In one embodiment, memory may include one or more programs or subprograms that may be loaded from storage or elsewhere that, when executed by the processor, perform various procedures, operations, or processes consistent with disclosed embodiments.
While the above disclosure has referred specifically to the pronounceability of domain names, the disclosed systems and methods may also be operable for generating other pronounceable words or character strings, for example, email addresses, gamertags, online identities, company or store names, etc. Also, the above disclosure is not limited to character strings of a specific length, but may be adapted as necessary to accommodate different lengths of character strings. Additionally, while the above disclosure refers to pronounceability in the English language, appropriate modifications may be made to accommodate other languages without departing from the spirit and scope of the invention.
Further, while the source of the data has been described in embodiments as being a dictionary, other data sources may be used for obtaining and generating a list of pronounceable words. For example, domain names that individuals have tried to register, or Web addresses that are frequently typed but do not correspond to any registered domain name, may be used to create a control set of pronounceable words.
FIG. 4 illustrates an exemplary block diagram of a computing system 400 which can be implemented to perform the various processes of FIGS. 1-3 according to various embodiments. While FIG. 4 illustrates various components of computing system 400, one skilled in the art will realize that existing components can be removed or additional components can be added.
As shown in FIG. 4, computing system 400 can include one or more processors, such as processor 402 that provide an execution platform for embodiments of security tool 102. Commands and data from processor 402 are communicated over communication bus 404. Computing system 400 can also include main memory 406, for example, one or more computer readable storage media such as a Random Access Memory (RAM), where security tool 102, and/or other application programs, such as an operating system (OS), can be executed during runtime, and can include secondary memory 408. Secondary memory 408 can include, for example, one or more computer readable storage media or devices such as hard disk drive 410 and/or removable storage drive 412, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., where a copy of an application program embodiment for security tool 102 can be stored. Removable storage drive 412 reads from and/or writes to removable storage unit 414 in a well-known manner. The computing system 400 can also include a network interface 416 in order to connect with the one or more networks 110.
In embodiments, a user can interface with computing system 400 and operate security tool 102 with keyboard 418, mouse 420, and display 422. To provide information from computing system 400 and data from security tool 102, the computing system 400 can include display adapter 424. Display adapter 424 can interface with communication bus 404 and display 422. Display adapter 424 can receive display data from processor 402 and convert the display data into display commands for display 422.
Certain embodiments may be performed as a computer application or program. The computer program may exist in a variety of forms both active and inactive. For example, the computer program can exist as software program(s) comprised of program instructions in source code, object code, executable code, or other formats; firmware program(s); or hardware description language (HDL) files. Any of the above can be embodied on a computer readable medium, which include computer readable storage devices and media, and signals, in compressed or uncompressed form. Exemplary computer readable storage devices and media include conventional computer system RAM (random access memory), ROM (read-only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes. Exemplary computer readable signals, whether modulated using a carrier or not, are signals that a computer system hosting or running the present teachings can be configured to access, including signals downloaded through the Internet or other networks. Concrete examples of the foregoing include distribution of executable software program(s) of the computer program on a CD-ROM or via Internet download. In a sense, the Internet itself, as an abstract entity, is a computer readable medium. The same is true of computer networks in general.
FIG. 5A illustrates an exemplary user interface according to embodiments of the disclosure. While FIG. 5A illustrates various aspects, fields, or attributes of the exemplary user interface, one skilled in the art will realize that any of the aspects, fields, or attributes can be removed, additional aspects, fields, or attributes can be added, and the order of the illustrated aspects, fields, or attributes can be changed.

The user interface 500 can be any type of user interface that allows the user to enter, view, and interact with the pronounceability service in relation to the processes discussed in relation to FIGS. 1-3. The pronounceability service can provide the user interface 500 to the user via an output device, such as a display. Field 505 is a field that allows the user to enter one or more keywords, such as domain names, to be analyzed according to the processes of FIGS. 1-3. Field 510 is a field that allows the user to enter one or more characters that the one or more keywords can begin with. Field 515 is a field that allows the user to enter one or more characters that the one or more keywords can end with. Field 520 is a field that allows the user to view previously entered keywords. Field 525 is a field that allows the user to view results based on the entered keywords and the processes of FIGS. 1-3.
In the above examples, WEKA and ARFF are used to illustrate various implementations in which aspects of the present disclosure can be performed. For example, WEKA is one of a variety of programs that can be used for modeling, and uses ARFF as the file format to interact with WEKA. In general, ARFF is a format useful for characterizing a feature vector that can be used to train a machine learning model. However, different applications may leverage different formats to represent feature vectors.
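For illustration, a minimal ARFF file consists of a relation declaration, attribute declarations, and a data section; this toy example is an assumption for illustration, not taken from the patent:

```
@relation pronounceability
@attribute c1 {a,b,c}
@attribute class {Y,N}
@data
a,Y
c,N
```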
While the teachings have been described with reference to the exemplary embodiments thereof, those skilled in the art will be able to make various modifications to the described embodiments without departing from the true spirit and scope. The terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. In particular, although the method has been described by examples, the steps of the method may be performed in a different order than illustrated or simultaneously. Furthermore, to the extent that the terms "including", "includes", "having", "has", "with", or variants thereof are used in either the detailed description and the claims, such terms are intended to be inclusive in a manner similar to the term "comprising." As used herein, the term "one or more of" with respect to a listing of items such as, for example, A and B, means A alone, B alone, or A and B. Those skilled in the art will recognize that these and other variations are possible within the spirit and scope as defined in the following claims and their equivalents.
What is claimed is:
1. A computer-implemented method of generating pronounceable domain names, comprising:
providing a list of character strings;
determining a first probability that a character string in the list of character strings is pronounceable based on a phonetic model;
determining a second probability that a character string in the list of character strings is pronounceable based on a character order model;
filtering the list of character strings through a first filter based on the first probability to produce a first filtered list of character strings;
filtering the list of character strings through a second filter based on the second probability to produce a second filtered list of character strings; and
generating, by a processor, a list of pronounceable domain names based on the first filtered list of character strings and the second filtered list of character strings.
2. The computer-implemented method of claim 1, further comprising building the phonetic model representing phonetic characteristics of pronounceable words.
3. The computer-implemented method of claim 1, further comprising building the character order model representing character order characteristics of pronounceable words.
4. The computer-implemented method of claim 2, wherein building the phonetic model further comprises:
preparing a library of source words comprising pronounceable words and non-pronounceable words; and
providing the library of source words to a learning model algorithm to train the learning model algorithm to determine characteristics of pronounceable and characteristics of non-pronounceable words.
5. The computer-implemented method of claim 4, wherein the learning model algorithm comprises a Bayesian network.
6. The computer-implemented method of claim 4, wherein preparing the library comprises:
building an attribute relationship file format (ARFF) based on the library of source words; and
associating one or more attributes of pronounceable words and non-pronounceable words with the ARFF.
7. The computer-implemented method of claim 6, wherein at least one attribute of the one or more attributes of pronounceable words and non-pronounceable words comprises an n-gram score.
8. The computer-implemented method of claim 7, further comprising:
calculating a first n-gram score for the pronounceable words in the library; and
calculating a second n-gram score for the non-pronounceable words in the library.
9. The computer-implemented method of claim 6, wherein an attribute of the one or more attributes of pronounceable words and non-pronounceable words comprises a phonetic representation.
10. The computer-implemented method of claim 9, further comprising: