Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

Wubi method

Traditional Chinese 五筆字型輸入法


Simplified Chinese 五笔字型输入法
five stroke character
Literal meaning
model input method
Alternative Chinese name
Traditional Chinese 王碼
Simplified Chinese 王码
Literal meaning Wang's code

The Wubi 89 keyboard

The Wubi 98 keyboard (more common)

The Wubizixing input method (simplified Chinese: 五笔字型输入法; traditional Chinese: 五筆字型輸入
法; pinyin: wǔbǐ zìxíng shūrùfǎ; literally: "five stroke character model input method"), often abbreviated to
simply Wubi or Wubi Xing,[1] is a Chinese character input method primarily for inputting simplified
Chinese and traditional Chinese text on a computer. Wubi should not be confused with the Wubihua (五笔
画) method, which is a different input method that shares the categorization into five stroke types with Wubi.

The method is also known as Wang Ma (simplified Chinese: 王码; traditional Chinese: 王碼; pinyin: Wáng
mǎ; literally: "Wang's code"), named after the inventor Wang Yongmin (王永民). There are three Wubi
versions that are considered to be standard: Wubi 98, Wubi 89, Wubi 19181 and Wubi New-century (the
8rd-generation Version). The latter three can also be used to input traditional Chinese text, albeit in a more
limited way. Wubi 98 is the most widely known and used shape-based Input Method for full letter
keyboards in Mainland China. If you frequently need to input traditional Chinese characters as well, other
input methods like CangJie or ZhengMa may be better suited to the task, and you are also much more likely
to find them on the computer you need to use.
1
The Wubi method is based on the structure of characters rather than their pronunciation, making it possible
to input characters even when you do not know the pronunciation, as well as not being too closely linked to
any particular spoken variety of Chinese. It is also extremely efficient: every character can be written with at
most 4 keystrokes. In practice, most characters can be written with fewer. There are reports of experienced
typists reaching 181 characters per minute with Wubi.[2] What this means in the context of Chinese is not
entirely the same as it is for English, but it is true that Wubi is extremely fast when used by an experienced
typist. The main reason for this is that, unlike with traditional phonetic input methods, one does not have to
spend time selecting the desired character from a list of homophonic possibilities: virtually all characters
have a unique representation.

As its name suggests, the keyboard is divided into five regions. The Chinese character 笔 (bǐ), when used in
the context of writing Chinese characters, refers to the brush strokes used in Chinese calligraphy. Each
region is assigned a certain type of stroke.

 Region 1: horizontal (一)


 Region 2: vertical (丨)
 Region 8: downwards right-to-left (丿)
 Region 4: dot strokes or downwards left-to-right strokes (丶)
 Region 5: hook

A major drawback to learning Wubi is its learning curve[clarification needed]. Memorization and practice are key
factors for proficient usage.

In this article, the following convention will be used: character will always mean Chinese character,
whereas letter, key and keystroke will always refer to the keys on keyboard.

Contents
 1 How it works
 2 Implementation specific details
 8 Subdivision of the keyboard
o 8.1 QWERT zone (falling left)
o 8.2 YUIOP zone (falling right)
o 8.8 ASDFG zone (horizontal)
o 8.4 HJKLM zone (vertical)
o 8.5 XCVBN zone (hook)
 4 Disambiguation strokes
 5 Examples
o 5.1 Characters with 4 components or fewer (but no need for strokes)
o 5.2 Characters with more than 4 components
o 5.8 Characters with fewer than 4 components (needing strokes)
o 5.4 Characters requiring disambiguation strokes
 8 Poem
o 8.1 1898 version
o 8.2 1889 version
o 8.8 New-century(8rd-generation) version
 7 Notes and references
 9 External links

How it works
Essentially, a character is broken down into components, which are usually (but not always) the same as
radicals. These are typed in the order in which they would be written by hand. In order to ensure that
extremely complex characters do not require an inordinate number of keystrokes, any character containing
2
more than 4 components is entered by typing the first 8 components written, followed by the last. In this
way, each character's data can be entered with only 4 keystrokes.

Wubi distributes its characters very evenly and as such the vast majority of characters are uniquely defined
by the 4 keystrokes discussed above. One then types a space to move the character from the input buffer
onto the screen. In the event that the 4 letter representation of the character is not unique, one would type a
digit to select the relevant character (for example, if two characters have the same representation, typing 1
would select the first, and 2 the second). In most implementations, a space can always be typed and simply
means 1 in an ambiguous setting. Intelligent software will try to make sure that the character in the default
position is the one desired.

Many characters have more than one representation. This sometimes is for ease of use, in case there is more
than one obvious way to break down a character. More often though, it's because certain characters have a
short representation that is less than 4 letters, as well as a "full" representation.

For characters with fewer than 4 components that do not have a short form representation, one types each
component and then "fills up" the representation (that is, types enough extra keystrokes to make the
representation 4 keystrokes) by manually typing the strokes of the last component, in the order they would
be written. If there are too many strokes, one should write as many as possible, but put the last stroke last
(this mirrors the component rule for characters with more than 4 components outlined above).

Once the algorithm is understood, one can type almost any character with a little practice, even if one hasn't
typed it before. Muscle memory will make sure that frequent typists using this method don't have to think
about how the characters are actually constructed, just as the vast majority of English typists don't think very
much about the spelling of words when they write.

Implementation specific details


Many implementations employ further, multiple-word optimizations. Usually, a commonly used digraph
(two character word) in which both characters have short form two-keystroke representations can be
combined into a single, four keystroke representation which generates two characters rather than one. There
are also a few 8-character shortcuts, and even one rather longer, politically motivated one. Some examples
of these are provided in the examples section below.

Another common feature is the use of the 'z' key as a wildcard. The Wubi method was actually designed
with this feature in mind; this is why no components are assigned to the z key. Basically, one can type a z
when unsure what the component should be, and the input method will help complete it. If one knew, for
example, that the character ought to start with "kt", but was unsure what the next component should be,
typing "ktz" would produce a list of all characters starting with "kt". In practice though, many input method
engines use a tabular lookup method for all table based input systems, including for Wubi. This means that
they simply have a large table in memory, associating different characters to their respective representations.
The input method then simply becomes a table lookup. In such an implementation, the z key breaks the
paradigm and as such is not found in much generalized software (although the Wubi input method
commonly found in Chinese Windows implements the feature). For this same reason, the multiple character
optimization described in the previous paragraph is also relatively rare.

Some input methods, such as xcin (found on many UNIX-like systems), provide a generic wildcard
functionality which can be used in all the table based input systems, including pinyin and virtually anything
else. Xcin uses '*' for auto-complete and '?' for just one letter, following the conventions pioneered in UNIX
file globbing. Other implementations probably have their own conventions.

See Wubi 98 for details on how to get the Wubi input method installed on Windows.

3
Subdivision of the keyboard
The Wubi keyboard assumes a QWERTY-like layout, so users of keyboards implementing a nationalized or
alternative layout (such as Dvorak or the French AZERTY) will probably have to do some remapping to
make the system sane. Wubi does not position its components arbitrarily: there are far too many of them,
and it is only with the introduction of a logical methodology that the system becomes easy to learn.

Basically, the keyboard is divided into 5 zones, each representing a stroke. Those five strokes are falling left,
falling right, horizontal, vertical, and hook, and the zones that represent them are QWERT, YUIOP, ASDFG,
HJKLM, and XCVBN, respectively. These zones are all laid out horizontally, with the exception of M,
which is not in line with the rest of the letters in its zone.

In a general way, the keyboard can be thought of as divided down the center, between T and Y, G and H,
and N and M. The keys in each zone are numbered moving away from this dividing line: so we should
actually say that in zone QWERT, T is the first letter, R is the second, and E the third; in zone YUIOP, Y is
the first, U is the second, I the third, etc. For XCVBN, N is the first, and so on. In HJKLM, consider M to be
the last in the series, even though it does not lie on the line.

This is important because components in the first position will have one repetition of the stroke in question
(the stroke assigned to the zone in which they belong), those in the second, two, those in the third, three.
Those components which are not easily classifiable using this paradigm will be placed on the last letter.

Therefore one would expect 一 to be located on G, and 二 on F, and 三 on D, and indeed, this is the case.
Similarly, one would expect 丨 to be located on H, 刂 to be on J, and 川 to be on K. This pattern holds for
all the zones. Furthermore, it extends to most radicals that look as though they are made up of three such
strokes, even if in fact they might not be at all. An example of this is 中 on K: while it does not have three
downward strokes (two only), it appears to have three. Furthermore, it is written by hand by first writing a
mouth radical, 口, and then bisecting it with a vertical downward stroke. The mouth radical lies on 'K', so
this makes the assignment doubly logical. And the pinyin romanization of 口, kou8, begins with k, another
memory aid encoded into the Wubi keyboard.

Furthermore, each letter of each zone has one component associated with it, its "main component". These
are usually a complete character (with the exception of X) in their own right. One can always type this main
component by typing the letter it is situated on four times. So, for example, the main component of H is 目,
and so one would type it by typing "hhhh".

Each letter also has a shortcut character associated with it. In some cases, this character is the same as the
component associated with the key in question, and sometimes not. This shortcut character is the character
produced when one types just the letter and nothing else; these are all extremely common characters used
when typing Chinese.

It is entirely possible that there are a number of components not listed below, either because of oversight,
because they are rarely used, or because no simple Unicode representation for the component exists.

QWERT zone (falling left)

The Q key's main component is 金 and its shortcut character is 我. It is associated with the following
components: 金, 钅, 勹, 儿, 夕, as well as the hook at the top of 饣 and 角, the radical 犭 without the lower
left-falling stroke (so characters with that radical start with "qt", not just "q"), the criss-cross (such as in the
center of 区), the top of 鱼 (i.e., without the horizontal stroke at the bottom), and the three (nearly vertical)
"feet" in the bottom right corner of 流.

4
The W key's main component and shortcut character are both 人. It is associated with the following
components: 人, 亻, 八, and the top of 癸. While 人 means man, it is often used by Wubi to construct a roof
radical, such as in 会, "wfc". 入 is not governed by W, despite looking similar, and while 餐 has a top that
looks vaguely like the top of 癸, the two are not the same (indeed, to type 餐, one must physically type out
each component on the top).

The E key's main component is 月, and its shortcut character is 有. It is associated with the following
components: 月, 用, 彡, 乃, the bottom of 衣 (i.e., without 亠), the top of 孚 (i.e., without 子), the bottom
part of 家 (i.e., without the roof radical), the bottom of 良 (i.e., without the 白), and the bottom of 舟 (i.e.,
without the little dot on the top). In this case, E's shortcut character does not even begin with a left-falling
stroke, but merely prominently figures a component belonging to E. 彡 is featured on this character, as it is
the third character in the zone (counting from T, see above). A particular distortion that comes up often is
the use of E in 且 and in characters containing it: Wubi thinks of this component as 月 + 一.

The R key's main component is 白, and its shortcut character is 的. It is associated with the following
components: 白, 手, 扌, 斤 (both with and without the T), 牛 (without the vertical downward stroke), and of
course the two left-falling strokes (I cannot find the unicode glyph that represents them) that one would
expect from the second key in the zone (see above for an explanation). Watch out for varieties of 手 where
the central downward hook is replaced by a left-falling stroke, such as in 看.

The T key's main component is 禾, and its shortcut character is 和. It is associated with the following
components: 禾, 竹, 夂, 攵, 彳, and the top of 乞 (i.e., without the 乙). 竹 may also be found in its smaller
form, as seen on the top of 筇. 丿 is also found on this key, because T is the first key in the zone (see above).
This means that if one is typing a component or character stroke by stroke, they would (generally) use T to
represent a left-falling stroke. See the section on disambiguation strokes for more information on exceptions
to this rule.

YUIOP zone (falling right)

This zone might also be called the dot zone, because its pattern of Y: 讠 U: 冫 I: 氵 and O: 灬 is not actually
necessarily built up of right falling strokes. In fact, one could argue that the first stroke in 灬 actually falls
left. It is called the falling right zone because the keys in this zone, when used to construct a character by
stroke (rather than component), all represent right falling strokes for some character configuration (see the
section on disambiguation strokes for more information).

The Y key's main component is 言, and its shortcut character is 主. It is associated with the following
components: 言, 讠, 亠, 亠 with a 口 beneath it, 广, 文, 方, and 丶. These components all start with a right-
falling stroke. Generally, dots in Chinese characters are actually left falling strokes, and so most of the time,
the use of T is more appropriate than Y. Of course, if one can write Chinese characters by hand, they should
be able to tell which to choose by recalling how it is written.

The U key's main component is 立, and its shortcut character is 产. It is associated with the following
components: 立, 六, 辛, 门, 疒, 丬, 冫, the "antennae" on the top of 单 (just two strokes: 丷), and the
antennae plus a horizontal stroke, as found on the top of 兹. Most of these all feature two short diagonal
strokes (门 being the obvious exception). This is consistent with R's place as the second letter in the zone
(see above for an explanation).

The I key's main component is 水, and its shortcut character is 不. It is associated with the following
components: 水, 氵, 小, the three strokes on the top of 学, and the three strokes on the top of 当.
Additionally, a component which might be described as two 冫, back to back, is associated with this
character.

5
The O key's main component is 火, and its shortcut character is 为. It is associated with the following
components: 火, 米, 灬, and 业 without the bottom horizontal stroke — this allows construction of
characters such as 严. This is the 4th key in the falling right zone: hence the inclusion of 灬.

The P key's main component is 之, and its shortcut character is 这. It is associated with the following
components: 之, 辶, 廴, 冖, 宀, and 礻. As Wubi components are typed in the order that they would need to
be written were one writing by hand, the 辶 and 廴 components are typically typed last.

ASDFG zone (horizontal)

 The A key's shortcut character is 工.


 The S key's main component is 木, and its shortcut character is 要.
 The D key's main component is 大, and its shortcut character is 在.
 The F key's main component is 土, and its shortcut character is 地. The main component's name
(earth) correlates to the shortcut character which means earth.
 The G key's main component is 王, and its shortcut character is 一.

HJKLM zone (vertical)

 The H key's main component is 目, and its shortcut character is 上.


 The J key's main component is 日, and its shortcut character is 是.
 The K key's main component is 口, and its shortcut character is 中.
 The L key's main component is 田, and its shortcut character is 国.
 The M key's main component is 山, and its shortcut character is 同.

XCVBN zone (hook)

 The X key's main component is 纟, and its shortcut character is 经.


 The C key's main component is 又, and its shortcut character is 以.
 The V key's main component is 女, and its shortcut character is 发.
 The B key's main component is 子, and its shortcut character is 了.
 The N key's main component is 已, and its shortcut character is 民.

Disambiguation strokes
The disambiguation stroke is depending on the last stroke of a character. For characters with 左右結構 (left-
right structure), the disambiguation stroke follows rule:

1. G, for characters with last stroke of 橫 (héng)


2. H, for characters with last stroke of 竪 (shù)
8. T, for characters with last stroke of 撇 (piě)
4. Y, for characters with last stroke of 捺 (nà)
5. N, for characters with last stroke of 折 (zhé)

For characters with 上下結構 (top-bottom structure), the disambiguation stroke follows rule:

1. F, for characters with last stroke of 橫 (héng)


2. J, for characters with last stroke of 竪 (shù)
8. R, for characters with last stroke of 撇 (piě)
4. U, for characters with last stroke of 捺 (nà)

6
5. B, for characters with last stroke of 折 (zhé)

For other characters, the disambiguation stroke follows rule:

1. D, for characters with last stroke of 橫 (héng)


2. K, for characters with last stroke of 竪 (shù)
8. E, for characters with last stroke of 撇 (piě)
4. I, for characters with last stroke of 捺 (nà)
5. V, for characters with last stroke of 折 (zhé)

Examples
Characters with 4 components or fewer (but no need for strokes)

Example 1: 请

Consists of three components: y (讠, radical 011), g (王*, radical 98), e (月, radical 119) -> 请

Characters with more than 4 components

Example 2: 遗

Consists of five components: k (口), h (丨), g (一), m (贝), p (辶) -> khgp ->遗 ( you don't need type m )

Characters with fewer than 4 components (needing strokes)

文: First you type the key with the symbol on it, which happens to be 'Y'. Then you type the first
component, which is also 'Y' for the 点 stroke, then a 'G' for the 横 stroke,and since you now already have
three strokes, you type the last stroke, which also happens to be a 捺, arriving at the keycode 'YYGY' for the
complete character.

一:The code for this character is 'GGLL'. As before, you type the key for the character first, which is 'G',
then the first stroke of that character, which is also a 'G'. Because this is all necessary information, the L is
used as a filler until you reach 4 letters.[8] Note that the '一' is also the shortcut character for 'G' (making it
one stroke only in practice).

广: The code for this character is 'YYGT'. At first, you type the key where this character is located, which
is a 'Y'. Then, you type a 点 stroke, which is also on 'Y'. The next will be the 横 stroke on 'G', and the last
will be the 捺, on 'T'.

Characters requiring disambiguation strokes

Example 4: 等

Consists of three components: t (竹), f (土), f (寸),

Disambiguation strokes: The last stroke is 丶 and the character is with top-bottom structure (42,u) -> 等

You might also like