Mass Mahjong Decision System Based on Transfer Learning: Yajun Zheng, Shuqin Li
ICIAI 2022, March 04–06, 2022, Guangzhou, China Yajun Zheng and Shuqin Li
Kong: There are three forms of Kong: Open Kong, Closed Kong and Repaired Kong. Open Kong is taken when a tile discarded by another player is the same as three tiles in your hand. When you draw a tile that is the same as three tiles in your hand, you can take the Closed Kong action. If you draw a tile that is the same as a tile you have already Ponged, you can take the Repaired Kong action.

Listen: When the hand is one tile short of a Win, it is in a waiting state, and the player can choose whether or not to Listen. After Listen, the player receives the score reward directly, but the hand can no longer be changed: every subsequently drawn tile is discarded until the winning tile appears, at which point the Win action can be taken.

Meld: The combinations of tiles formed during the game by the Eat, Pong, Open Kong and Repaired Kong actions.

Win: The player's hand forms a special winning combination.

At the beginning of the game, the four players sit in the south, east, north and west seats, and each player draws 13 tiles, while the dealer draws one more for a total of 14 tiles. After that, starting with the dealer, players discard in clockwise order, and during the game a player can Draw, Discard, Eat, Pong, Kong, Listen and Win. Among these, Eat, Pong, Kong, Listen and Win all bring direct score benefits, and several decision actions may be available at the same moment. The priority of the actions is: Win > Kong > Pong > Eat. When a player wins or the wall of tiles is empty, the game ends, and each player's total score is counted from the score of the winning hand and the scores of the actions taken during the game.

3 OVERALL DESIGN OF THE MASS MAHJONG DECISION SYSTEM

Under the rules of Mass Mahjong, the Eat, Pong, Kong and Listen operations are directly rewarded, so this part can be implemented with knowledge-based rules. The quality of the Discard, however, determines the subsequent wins and gains, so the research in this paper focuses on the Mahjong Discard decision. In previous work [6], a Mahjong Discard decision model was trained and performed well; therefore, this paper uses the Bloody Mahjong decision model to perform transfer learning.

3.1 The similarities and differences between the rules of Mass Mahjong and Bloody Mahjong

Mass Mahjong and Bloody Mahjong are similar overall, but there are some differences. The main similarities and differences between the two are as follows.

• Similarities: Both use 108 tiles, with only three suits: Dot, Bamboo and Character. The rules for the Pong and Kong actions are the same in both. The rules for winning are also similar: a hand wins as long as the particular tile fits the winning rules.

• Differences: Mass Mahjong does not have the rule of changing three tiles, and there is no requirement of a missing suit to win. However, two actions have been added in Mass Mahjong: Eat and Listen. The Eat, Pong, Kong and Listen actions of Mass Mahjong are all directly rewarded. The game ends as soon as one player wins; there is no such situation as one player winning while the others continue to play.

In conclusion, Mass Mahjong is simpler in its rules than Bloody Mahjong and can be seen as a stage within Bloody Mahjong, which provides good conditions for using transfer learning.

3.2 Overall design concept of the Mass Mahjong decision system

The overall design of the transfer learning Mass Mahjong Discard model is shown in Figure 1. It consists of the following four main components.

• Pre-training. Generally, a model that trains well on a similar task is selected for pre-training. According to the characteristics of Mass Mahjong, this paper selects the Bloody Mahjong Discard model, which is similar to Mass Mahjong, as the source model and pre-trains it on the Bloody Mahjong dataset.

• Weight transfer. This step takes the weights of the trained model and uses them as the initial weights for the new task; in other words, the parameters of the trained model are transferred to the new model to help train it. In this paper, we transfer the weights of the pre-trained Bloody Mahjong Discard model.

• Re-training. According to the characteristics of the current task, some structures of the original model are fixed or changed to adapt the model to the new task. In this paper, we first remove features unique to Bloody Mahjong, such as changing three tiles and missing suit, from the model input and change the input structure of the model; then, after augmenting the relatively small amount of Mass Mahjong data, we use the transferred weights to retrain on Mass Mahjong and build a transfer-learning-based Mass Mahjong Discard model.

• Model optimization. After the Mass Mahjong Discard model is trained, it is optimized by fine-tuning. In this paper, the other Mahjong decision models are integrated with the Mahjong Discard model to form the Mahjong decision system. According to the rules of Mass Mahjong, a Mass Mahjong judge system is established to self-play against the Mass Mahjong decision system, and the data generated by self-play is used to update the Mass Mahjong Discard model, achieving fine-tuning and optimization of the model.

4 DESIGN AND IMPLEMENTATION OF MASS MAHJONG DECISION SYSTEM

4.1 Design and implementation of Discard model for Mass Mahjong

In previous work, owing to the large amount of Bloody Mahjong Discard data, a Bloody Mahjong Discard model was constructed by pre-training as the source model for transfer learning. For the model input data representation, referring to the methods proposed in the literature [8-10], the game scenario was segmented considering the completeness of and correlation between the data, and the known information in the current situation was extracted using knowledge based on human experience. Since there are no features in Mass
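The weight-transfer step described in Section 3.2 can be sketched as follows. This is a minimal stand-in that uses plain Python dictionaries in place of real state dicts; the layer names, the input-plane counts (63 and 59) and the 27 discard classes (3 suits of 9 tile kinds) are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of weight transfer: parameters of the pre-trained Bloody
# Mahjong Discard model initialise the Mass Mahjong model wherever
# the layer name and shape match; the input layer, whose shape changed
# after removing Bloody-Mahjong-specific features, keeps fresh weights.

def shape_of(tensor):
    """Shape of a nested-list 'tensor': (rows, cols) or (n,)."""
    if tensor and isinstance(tensor[0], list):
        return (len(tensor), len(tensor[0]))
    return (len(tensor),)

def transfer_weights(source_state, target_state):
    """Copy every parameter whose name and shape match in both models."""
    transferred = []
    for name, weights in source_state.items():
        if name in target_state and shape_of(weights) == shape_of(target_state[name]):
            target_state[name] = weights
            transferred.append(name)
    return target_state, transferred

# Pre-trained Bloody Mahjong model: input layer sees extra feature planes.
source = {
    "conv_in.weight": [[0.5] * 63] * 16,      # 63 input planes (hypothetical)
    "conv_hidden.weight": [[0.1] * 16] * 16,
    "fc_out.weight": [[0.2] * 16] * 27,       # 27 discard classes (3 suits x 9)
}
# New Mass Mahjong model: fewer input planes after dropping the
# change-three-tiles and missing-suit features, so conv_in differs.
target = {
    "conv_in.weight": [[0.0] * 59] * 16,      # 59 input planes (hypothetical)
    "conv_hidden.weight": [[0.0] * 16] * 16,
    "fc_out.weight": [[0.0] * 16] * 27,
}

target, moved = transfer_weights(source, target)
print(moved)  # → ['conv_hidden.weight', 'fc_out.weight']
```

With PyTorch, the same effect is obtained by filtering the pre-trained state_dict down to the entries whose names and shapes match the new model before calling load_state_dict.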
parameters after weight transfer. The specific algorithm is shown in Algorithm 1 below.

Algorithm 1 Mass Mahjong Discard model algorithm based on transfer learning
    input: data_Bloody, data_Mass
    output: model_Mass
    model_Bloody = Train(data_Bloody) // Train the Bloody Mahjong Discard model
    data_Mass_aug = Augmentation(data_Mass) // Mass Mahjong data augmentation
    model_Mass = Train(data_Mass_aug) // Train the Mass Mahjong Discard model using the Bloody Mahjong Discard model parameters
    return model_Mass

Algorithm 2 Self-play model optimization algorithm
    input: model_Mass, model_Other, epochs
    output: model_Mass
    epoch = 0
    best_error = 1
    while epoch < epochs do
        data = Self_Play(model_Mass, model_Other)
        data_win = Select(data)
        data_train, data_dev, data_test = Split_Data(data_win)
        model_new = Train(data_train, data_dev)
        test_error = Compute_Error(data_test, model_new)
        if test_error < best_error then
            best_error ← test_error
            model_Mass ← model_new
            Save(model_Mass)
        epoch ← epoch + 1
    return model_Mass
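Algorithm 2 above can be expressed as a short, runnable Python loop. The self-play, selection, splitting, training and error routines below are toy stand-ins for the paper's real components, so only the control flow matches the algorithm.

```python
import random

# Runnable sketch of Algorithm 2 (self-play model optimization).
# Only the loop structure mirrors the paper; every helper is a stub.

def self_play(model_mass, model_other, n_games=20):
    """Simulate games; each record notes whether model_mass's side won."""
    rng = random.Random(42 + model_mass["version"])
    return [{"won": rng.random() < 0.5, "state": i} for i in range(n_games)]

def select(data):
    """Keep only the winning players' records for training."""
    return [d for d in data if d["won"]]

def split_data(data_win):
    """80/10/10 split into train/dev/test."""
    n = len(data_win)
    a, b = int(n * 0.8), int(n * 0.9)
    return data_win[:a], data_win[a:b], data_win[b:]

def train(data_train, data_dev, base_version):
    """Stub trainer: returns a 'model' tagged with how much data it saw."""
    return {"version": base_version + 1, "n_seen": len(data_train)}

def compute_error(data_test, model):
    """Toy error that shrinks as the model sees more data."""
    return 1.0 / (1 + model["n_seen"])

def optimize_by_self_play(model_mass, model_other, epochs):
    best_error = 1.0
    for _ in range(epochs):                      # while epoch < epochs
        data = self_play(model_mass, model_other)
        data_win = select(data)
        tr, dev, te = split_data(data_win)
        model_new = train(tr, dev, model_mass["version"])
        test_error = compute_error(te, model_new)
        if test_error < best_error:              # keep only improvements
            best_error = test_error
            model_mass = model_new               # Save(model_Mass) would go here
    return model_mass, best_error

model, err = optimize_by_self_play({"version": 0}, {"version": 0}, epochs=5)
print(model["version"], err)
```

The key design point is that the candidate model replaces the current one only when its test error improves on the best seen so far, so a bad self-play batch cannot degrade the deployed Discard model.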
Figure 4: Transfer learning experiment results

To improve the generalization ability of the model, data augmentation is carried out on the processed data. The three suits of tiles, Dot, Bamboo and Character, have the same rank, and the numbers one to nine are symmetric in structure. Referring to AlphaZero's data augmentation method, the three suits of tiles can be rotated and the numbers one to nine symmetrically processed, giving a total of 48 situations, i.e. every situation has 47 other positions equivalent to it.

5.2 Transfer learning model experiment

After data processing, these calibration data are divided; a total of 900,000 calibration data are obtained. A randomly selected 80% of the data was used as the training set and 20% as the test set. The model was written with PyTorch, using the parameters after weight transfer, and was trained on a single GEFORCE RTX 2080 GPU. The test-set accuracy as training progresses is shown in Figure 4.

Although an accuracy of 84% was achieved on the test set, due to the low quality of the data, the model needs to be optimized and improved by subsequent self-play games.

In order to provide conditions for the subsequent self-play, this paper constructs the Mass Mahjong judge system. The judge system can simulate the environment of a Mass Mahjong game and is responsible for interacting with the Mass Mahjong decision system; together they form a complete Mass Mahjong game. The judge system mainly includes functions such as drawing tiles, judging wins, calculating scores, and judging whether the next decision is legal. The overall design is shown in Figure 5 below.

In this paper, we use self-play games to produce data and then use the data of the winning players to train the model for optimization. Ten processes were used for training, and one training session was conducted 250 times per game for 40 rounds each, with a learning rate of 0.001. Using JJWorld's testing software (the National Competition's testing software with official support), the games were tested for 30 rounds with the four players randomly seated. The unoptimized AI played under the names TEST 2 and TEST 4, and the optimized AI under TEST 1 and TEST 3; TEST 3 was trained longer than TEST 1. The final results are shown in Table 4.

As can be seen in Table 4, the unoptimized model won fewer times and scored lower than the optimized model. The optimized model wins more often and scores higher; the longer the model is trained, the better it is optimized and the more it learns the rules of Mass Mahjong.

5.3 Results of the practical competition

The Mass Mahjong decision system designed in this paper was entered into the 2021 National University Computer Gaming Competition, where the Mass Mahjong agent designed in this paper won the second prize. Some of the head-to-head results are shown in Table 5 below.

Among them, BISTU-Mahjong1 was last year's runner-up team, based on a deep learning model, and BISTU-Mahjong2 was last year's rule-based entry, which won 5th place. The Mass Mahjong decision system based on transfer learning proposed in this paper performs better than both of the above, indicating that the proposed algorithm has a certain effect.

When playing against other players' AI, the model performed only averagely, for which the reasons were analyzed as follows: on the one hand, there was not enough time for model training; on the other hand, the Eat, Pong and Kong models are implemented based on rules and are not well coordinated with the Discard model. Finally, the Mass Mahjong decision system based on transfer learning proposed in this paper won the second prize.

6 CONCLUSION

In this paper, in order to address the differences in rules between adjacent Mahjong domains together with the accompanying scarcity of data, transfer learning is used to remove features specific to the
transferred source model, and then the model is migrated to the new Mahjong task, and finally the model is further optimized by building a self-play system. In this paper, the Bloody Mahjong model was transferred to the Mass Mahjong model, which eventually won the second prize in the 2021 National University Computer Gaming Mass Mahjong Competition. Since some of the models used in this paper are rule-based, in future work deep learning models can be trained to replace the current rule-based Eat, Pong, Kong and Listen models, so that the models coordinate well with one another and the overall decision-making ability of the system improves.

Table 5: Comparison Results

Team Name           Total Score
DIHU                170
BISTU-Mahjong1      46
BISTU-Mahjong2      -105
Example AI          -111

ACKNOWLEDGMENTS

This work is supported by Normal projects of promoting graduated education program at Beijing Information Science and Technology University (NO. 5212010937), by Normal projects of General Science and Technology research program (NO. KM201911232002), by Construction Project of computer technology specialty (NO. 5112011019), and by Normal projects of promoting graduated education program at Beijing Information Science and Technology University (NO. 5112011041).

REFERENCES

[1] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[2] Zha D, Xie J, Ma W, et al. DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning[J]. arXiv preprint arXiv:2106.06135, 2021.
[3] Silver D, Hubert T, Schrittwieser J, et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm[J]. arXiv preprint arXiv:1712.01815, 2017.
[4] Van der Kleij A A J. Monte Carlo tree search and opponent modeling through player clustering in no-limit Texas hold'em poker[J]. University of Groningen, The Netherlands, 2010.
[5] Li J, Koyamada S, Ye Q, et al. Suphx: Mastering Mahjong with Deep Reinforcement Learning[J]. arXiv preprint arXiv:2003.13590, 2020.
[6] Gao S, Li S. Bloody Mahjong playing strategy based on the integration of deep learning and XGBoost[J]. CAAI Transactions on Intelligence Technology, 2021.
[7] Qingyue Wang. Game of Mahjong[M]. Chengdu: Shurong Chess Publishing House, 2003.
[8] Gao S, Okuya F, Kawahara Y, et al. Supervised Learning of Imperfect Information Data in the Game of Mahjong via Deep Convolutional Neural Networks[J]. Information Processing Society of Japan, 2018.
[9] Gao S, Okuya F, Kawahara Y, et al. Building a Computer Mahjong Player via Deep Convolutional Neural Networks[J]. arXiv preprint arXiv:1906.02146, 2019.
[10] Wang M, Yan T, Luo M, et al. A novel deep residual network-based incomplete information competition strategy for four-players Mahjong games[J]. Multimedia Tools and Applications, 2019: 1-25.
[11] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.