Journal Pre-Proof: Neurocomputing
PII: S0925-2312(19)31735-7
DOI: https://doi.org/10.1016/j.neucom.2019.12.032
Reference: NEUCOM 21661
Please cite this article as: Yikuan Yu, Zitian Huang, Fei Li, Haodong Zhang, Xinyi Le, Point Encoder GAN: A Deep Learning Model for 3D Point Cloud Inpainting, Neurocomputing (2019), doi: https://doi.org/10.1016/j.neucom.2019.12.032
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
Yikuan Yu a,b, Zitian Huang a,b, Fei Li c,d, Haodong Zhang a,b, Xinyi Le a,b,∗
a School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
b Shanghai Key Laboratory of Advanced Manufacturing Environment, Shanghai 200240, China
c State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China
d Beijing Complex Product Advanced Manufacturing Research Center, Beijing Simulation Center, Beijing 100854, China
Abstract
In this paper, we propose a Point Encoder GAN for 3D point cloud inpainting. Different from other 3D object inpainting networks, our network can process point cloud data directly, without any labeling or assumption. We use a max-pooling layer to handle the unordered nature of point clouds during the learning procedure. We add two T-Nets (from PointNet) to the encoder-decoder pipeline, which yield a better feature representation of the input point cloud and a more suitable rotation angle of the output point cloud. We then propose a hybrid reconstruction loss function to measure the difference between two sets of unordered data. Trained only on small samples from ModelNet40, the proposed Point Encoder GAN yields surprisingly good end-to-end inpainting results. Experimental results show a high success rate, and several quantitative measures confirm the quality of our generated models.
Keywords: Point cloud, neural network, inpainting, encoder, generative adversarial nets (GANs)
1. Introduction

Nowadays, 3D laser scanners or photo scanners are frequently used for point cloud acquisition [1]. This data structure is widely applied in engineering design [2], geographical mapping [3], and scene recognition [4, 5]. However, due to the limitation of instrument precision, point cloud sets are defective and incomplete most of the time. The missing information hinders the use of point clouds, which leads to an urgent need for point cloud inpainting. Neural networks have great information processing ability [6] and help us complete this task.

During the past few years, several inpainting methods have been proposed and proved effective for image and 3D object processing. For instance, inpainting methods based on CNN (Convolutional Neural Network) and GAN (Generative Adversarial Network) have achieved great performance for 2D images [7–11] and 3D objects [12–16]. Effective as they are, these methods for 3D object inpainting still require 3D voxel data rather than first-hand point cloud data. As shown in Figure 2(c), different from previous works in Figure 2(a) and 2(b), inpainting on point cloud sets directly is our main target in this paper.

As point clouds are irregularly defined in Euclidean space, it is difficult to feed them to typical convolutional architectures. Quite a few previous works transform the 3D point cloud to regular 3D voxel data [17–21], but voxelization often results in information reduction.

In this paper, Point Encoder GAN, an original network structure, is proposed based on PointNet and GAN. This network directly takes a defective point cloud as input and generates the missing part. Our contributions are as follows:

• The network is able to process raw defective point cloud data without voxelization, which prevents the extra information loss of the voxel-based methods [12–14].

• The proposed Point Encoder GAN is trained on a small data set of randomly corrupted point clouds constructed from ModelNet40 [22, 23], demonstrating great generalization capability.

• The Point Encoder GAN needs no structural or classification information about the objects, such as symmetry or category. It is an authentic end-to-end model for point cloud inpainting.
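As a concrete illustration of the corruption scheme behind the second contribution (erasing a block of points around a random kernel and zero-initializing the missing part, as described later in Sections 3 and 5.1), here is a minimal numpy sketch; the helper name `make_defective` and the nearest-to-kernel erasure rule are our own assumptions:

```python
import numpy as np

def make_defective(points, m=256, seed=0):
    """Erase the m points nearest to a random kernel point and replace
    them with zero-initialized placeholders.

    points: (n, 3) array of xyz coordinates.
    Returns (input_cloud, erased): input_cloud keeps the original size n,
    with the surviving n-m points followed by m all-zero rows; erased
    holds the ground-truth removed points.
    """
    rng = np.random.default_rng(seed)
    kernel = points[rng.integers(len(points))]      # random kernel point
    d = np.linalg.norm(points - kernel, axis=1)     # distance to kernel
    order = np.argsort(d)
    erased = points[order[:m]]                      # m nearest = removed
    kept = points[order[m:]]
    zeros = np.zeros((m, 3))                        # zero point cloud
    return np.concatenate([kept, zeros], axis=0), erased

cloud = np.random.default_rng(1).normal(size=(1024, 3))
inp, gt = make_defective(cloud, m=256)
# inp keeps 1024 rows; its last 256 rows are (0, 0, 0) placeholders.
```

The zero rows act as the "zero point cloud" the network learns to pull toward the erased ground truth.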
✩ The work described in the paper was jointly sponsored by Startup Fund
for Youngman Research at SJTU (SFYR at SJTU), Open Fund of State Key
Laboratory of Intelligent Manufacturing System Technology, Natural Science
Foundation of Shanghai (18ZR1420100) and National Natural Science Foun-
dation of China (61703274).
∗ Corresponding author
2.1. Generative Adversarial Network

GAN (Generative Adversarial Network), proposed by Goodfellow [24, 25], consists of two deep networks, a generator G and a discriminator D. The generator generates fake samples and the discriminator tries to distinguish real samples from the overall data. G and D are trained jointly until the discriminator cannot distinguish whether the generated samples are real or fake. We train the generator and the discriminator, which together constitute the GAN, alternately. In other words, GAN can be regarded as a game between G and D.

Because of the diversity and specificity of point cloud data, it is difficult to use regular deep learning networks directly for point cloud learning. In order to solve these problems, researchers have proposed networks that use point clouds directly as input. These networks usually have delicate structures.

PointNet [39] uses max-pooling and T-Net to obtain global features of a point cloud. PointNet++ [40] can perceive local features due to its hierarchical structure based on PointNet.
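The alternating G/D training described above can be illustrated with a deliberately tiny 1-D toy (our own construction, not the paper's network): a one-parameter generator learns to shift Gaussian noise onto the real distribution while a logistic discriminator is trained to tell the two apart:

```python
import numpy as np

# Toy 1-D GAN: real samples follow N(3, 1); the generator g(z) = z + theta
# shifts noise; the discriminator is a logistic unit d(x) = sigmoid(w*x + b).
rng = np.random.default_rng(0)
theta, w, b = 0.0, 0.1, 0.0          # generator / discriminator parameters
lr_d, lr_g, batch = 0.05, 0.01, 64

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(600):
    z = rng.normal(size=batch)
    fake = z + theta                          # generated samples
    real = rng.normal(loc=3.0, size=batch)    # real samples

    # D step: gradient ascent on E[ln d(real)] + E[ln(1 - d(fake))]
    dr = sigmoid(w * real + b)
    df = sigmoid(w * fake + b)
    w += lr_d * (np.mean((1 - dr) * real) - np.mean(df * fake))
    b += lr_d * (np.mean(1 - dr) - np.mean(df))

    # G step: gradient ascent on E[ln d(fake)] (non-saturating variant)
    df = sigmoid(w * fake + b)
    theta += lr_g * np.mean((1 - df) * w)

# The alternating game drags the generator's shift toward the real mean 3.0.
```

With the discriminator updated a little faster than the generator, the shift parameter drifts toward the real distribution; the same alternating scheme, at much larger scale, trains G-Net and D-Net in this paper.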
Figure 3: The architecture of Point Encoder GAN. The specific explanation is shown in Section 4. This network is trained by both reconstruction loss (hybrid) and
adversarial loss.
PointCNN [41] proposes an X-Conv operation for feature acquisition of point clouds. These three networks respectively achieve accuracies of 89.2%, 90.7%, and 91.7% on the ModelNet40 classification task. Therefore, some structures of these networks are worth learning from for our purposes.

3. Task Statement

Different from 2D images, a 3D point cloud has the following unique features: it is unordered, and it requires rotational invariance.

Unordered: In essence, a point cloud is a series of points in 3D space. The overall shape of the point cloud has no concern with the order of the points. In other words, different sequences of points in the input set should theoretically result in the same output of the network.

Rotational Invariance: For the same object, the coordinates of a certain point in a point cloud vary with rotation. In our method for 3D point clouds, point cloud rotations should not alter classification results.

In our model, the primary input and output are unordered point cloud sets. A set of 3D points with size n can be represented as {P_i | i = 1, . . . , n}, where P_i is a vector (x_i, y_i, z_i) in Euclidean space.

Assume N and M are the numbers of points in the initial point cloud and the erased point cloud, respectively. The goal of the proposed Point Encoder GAN is to output the generated missing point cloud with size M. We initialize the missing point cloud as a zero point cloud with (0, 0, 0) coordinates. In other words, the initial input of Point Encoder GAN is a defective point cloud with size (N − M) and a zero point cloud (x_i, y_i, z_i = 0 | i = 1, . . . , M) with size M. During the training process, the zero point cloud gradually converges towards the erased point cloud. Thus, the trained network can generate the missing point cloud with the same number of points as were erased. We use ModelNet40 for training and validation of Point Encoder GAN.

We call such a task point cloud inpainting throughout the paper. There are two difficulties: the unique properties of point clouds and the definition of a loss function between two sets of point clouds. Our solution and mathematical derivation are given in Section 4.

4. Point Encoder GAN

4.1. Network Architecture

Overview: As illustrated in Figure 3, the proposed Point Encoder GAN consists of a generator network (G-Net) and a discriminator network (D-Net). The whole framework is inspired by Context Encoders [7]. The encoder of G-Net transforms point clouds into a compact feature representation, and the decoder of G-Net generates the missing point cloud data out of this representation. The D-Net is introduced to help the G-Net predict the missing points from the latent feature representation. T-Net is a data-dependent spatial transformer that helps to transform the input data optimally in PointNet [39], so we add T-Net to both G-Net and D-Net to address the rotation invariance property of point cloud data.

We bring in the GAN model to promote training of the encoder-decoder network (G-Net). The essence of the GAN training procedure is a game theory problem. The objective is to obtain a G-Net which can learn the data distribution from the training samples. The addition of GAN encourages the entire output of the encoder to be more realistic. In other words, during the incessant "frauds" between G-Net and D-Net, the output of G becomes more suitable.

To conclude, Point Encoder GAN enjoys the advantages of PointNet [39] for dealing with point clouds, Context Encoders
[7] for auto-encoding, and GANs [24] for discrimination and generation, thus delivering satisfactory results.

T-Net Structure: We use a max-pooling layer to handle the unordered nature of point clouds, and T-Net to overcome rotation invariance, following the structure of PointNet [39]. As shown in Figure 3, a T-Net combines serial layers of shared 64-MLP (Multi-Layer Perceptron), shared 128-MLP, shared 1024-MLP, a max-pooling layer, a 256-FCL (fully connected layer), and a 9-FCL to obtain a 3 × 3 matrix. Its output is the matrix multiplication of the input point cloud matrix and this 3 × 3 matrix.

The overall training objective combines an adversarial loss and a reconstruction loss, where $\lambda_{adv}$ and $\lambda_{rec}$ are the weights of the adversarial loss and the reconstruction loss, respectively. They satisfy $\lambda_{adv} + \lambda_{rec} = 1$.

Adversarial Loss: This loss is rooted in the GAN model [24]. We regard the G-Net and the D-Net as parametric functions. $G: X \to Y$ is considered the mapping function from input samples X to real samples Y, which approximates $G_0: X \to Y_0$, the map from input samples X to the data distribution $Y_0$. D-Net tries to distinguish the data generated by G-Net from the authentic samples. The adversarial loss function can be defined by:

$$L_{adv} = \sum_{i=1}^{S} \ln(D(y_i)) + \sum_{i=1}^{S} \ln(1 - D(G(x_i))), \qquad (2)$$

where $x_i \in X$, $y_i \in Y$, $i = 1, \ldots, S$, and S is the sample size of X and Y.

Reconstruction Loss: Pixel data (images) and voxel data (3D grids) are both organized data. The loss function for such organized data is easy to define because it belongs to a one-to-one relationship. For example, the image loss of picture A and picture B with the same size N × N can be determined by:

$$L(A, B) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} L(A_{i,j}, B_{i,j}), \qquad (3)$$

where $A_{i,j}$ and $B_{i,j}$ are the pixels at location $(i, j)$ of pictures A and B, respectively.

For two unordered point clouds $\hat{A}$ and $\hat{B}$, no such one-to-one correspondence exists, so the loss is instead weighted over both matching directions (Eq. 5), where $\omega_{\hat{A}:\hat{B}}$ is the weight of point cloud $\hat{A}$ to $\hat{B}$, and $\omega_{\hat{B}:\hat{A}}$ satisfies $\omega_{\hat{A}:\hat{B}} + \omega_{\hat{B}:\hat{A}} = 1$.

We then use the Chamfer Distance [43] to define the loss between one point P and a point cloud $\hat{S}$ with length K. It is worth noting that the Chamfer Distance is an $L_2$-norm value:

$$L(P, \hat{S}) = \min_{1 \le i \le K} |P, \hat{S}_i|_2. \qquad (6)$$

Combining Eq. 5 and Eq. 6, the loss function is determined by:

$$L_2(\hat{A}, \hat{B}) = \frac{\omega_{\hat{A}:\hat{B}}}{N} \sum_{i=1}^{N} \min_{1 \le j \le N} |\hat{A}_i, \hat{B}_j|_2 + \frac{\omega_{\hat{B}:\hat{A}}}{N} \sum_{j=1}^{N} \min_{1 \le i \le N} |\hat{A}_i, \hat{B}_j|_2. \qquad (7)$$
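The weighted bidirectional loss of Eq. (7) can be sketched in a few lines of numpy; `hybrid_loss` is our own name, and equal direction weights (ω = 0.5) are assumed:

```python
import numpy as np

def hybrid_loss(a, b, w_ab=0.5):
    """Weighted bidirectional Chamfer-style loss between point clouds a
    and b (both (n, 3)), following the L2 form of Eq. (7): each point is
    matched to its nearest neighbour in the other cloud, the distances are
    averaged per direction, and the two directions are mixed with weights
    w_ab + w_ba = 1.
    """
    # pairwise L2 distances, shape (n, n), via broadcasting
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    loss_ab = d.min(axis=1).mean()   # each point of a -> nearest in b
    loss_ba = d.min(axis=0).mean()   # each point of b -> nearest in a
    return w_ab * loss_ab + (1.0 - w_ab) * loss_ba

a = np.random.default_rng(0).normal(size=(256, 3))
assert hybrid_loss(a, a) == 0.0      # identical clouds match exactly
assert hybrid_loss(a, a[::-1]) == 0.0  # ...in any point order
```

Because every point is matched only to its nearest neighbour in the other set, the loss is invariant to point ordering, which is exactly what an unordered point cloud requires.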
5. Experimental Validation

5.1. Model Training

We use PyTorch as our deep learning framework to implement Point Encoder GAN. Our data set is composed of 12308 generated defective point clouds from ModelNet40. To acquire the training set, we take initial point clouds from ModelNet40 (1024 points) and erase 256 points around a random kernel in each one. Thus, each point cloud in our data set contains 768 points, represented as coordinates $(x_i, y_i, z_i)$. The data set is split into two subsets, 9840 samples for training and 2468 samples for test, and both subsets include all 40 categories in ModelNet40.

5.2. Evaluation Measures

Since point cloud inpainting is a frontier in computer vision, quantitative indexes for point cloud inpainting are inadequate. We therefore establish some reasonable indexes for point cloud evaluation.

Regression Ratio: Our first goal is to generate point clouds as similar as possible to the original ones. In order to quantitatively evaluate the disparity between the M missing points generated by G-Net and the M erased points, the Regression Ratio is introduced. The mathematical definition is as follows:

$$R_{reg} = \left(1 - \frac{L(\hat{S}_{Gen}, \hat{S}_{Real})}{L(\hat{S}_{Zero}, \hat{S}_{Real})}\right) \times 100\%, \qquad (8)$$

where $\hat{S}_{Gen}$, $\hat{S}_{Real}$, and $\hat{S}_{Zero}$ represent the generated, real, and zero point clouds with the same size M, respectively. $R_{reg} \in [0\%, 100\%]$ indicates the reconstruction degree of the inpainting process. Consider the two extreme situations: $R_{reg} = 100\%$ if $\hat{S}_{Gen} = \hat{S}_{Real}$, and $R_{reg} = 0\%$ if $\hat{S}_{Gen} = \hat{S}_{Zero}$. We use both the $L_1$ and $L_2$ loss for evaluation. Thus, in our experiments, we have two evaluation indexes, the Regression Ratio of the $L_1$-norm, $R_{reg,L_1}$, and the Regression Ratio of the $L_2$-norm, $R_{reg,L_2}$.

Matching Distance Ratio: In some cases, the generated point cloud is not similar to the original one but still makes sense. In such cases of generation verisimilitude, the matching effect is good although the regression ratio is not high. So we define the Matching Distance Ratio (MDR). The mathematical definition is given as follows:

$$MDR = 10 \times \left| \log_{10} \frac{D_M}{D_S} \right| \text{ (dB)}, \qquad (9)$$

where $D_M$ is the mean value of the point distances in the inpainting matching margin, and $D_S$ is the point cloud density. If we take the density value of the ground truth as $D_S$, we call this value the Fixed Matching Distance Ratio (FMDR). If we take the density value of the generated point cloud as $D_S$, we call it the Variable Matching Distance Ratio (VMDR). The optimal value of MDR is 0 dB, attained when $D_M = D_S$.

Earth Mover Ratio: To evaluate the aggregation of the generated points compared with the ground truth, we define another ratio based on the Earth Mover Distance [44]:

$$EMR = 10 \times \left| \log_{10} \frac{EMD_G}{EMD_T} \right| \text{ (dB)}, \qquad (10)$$

where $EMD_G$ and $EMD_T$ are the Earth Mover Distances of the generated point cloud and the ground truth, respectively. EMR measures the density difference between generated samples and true samples. The optimal value of EMR is 0 dB, attained when $EMD_G = EMD_T$.

5.3. Test Results

In our experiments, we test our model on 2468 samples within ModelNet40, and also examine it on data outside ModelNet40. The visualized results and the evaluation comparisons are presented, followed by relevant analysis.

Inpainting Results on ModelNet40: Some of the validation results on ModelNet40 are shown in Figure 5. The highlighted parts represent the erased points of the initial point cloud and the missing points generated by our model. Generally, the inpainting results of most categories meet our expectations. Compared with the ground truth point sets, the generated missing points match the defective point clouds well, visually and perceptually. In the two presented views, the generated points show sound similarity with the erased point cloud.

Evaluation Indexes of Different Models: After visualization, quantitative indexes are calculated to evaluate the inpainting quality of different models, which substantiates the effectiveness of our model. $R_{reg}$ represents the regression ratio of the missing points based on the loss function ($R_{reg,L_1}$ and $R_{reg,L_2}$). MDR (FMDR and VMDR) represents the quality of the generated point cloud, including completeness and homogeneity. EMR quantifies the density difference between the generated point cloud and the ground truth. A model with higher $R_{reg}$, lower MDR, and lower EMR is preferred. We calculate the indexes of four models, trained for 1, 5, 7, and 10 epoch(s). All the indexes of the above models are shown in Table 1; the visualizations of these models are given in Figure 6.

Epoch | R_reg,L1 (%) | R_reg,L2 (%) | FMDR (dB) | VMDR (dB) | EMR (dB)
  1   |    45.01     |    36.47     |   3.218   |   2.991   |  1.522
  5   |    61.81     |    55.79     |   2.328   |   1.814   |  1.005
  7   |    53.82     |    45.05     |   2.785   |   2.563   |  1.367
 10   |    51.15     |    43.07     |   2.996   |   2.565   |  1.416

Table 1: Results of Evaluation Indexes on ModelNet40 for Different Epochs.

According to Table 1, it is clear that the 5-epoch model achieves the best results in all the indexes, significantly better than the others. This model obtains a higher $R_{reg}$ and lower EMR, which is also confirmed by the visualized results in Figure 6.
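The indexes above can be sketched as follows; the symmetric nearest-neighbour `l2_set_loss` is our stand-in for the paper's $L_2$ reconstruction loss, and `db_ratio` captures the common 10·|log10(·)| form shared by Eqs. (9) and (10):

```python
import numpy as np

def l2_set_loss(a, b):
    """Symmetric nearest-neighbour L2 loss between equally sized point
    sets (a hedged stand-in for the paper's L2 loss)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return 0.5 * d.min(axis=1).mean() + 0.5 * d.min(axis=0).mean()

def regression_ratio(gen, real):
    """Eq. (8): 1 - L(gen, real) / L(zero, real), in percent."""
    zero = np.zeros_like(real)
    return (1.0 - l2_set_loss(gen, real) / l2_set_loss(zero, real)) * 100.0

def db_ratio(x, y):
    """Shared form of Eqs. (9) and (10): 10 * |log10(x / y)| in dB,
    optimal (0 dB) when the two quantities coincide."""
    return 10.0 * abs(np.log10(x / y))

# a toy "ground-truth" patch, shifted away from the origin so the
# zero-cloud baseline loss in Eq. (8) is nonzero
real = np.random.default_rng(0).normal(size=(256, 3)) + 5.0
print(round(regression_ratio(real, real), 1))   # perfect inpainting -> 100.0
print(db_ratio(2.0, 2.0))                       # matched densities  -> 0.0
```

MDR and EMR then follow by feeding `db_ratio` the appropriate density or Earth Mover Distance pair.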
Figure 5: Examples of inpainting results with two views in ModelNet40. Our network achieves an end-to-end inpainting task without label-based data preprocessing.
Figure 10: Inpainting results for a bunny and a horse outside ModelNet40. The model performs well despite some imperfections.
[42] W. Yuan, T. Khot, D. Held, C. Mertz, M. Hebert, PCN: Point completion network, International Conference on 3D Vision (2018).
[43] G. Borgefors, Hierarchical chamfer matching: A parametric edge matching algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (6) (1988) 849–865.
[44] Y. Rubner, C. Tomasi, L. J. Guibas, The earth mover's distance as a metric for image retrieval, International Journal of Computer Vision 40 (2) (2000) 99–121.
[45] Y. Yang, C. Feng, Y. Shen, D. Tian, FoldingNet: Point cloud auto-encoder via deep grid deformation (2018) 206–215.
Yikuan Yu: writing, review and editing, formal analysis
Zitian Huang: visualization, validation
Fei Li: resources, funding acquisition, investigation
Haodong Zhang: paper revision
Xinyi Le: conceptualization, project administration, supervision
Authors’ Bios
Yikuan Yu, Zitian Huang, Fei Li, Haodong Zhang, Xinyi Le