Systematically Designing Better Instance Counting Models on Cell Images with Neural Arithmetic Logic Units
arXiv:2004.06674v2 [cs.LG] 15 Jun 2020

Abstract—A central problem for neural network models trained to count instances is that generalization error grows once test data moves to higher counts than the training range; that is, prediction error increases for images that lie outside the training range. Consider automating a cell counting process, where images denser than those in the training data, with higher cell counts, are commonly encountered. By making better predictions for higher ranges of cell count, we aim to build cell counting systems that generalize better. With the proposal of neural arithmetic logic units (NALU) for arithmetic operations, counting has become feasible with better accuracy for numeric ranges not included in the training data. In our study we incorporate these units into the existing Fully Convolutional Regression Network (FCRN) and U-Net architectures in the form of residual concatenated layers. We carried out a systematic comparative study between the newly proposed changes and the earlier base architectures, evaluated in terms of optimizing the regression loss over the cell density map of an image. Our newly proposed architectures with residual concatenation connections achieve better results on cell counting tasks. We further validated our results on a custom high-count dataset created from the BBBC005 synthetic cell dataset and obtained even larger gains where test images have higher counts. These results confirm that the above numerically biased units help models learn numeric quantities, yielding better generalization on high-count data.

Index Terms—Neural Arithmetic Logic Units, Cell Counting, Fully Convolutional Regression Networks, U-Net.
I. INTRODUCTION

The ability to generalize concepts is a fundamental component of intelligence and is core to designing smart systems [1], [2]. Neural networks simulate this behavior through hierarchical learning of concepts. When it comes to automation, counting is an important task, from machine vision applications [3] to cell counting [4]. Although neural networks manipulate numerical quantities, they are not associated with systematic generalization [5], [6]: they fail to generalize, as evidenced by high generalization error when predicting quantities that lie outside the training numerical range [7]. This points to memorization rather than generalization for a given task. It is especially problematic for cell counting, because images with higher cell counts than those seen in training are common in real applications.

Neural accumulators (NAC) and neural arithmetic logic units (NALU) [7] are biased to learn systematic numerical computation and perform better than nonlinear activation functions on arithmetic operations. This numerical bias toward learned computation makes them an excellent choice for counting tasks, which are essentially increment (addition) operations. Deep learning models generally take either a segmentation approach with an explicit counting trainer or end-to-end counting via a regression loss. In this paper we pursue the latter approach [4] in detail to automate the cell counting process. Cell counting is a cumbersome task, and dense cell images whose counts fall outside the training numeric range are common in real-world scenarios; achieving true cell counting automation with low generalization error is the prime objective of this paper.

Fig. 1. Abstract representation of the proposed modified architecture (right), with either NAC or NALU, compared to the previous Fully Convolutional Regression Network (left). As the dimensional analysis of the tensor blocks (right) shows, the layers in the newly proposed model are concatenated and then squeezed back to the same size as in the earlier base architecture.

In the regression-loss approach, fully convolutional regression networks [4] and U-Net [8] architectures learn a mapping between an image I(x) and a density map D(x), given by F: I(x) → D(x) (I ∈ R^{m×n}, D ∈ R^{m×n}) for an m × n pixel image. Variations of these base architectures with different activation functions are implemented in this paper, and their prediction performance is compared with that of the newly proposed NAC/NALU concatenated architectures of FCRN and U-Net.
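As a concrete reference, the NAC and NALU forward passes from [7] can be sketched in NumPy. The parameter names (W_hat, M_hat, G) follow that paper; the weights below are random placeholders rather than trained values, so the output is only shape-checked.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nac(x, W_hat, M_hat):
    """Neural accumulator: an affine map whose effective weights
    tanh(W_hat) * sigmoid(M_hat) are softly pushed toward {-1, 0, 1},
    biasing the layer toward addition and subtraction."""
    W = np.tanh(W_hat) * sigmoid(M_hat)
    return x @ W

def nalu(x, W_hat, M_hat, G, eps=1e-7):
    """Neural arithmetic logic unit: a learned gate g mixes the additive
    NAC path with a log-space NAC path that handles multiplication."""
    a = nac(x, W_hat, M_hat)                                # add / subtract
    m = np.exp(nac(np.log(np.abs(x) + eps), W_hat, M_hat))  # multiply / divide
    g = sigmoid(x @ G)
    return g * a + (1.0 - g) * m

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=(4, 3))        # batch of 4 inputs, 3 features
W_hat, M_hat, G = (rng.normal(size=(3, 2)) for _ in range(3))
print(nalu(x, W_hat, M_hat, G).shape)          # (4, 2)
```

With W_hat and M_hat driven strongly positive, the NAC weight matrix approaches all-ones and the unit simply sums its inputs, which is exactly the increment behavior counting relies on.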
The concatenated connections add numerical bias to layers of the network and behave in a manner similar to ResNets [9]. However, instead of adding the input from the previous layer directly, the input is passed through these numerical units to capture numerical bias and then concatenated with the main network layer, as shown in figure 1. On the surface it may appear that the proposed architectural changes improve accuracy simply because the numerically biased units add parameters and thus model capacity. But our results on a custom high-count test dataset created from BBBC005 [10] reflect genuinely increased generalization in counting: for images with higher cell counts, our models give a larger relative improvement in mean absolute error (MAE).

Our concatenation-based residual architecture follows the batch-normalization placement of the identity-mapping ResNet architecture [11]. But instead of applying a convolution directly, the network leverages the numerical-bias information obtained by applying NAC or NALU operations to the input layer, and only then applies a convolution to the concatenated layer. Batch normalization is carried out before and after the concatenation of this numerical-bias-learning operation, and its output is added back to the next main network layer, as shown in figure 1.

By means of this paper we introduce changes to current regression-based model architectures for end-to-end counter training and produce systems with improved accuracy. We also validate our trained models on a specially tailored validation dataset, created from the BBBC005 synthetic cell dataset [10], with approximately seven times higher cell counts than the training dataset. Overall, the experimental results demonstrate that supplementing the base architectures with NAC and NALU achieves better results and improves relative MAE on higher-count test images.

We introduce architectural changes so that models learn and preserve input information that is numerically biased with reference to the input layers. This is somewhat similar to ResNets [9], which are easier to optimize and gain accuracy with increasing depth. With our experiments we aim to show, in the form of a comparative study against the original architectures, that adding numerically biased concatenated residual functions helps achieve better results. Our results also demonstrate that backpropagation learns this numerical bias without any explicit numeric quantity being provided as input, implying that better computer vision counters can be trained by adding this module to existing convolutional neural network architectures. Such numerically biased models can also make predictions on unseen parts of the solution space, which highlights the underlying structure of the equations governing the behavior [22].

Density-based estimation does not require prior object detection or segmentation [12], [15], [23]. Several works have investigated this approach in previous years. In [15], the problem is cast as density estimation with a supervised learning algorithm, D(x) = c^T φ(x), where D(x) is the ground-truth density map, φ(x) denotes local features, and the parameters c are learned by minimizing the error between predicted and true density over all possible sub-windows with quadratic programming. In [23], a regression forest is used to exploit a patch-based idea for learning structured labels; for a new input image, the density map is then estimated by averaging structured patch-based predictions. Also, in [12] an algorithm is used that allows fast interactive counting via ridge regression.
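A shape-level NumPy sketch of the proposed residual concatenation connection: a numerically biased (here additive, NAC-style) transform of the input is concatenated with the batch-normalized main features, squeezed back to the main channel count by a 1×1 convolution, normalized again, and added back residually, mirroring the before/after batch-normalization placement described above. All weights are random placeholders and the channel sizes are illustrative, not the paper's.

```python
import numpy as np

def conv1x1(x, W):
    """1x1 convolution as a per-pixel channel mix: (H, W, Cin) -> (H, W, Cout)."""
    return np.tensordot(x, W, axes=([-1], [0]))

def batch_norm(x, eps=1e-5):
    """Per-channel normalization over spatial positions (inference-style sketch)."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def nac_channels(x, W_hat, M_hat):
    """Additive NAC applied across channels at every spatial position."""
    W = np.tanh(W_hat) * (1.0 / (1.0 + np.exp(-M_hat)))
    return x @ W

def residual_concat_block(x_main, x_input, params):
    """Numerically biased transform of the input, concatenated with the
    batch-normalized main features, squeezed back to the main channel count
    by a 1x1 convolution, normalized again, and added back residually."""
    branch = nac_channels(x_input, params["W_hat"], params["M_hat"])
    cat = np.concatenate([batch_norm(x_main), branch], axis=-1)
    squeezed = conv1x1(cat, params["W_squeeze"])
    return x_main + batch_norm(squeezed)

rng = np.random.default_rng(1)
x_main = rng.normal(size=(8, 8, 16))              # main network features
x_input = rng.uniform(0.0, 1.0, size=(8, 8, 3))   # (resized) input image
params = {
    "W_hat": rng.normal(size=(3, 4)),
    "M_hat": rng.normal(size=(3, 4)),
    "W_squeeze": rng.normal(size=(16 + 4, 16)),
}
print(residual_concat_block(x_main, x_input, params).shape)  # (8, 8, 16)
```

The key design point is that the block returns a tensor of the same shape as its main input, so it can be dropped between existing FCRN or U-Net layers without altering the rest of the architecture.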
Fig. 6. Left: original image from the BBBC005 dataset. Right: repeated image generated by a 4×4 grid repetition operation with random horizontal and vertical flips of the base image shown on the left.
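The repetition operation described in the Fig. 6 caption can be sketched as follows; the grid size (4×4) and the random flips follow the caption, while the base image here is a stand-in array rather than an actual BBBC005 image.

```python
import numpy as np

def tile_with_flips(img, grid=4, seed=0):
    """Repeat a base cell image in a grid x grid layout, randomly flipping
    each tile horizontally and/or vertically (the operation of Fig. 6)."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(grid):
        row = []
        for _ in range(grid):
            tile = img
            if rng.random() < 0.5:
                tile = tile[:, ::-1]   # random horizontal flip
            if rng.random() < 0.5:
                tile = tile[::-1, :]   # random vertical flip
            row.append(tile)
        rows.append(np.concatenate(row, axis=1))
    return np.concatenate(rows, axis=0)

base = np.arange(64, dtype=np.float32).reshape(8, 8)  # stand-in for a cell image
big = tile_with_flips(base, grid=4)
print(big.shape)  # (32, 32)
```

Since flips preserve content, a 4×4 grid carries 16 times the cell count of its base image, which is how test images with counts well above the training range are produced.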
TABLE II
RESULT SUMMARIZATION FOR TRAINED MODELS

FCRN-Models   MAE    U-Net-Models   MAE
ReLU          3.04   ReLU           2.87
LeakyReLU     2.99   LeakyReLU      2.62
Linear        2.85   Linear         2.47
NALU-tanh     2.32   NALU-tanh      1.95
NALU          2.27   NALU           1.87
NAC           2.40   NAC            1.92

TABLE III
VALIDATING TRAINED MODELS FOR EXTRAPOLATION CELL COUNTING TASKS

Fig. 9. Network architecture. Top: U-Net with 3×3 convolution operations and filter depth increasing from 64 to 256 for feature abstraction and learning; these layers are then fed to upsampling layers. Bottom: the U-Net main network augmented with residual concatenation connections of Ni numerically biased units, followed by batch normalization for regularization. Operation abbreviations are as stated in figure 8.
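The MAE figures above compare predicted and ground-truth counts, where a density-map model's predicted count is the integral (sum) of its output map. The relative-improvement metric of figure 10 is not given in closed form here; a natural reading, and an assumption of this sketch, is the fractional MAE reduction against the ReLU baseline. The density maps and MAE values below are toy illustrations, not the paper's results.

```python
import numpy as np

def count_from_density(d):
    """Predicted cell count = integral (sum) of a predicted density map."""
    return float(d.sum())

def mae(pred_maps, true_counts):
    """Mean absolute error between density-map counts and ground-truth counts."""
    preds = np.array([count_from_density(d) for d in pred_maps])
    return float(np.mean(np.abs(preds - np.asarray(true_counts, dtype=float))))

def relative_improvement(mae_base, mae_model):
    """Assumed form of the relative improvement in prediction:
    fractional MAE reduction versus the ReLU baseline."""
    return (mae_base - mae_model) / mae_base

# Toy "predicted" density maps whose total masses are 9.5 and 12.0 cells.
d1 = np.full((16, 16), 9.5 / 256.0)
d2 = np.full((16, 16), 12.0 / 256.0)
print(mae([d1, d2], [10, 12]))         # 0.25
print(relative_improvement(2.5, 2.0))  # 0.2 (i.e. a 20% improvement)
```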
The relative improvement in predictions is visualized in figure 10 against the ReLU-based baseline result, covering both the other activation-layer variants of the FCRNs/U-Nets and the NALU/NAC residual concatenation connections added to the FCRNs and U-Nets. It presents comparisons averaged over multiple training and testing runs, for both the interpolation testing and the extrapolation validation counting tasks, of the FCRN and U-Net variant models with respect to the ReLU-based FCRN and U-Net models. The figure clearly shows that models with NAC and NALU residual modules have better generalization capabilities on the extrapolation counting task, i.e., they are better generalizers for this cell counting task, with a larger relative improvement in prediction than the base ReLU implementation. The relative improvement metric is chosen for measuring generalization on the extrapolation task because it brings more perspective to the improved performance on the validation dataset: performance rises for every model, since the validation dataset consists of highly focused images, so relative improvement is the more appropriate measure of the true gain in prediction capability.

Figure 10 also shows that the relative improvement grows as we move right along the horizontal axis, for both the testing task and the extrapolation validation task. From the validation extrapolation results of the NAC/NALU models we can conclude that the trained models have better generalization abilities, with some numerical bias learned into their trained weights, which yields even better predictions for higher-count cells.

V. SUMMARY

We demonstrated that adding the newly proposed NAC and NALU units to existing architectures, in the form of residual concatenation connection layer modules, achieves better results. With numerically biased residual connections, higher accuracy is obtained on denser images with higher cell counts, producing more generalized cell counters that give better predictions in real-life use cases. Finally, for code implementation details and other supplementary experimental results, refer to this paper's GitHub repository.

REFERENCES
[1] Dehaene, Stanislas. The number sense: How the mind creates mathe-
matics. OUP USA, 2011.
[2] Gallistel, C. Randy. ”Finding numbers in the brain.” Philosophical
Transactions of the Royal Society B: Biological Sciences 373.1740
(2018): 20170119.
[3] Baygin, Mehmet, et al. ”An Image Processing based Object Count-
ing Approach for Machine Vision Application.” arXiv preprint
arXiv:1802.05911 (2018).
[4] Xie, Weidi, J. Alison Noble, and Andrew Zisserman. ”Microscopy cell
counting and detection with fully convolutional regression networks.”
Computer methods in biomechanics and biomedical engineering: Imag-
ing & Visualization 6.3 (2018): 283-292.
[5] Fodor, Jerry A., and Zenon W. Pylyshyn. ”Connectionism and cognitive
architecture: A critical analysis.” Cognition 28.1-2 (1988): 3-71.
[6] Marcus, Gary F. The algebraic mind: Integrating connectionism and
cognitive science. MIT press, 2018.
[7] Trask, Andrew, et al. ”Neural arithmetic logic units.” Advances in Neural
Information Processing Systems. 2018.
[8] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. ”U-net: Con-
volutional networks for biomedical image segmentation.” International
Conference on Medical image computing and computer-assisted inter-
vention. Springer, Cham, 2015.
[9] He, Kaiming, et al. ”Deep residual learning for image recognition.”
Proceedings of the IEEE conference on computer vision and pattern
recognition. 2016.
[10] Ljosa, Vebjorn, Katherine L. Sokolnicki, and Anne E. Carpenter. ”An-
notated high-throughput microscopy image sets for validation.” Nature
methods 9.7 (2012): 637-637.
[11] He, Kaiming, et al. ”Identity mappings in deep residual networks.”
European conference on computer vision. Springer, Cham, 2016.
[12] Arteta, Carlos, et al. ”Interactive object counting.” European conference
on computer vision. Springer, Cham, 2014.
[13] Chan, Antoni B., Zhang-Sheng John Liang, and Nuno Vasconcelos.
”Privacy preserving crowd monitoring: Counting people without people
models or tracking.” 2008 IEEE Conference on Computer Vision and
Pattern Recognition. IEEE, 2008.
[14] Seguí, Santi, Oriol Pujol, and Jordi Vitrià. ”Learning to count with deep object features.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2015.
[15] Lempitsky, Victor, and Andrew Zisserman. ”Learning to count objects in images.” Advances in neural information processing systems. 2010.
[16] Zhang, Cong, et al. ”Cross-scene crowd counting via deep convolutional neural networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

Fig. 10. Moving right in the figure and measuring the RIP metric, there is a sharp increase for the NALU- and NAC-based models, demonstrating an even larger relative improvement with respect to the baseline ReLU models on extrapolation tasks. For the interpolation task the highest improvement is 9%, and for extrapolation it is 23%, with NAC units as the residual connection.
[17] Hernández, Carlos X., Mohammad M. Sultan, and Vijay S. Pande. ”Using
Deep Learning for Segmentation and Counting within Microscopy
Data.” arXiv preprint arXiv:1802.10548 (2018).
[18] Paul Cohen, Joseph, et al. ”Count-ception: Counting by fully convo-
lutional redundant counting.” Proceedings of the IEEE International
Conference on Computer Vision. 2017.
[19] He, Kaiming, et al. ”Identity mappings in deep residual networks.”
European conference on computer vision. Springer, Cham, 2016.
[20] Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. ”Highway networks.” arXiv preprint arXiv:1505.00387 (2015).
[21] Huang, Gao, et al. ”Densely connected convolutional networks.” Pro-
ceedings of the IEEE conference on computer vision and pattern
recognition. 2017.
[22] Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. ”Discovering
governing equations from data by sparse identification of nonlinear
dynamical systems.” Proceedings of the National Academy of Sciences
113.15 (2016): 3932-3937.
[23] Fiaschi, Luca, et al. ”Learning to count with regression forest and
structured labels.” Proceedings of the 21st International Conference on
Pattern Recognition (ICPR2012). IEEE, 2012.
[24] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ”Imagenet
classification with deep convolutional neural networks.” Advances in
neural information processing systems. 2012.
[25] LeCun, Yann, et al. ”Gradient-based learning applied to document
recognition.” Proceedings of the IEEE 86.11 (1998): 2278-2324.
[26] Cireşan, Dan C., et al. ”Mitosis detection in breast cancer histology
images with deep neural networks.” International Conference on Medical
Image Computing and Computer-assisted Intervention. Springer, Berlin,
Heidelberg, 2013.
[27] Ciresan, Dan, et al. ”Deep neural networks segment neuronal mem-
branes in electron microscopy images.” Advances in neural information
processing systems. 2012.
[28] Ning, Feng, et al. ”Toward automatic phenotyping of developing em-
bryos from videos.” IEEE Transactions on Image Processing 14 (2005):
1360-1371.
[29] Lehmussola, Antti, et al. ”Computational framework for simulating flu-
orescence microscope images with cell populations.” IEEE transactions
on medical imaging 26.7 (2007): 1010-1016.
[30] Vedaldi, Andrea, and Karel Lenc. ”Matconvnet: Convolutional neural
networks for matlab.” Proceedings of the 23rd ACM international
conference on Multimedia. ACM, 2015.