
An ASABE Meeting Presentation

DOI: https://doi.org/10.13031/aim.201900724
Paper Number: 1900724

Estimating shrimp body length using deep convolutional neural network

Hong-Yang Lin (a), Hsin-Chen Lee (a), Woei-Ling Ng (b), Jyh-Nain Pai (c), Yuan-Nan Chu (a), Chyng-Hwa Liou (b), Kuo-Chi Liao (a), Yan-Fu Kuo (a)

(a) Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan.
(b) Department of Aquaculture, National Taiwan Ocean University, Keelung, Taiwan.
(c) Aquaculture Research Center, Fisheries Research Institute, Council of Agriculture, The Executive Yuan, Changhua, Taiwan.
Written for presentation at the
2019 ASABE Annual International Meeting
Sponsored by ASABE
Boston, Massachusetts
July 7–10, 2019

ABSTRACT. The body length of shrimp is a major indicator for feeding management in shrimp aquaculture, as the food intake of shrimp is linearly correlated with body length. Conventionally, the body length of shrimp was measured by naked-eye inspection and relied on the experience of shrimp farmers. However, manual observation might be subjective, and imprecise measurement can lead to mistaken feeding strategies and, hence, economic losses in shrimp aquaculture. This study proposed an automatic method to measure the body length of shrimp in vivo using an underwater camera and deep learning. In the approach, underwater cameras with an infrared light source were designed and installed to observe shrimp activities. A convolutional neural network model was developed to locate shrimps in the images. Next, image processing algorithms were applied to segment the shrimps from the background and estimate their body lengths. The proposed approach achieved a mean average precision (mAP) of 85.08% in shrimp detection and localization and a root mean squared error (RMSE) of 5.76% in shrimp body length estimation.
Keywords. Convolutional neural network, deep learning, image processing, shrimp body length, underwater camera.

Introduction
Shrimp is a major protein source in diets worldwide. According to the Global Aquaculture Alliance, global shrimp aquaculture production in 2018 reached about 4.5 million MT (Anderson et al., 2016). Among shrimp varieties, white shrimp (Litopenaeus vannamei) accounts for a large proportion of production because of its eurysalinity, high tolerance to low levels of dissolved oxygen, and fast growth. In Taiwan, the production of white shrimp reached 10 thousand MT in 2015, and the export value was 1.9 billion TWD (Chen et al., 2017). One major cost of shrimp aquaculture is feed. Shrimps have to be fed based on their body lengths and appetite to optimize their growth. Conventionally, the body lengths of shrimps were estimated using naked-eye inspection. However, manual observation might be subjective, and imprecise measurement can lead to suboptimal feeding strategies, feed waste, and water quality deterioration, causing economic losses in shrimp aquaculture. Therefore, an alternative method is necessary.

The authors are solely responsible for the content of this meeting presentation. The presentation does not necessarily reflect the official position of the American Society of Agricultural and Biological Engineers (ASABE), and its printing and distribution does not constitute an endorsement of views which may be expressed. Meeting presentations are not subject to the formal peer review process by ASABE editorial committees; therefore, they are not to be presented as refereed publications. Publish your paper in our journal after successfully completing the peer review process. See www.asabe.org/JournalSubmission for details. Citation of this work should state that it is from an ASABE meeting paper. EXAMPLE: Author's Last Name, Initials. 2019. Title of presentation. ASABE Paper No. ---. St. Joseph, MI.: ASABE. For information about securing permission to reprint or reproduce a meeting presentation, please contact ASABE at www.asabe.org/permissions (2950 Niles Road, St. Joseph, MI 49085-9659 USA).



Figure 1. Acquisition modes of camera: (a) RGB, (b) black-hot, and (c) white-hot.

Some studies have measured the sizes of objects using computer vision. White et al. (2006) measured the length and determined the orientation of fish using the silhouette of the fish, line-drawing processing, and the moment-invariant method. Wang et al. (2017) detected fruits and estimated fruit lineal dimensions using histograms of oriented gradients, Otsu's method, depth information from RGB-D cameras, and the thin lens formula. Si et al. (2017) localized multiple potato tubers in an image using watershed segmentation and estimated the length-to-width ratios of the potatoes using region labeling and manual selection. Zhou et al. (2017) enhanced the contrast of images in a recirculating aquaculture system using the multi-scale retinex (MSR; Rahman et al., 1996) algorithm and an incomplete Beta function. Although these approaches achieved good performance, they required images acquired under controlled conditions.
Recently, convolutional neural network (CNN)-based approaches have been introduced to tackle images acquired under uncontrolled conditions in the field of agriculture. Li et al. (2015) detected and recognized fish species in underwater images using fast region-based CNN (Fast R-CNN). Christiansen et al. (2016) detected obstacles and anomalies in an agricultural field using an algorithm combining faster region-based CNN (Faster R-CNN; Ren et al., 2017) and anomaly detection. Liang et al. (2018) detected on-tree mangoes in outdoor orchards using the single shot multibox detector (SSD; Liu et al., 2016). Zhao and Qu (2019) detected healthy and diseased tomatoes using You Only Look Once version 2 (YOLOv2; Redmon & Farhadi, 2017).
This study proposed to measure the body lengths of shrimp in vivo using an underwater camera and YOLO. The specific objectives were to (1) acquire shrimp videos using an underwater camera, (2) detect the shrimps in the videos using YOLO, and (3) estimate the body lengths of the shrimps using image processing (Fig. 2).

Figure 2. Procedure of shrimp body length measurement.

Materials and methods


Shrimp Image Collection
The shrimps for this research were raised at a density of 300/m2 in 15 m × 10 m concrete-walled outdoor ponds at the Freshwater Research Centre of the Fisheries Research Institute (FRI, Council of Agriculture, Taiwan). The shrimps were fed twice a day based on their body weight. Daily rations were adjusted using feeding images to estimate the appetite of the shrimps. Water quality parameters, including temperature, dissolved oxygen, pH, NH3-N, and NO2-N, were monitored, and water changes were made as needed.
Videos of the shrimps were acquired using an underwater camera with an 850 nm infrared LED light source. The dimensions of the videos were 1920 × 1280 pixels, and the video lengths ranged between 1 min 45 s and 26 min 12 s. The camera was operated in three acquisition modes: color (Fig. 1a), black-hot grayscale (Fig. 1b), and white-hot grayscale (Fig. 1c). The switch between the color and grayscale modes was determined by the researchers at the FRI. The two grayscale modes were switched automatically by the camera according to the illumination condition. A total of 15600 images were generated from the videos at a rate of 0.2 frames per second. Four hundred and one hundred images were randomly selected for training and testing, respectively. The shrimps in the images were manually labeled using the LabelImg toolkit (Tzutalin, 2015) into 2 categories: measurable (Fig. 3a) and visible (Figs. 3b and 3c). The measurable shrimps were defined as the shrimps that were complete and clear to observers, whereas the visible shrimps were incomplete or blurred. A total of 1604 and 1912 measurable and visible shrimps, respectively, were labeled in the training images. The ground truth of shrimp body length was manually determined as the linear distance, in pixels, from head to tail.
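For illustration, the following is a minimal sketch of how frames could be sampled from the videos at 0.2 frames per second using OpenCV; the paper does not specify the extraction tooling, and the file names here are hypothetical placeholders.

```python
# Sketch: sample frames from a shrimp video at 0.2 frames per second.
# Assumes OpenCV (cv2); file paths are hypothetical placeholders.
import cv2

def extract_frames(video_path, out_dir, sample_fps=0.2):
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS)
    step = int(round(native_fps / sample_fps))  # e.g., 30 fps -> every 150th frame
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.png", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# extract_frames("pond_camera.mp4", "frames")  # hypothetical file names
```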


Figure 3. Labeled images: (a) measurable, (b) visible (incomplete), and (c) visible (blurred).

Image augmentation was applied to the training images to improve the robustness of the model to be trained. The augmentation operations included rotation (randomly between −60° and 60°), saturation variation (randomly between 1 and 1.5), exposure variation (randomly between 1 and 1.5), and hue variation (randomly between 0.9 and 1.1).

CNN Models with Pre-trained Weights


You Only Look Once version 3 (YOLOv3; Redmon & Farhadi, 2018; Fig. 4a) was used for detecting shrimps in the images. A YOLOv3 model comprised a feature extractor and a predictor. In this work, Darknet-53 was used as the feature extractor. Darknet-53 was composed of 6 convolutional layers (C1 to C6; Fig. 4b) and 5 convolutional blocks (CB1 to CB5; Fig. 4c). C1 through C6 were each composed of a convolutional layer with filters of 3 × 3 pixels. Downsampling was applied to the feature maps output by C2 to C5. CB1 through CB5 contained 1, 2, 8, 8, and 4 residual structures (Fig. 4d), respectively. Each residual structure was composed of 2 convolutional layers, with filters of 1 × 1 and 3 × 3 pixels, respectively, and a shortcut connection. The size of the feature map output by Darknet-53 was one-thirty-second the size of the input image.
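To make the residual structure concrete, below is a minimal sketch of one Darknet-53 residual structure written in PyTorch for illustration; the paper used the Darknet framework, so the channel-halving in the 1 × 1 layer follows the YOLOv3 paper rather than this study.

```python
# Sketch of one Darknet-53 residual structure: a 1x1 convolution, a 3x3
# convolution, and a shortcut connection, each followed by BN and LeakyReLU.
import torch
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels // 2),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)  # shortcut connection
```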

Figure 4. Architecture of YOLOv3. conv: convolutional layer, C: convolutional layer with LeakyReLU and BN, CS: convolutional set, RS: residual structure, CB: convolutional block, n: size of the convolutional layer, d: dimension of the convolutional layer.


The predictor predicted bounding boxes of the shrimps in the images using the feature maps from the feature extractor. In this study, a feature pyramid network (FPN; Lin et al., 2016) was used as the predictor. The FPN had 3 passes. Pass 1 was composed of a convolutional set (CS1; Fig. 4e) and 2 convolutional layers (C8 and conv1). Pass 2 was composed of 2 convolutional sets (CS1 and CS2) and 3 convolutional layers (C7, C10, and conv2). Pass 3 was composed of 3 convolutional sets (CS1 to CS3) and 4 convolutional layers (C7, C9, C11, and conv3). A convolutional set was composed of convolutional layers with filters of 1 × 1, 3 × 3, 1 × 1, 3 × 3, and 1 × 1 pixels in sequence. C8, C10, and C11 were composed of a convolutional layer with filters of 3 × 3 pixels. Conv1 to conv3, C7, and C9 were composed of a convolutional layer with filters of 1 × 1 pixel. The leaky rectified linear unit (LeakyReLU; Xu et al., 2015) was used as the activation function in all convolutional layers (except conv1 to conv3), followed by batch normalization (BN; Ioffe & Szegedy, 2015). Each pass of the FPN predicted 3 bounding boxes. Each bounding box contained 7 parameters: the width and height of the box (tw and th), the center coordinates of the box (tx and ty), an objectness probability (Pb), and 2 class probabilities (Pc1 and Pc2).
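For clarity, the sketch below decodes the 7 predicted parameters into a box following the standard YOLOv3 formulation (Redmon & Farhadi, 2018); the grid offsets (cx, cy), anchor priors (pw, ph), and stride are assumed inputs not detailed in this paper.

```python
# Sketch: decode YOLOv3 outputs (tx, ty, tw, th, Pb, Pc1, Pc2) into a box.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(tx, ty, tw, th, pb, pc1, pc2, cx, cy, pw, ph, stride):
    bx = (sigmoid(tx) + cx) * stride         # box center x in pixels
    by = (sigmoid(ty) + cy) * stride         # box center y in pixels
    bw = pw * np.exp(tw)                     # box width from anchor prior pw
    bh = ph * np.exp(th)                     # box height from anchor prior ph
    objectness = sigmoid(pb)                 # probability a shrimp is present
    classes = sigmoid(np.array([pc1, pc2]))  # measurable / visible scores
    return bx, by, bw, bh, objectness, classes
```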
The training of the YOLOv3 model was as follows. The model was first initialized with parameters pre-trained on the ImageNet dataset (Deng et al., 2009). The parameters of the model were then trained for 5000 iterations. In each iteration, 32 mini-batches, each of which contained 4 images, were used to update the model parameters. The initial learning rate, momentum, and weight decay were set to 0.001, 0.9, and 0.0005, respectively. Step gradient descent (SGD) was used as the optimizer. The steps of SGD were set to 500, 1000, 2500, 4000, and 4500, and the corresponding scales were set to 2.0, 0.5, 0.1, 0.1, and 0.1, respectively. The sum of squared errors was used as the localization loss for the bounding box offset predictions (tw, th, tx, and ty). Logistic regression was used as the loss for the objectness probability (Pb). Binary cross-entropy was used as the classification loss for the conditional class probabilities (Pc1 and Pc2). The model was trained in an open-source Python environment (Van Rossum, 1995) with the deep learning library Darknet (Redmon, 2013). A graphics processing unit (GeForce GTX 1080 Ti, NVIDIA; Santa Clara, CA, USA) was used to expedite the training of the model.
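The step schedule can be read as follows: the learning rate starts at 0.001 and is multiplied by the corresponding scale each time training passes a step. A minimal sketch of this reading, not Darknet's actual implementation:

```python
# Sketch: step learning-rate schedule with the steps and scales given above.
STEPS = [500, 1000, 2500, 4000, 4500]
SCALES = [2.0, 0.5, 0.1, 0.1, 0.1]

def learning_rate(iteration, base_lr=0.001):
    lr = base_lr
    for step, scale in zip(STEPS, SCALES):
        if iteration >= step:
            lr *= scale
    return lr

# e.g., learning_rate(600) == 0.002 and learning_rate(3000) == 0.0001
```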

Shrimp Body Length Measurement


The body lengths of the measurable shrimps were measured using image processing. In the procedure, the area of a measurable shrimp (Fig. 5a) detected by the YOLOv3 model was cropped from the image. The area was blurred using a median filter of 7 × 7 pixels (Fig. 5b). Then, the MSR algorithm was used to eliminate the influence of uneven illumination (Fig. 5c). The contrast of the area was improved using gamma correction with a power of 4.5 (Fig. 5d). The area was next binarized (Fig. 5e) using Otsu's method (Otsu, 1979). Objects in the binarized area were then identified using connected-component labeling (Wu et al., 2009). The object with the largest number of pixels was determined to be the shrimp (Fig. 5f). Finally, the minimum-area bounding rectangle enclosing the shrimp was found (Fig. 5g). The shrimp body length was determined as the long side of this rectangle.

Figure 5. Shrimp image processing: (a) patch image with a measurable shrimp, (b) median blur, (c) MSR algorithm, (d) gamma correction, (e) Otsu's method, (f) largest connected component, and (g) image with bounding box.
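The pipeline in Figure 5 could be sketched as below with OpenCV. The MSR step is simplified to a single-scale retinex with an assumed Gaussian scale, and the binarization is assumed to leave the shrimp as the bright foreground, so this is an approximation of the procedure rather than the authors' code.

```python
# Sketch of the length-measurement pipeline: median blur, retinex, gamma
# correction, Otsu thresholding, largest connected component, min-area box.
import cv2
import numpy as np

def shrimp_length_px(patch_gray):
    # (b) Median blur with a 7x7 kernel.
    img = cv2.medianBlur(patch_gray, 7)
    # (c) Single-scale retinex: log(image) - log(illumination estimate);
    # the Gaussian sigma of 30 is an assumption.
    f = img.astype(np.float32) + 1.0
    retinex = np.log(f) - np.log(cv2.GaussianBlur(f, (0, 0), 30) + 1.0)
    img = cv2.normalize(retinex, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # (d) Gamma correction with a power of 4.5.
    img = (255.0 * (img / 255.0) ** 4.5).astype(np.uint8)
    # (e) Otsu's threshold.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # (f) Keep the largest connected component as the shrimp.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])  # skip background 0
    mask = (labels == largest).astype(np.uint8)
    # (g) Minimum-area bounding rectangle; length = its long side.
    points = cv2.findNonZero(mask)
    (_, _), (w, h), _ = cv2.minAreaRect(points)
    return max(w, h)
```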

Results
The performance of shrimp detection
The performance of the developed YOLOv3 model was examined using the 100 test images (Table 1). Using a threshold of 0.5 for the class probability, the model achieved a mean average precision (mAP) of 85.08%. The category "measurable" achieved higher precision, recall, F1-score, and average precision than the category "visible". This result may be attributed to the fact that the features of the visible shrimps were not complete: some visible shrimps were blurred, and the bodies of others were incomplete. These factors made it difficult for YOLOv3 to learn the features of the "visible" shrimps.
Table 1. Performance of the YOLOv3 model.
Class        Precision (%)   Recall (%)   F1-Score (%)   AP (%)   mAP (%)
Measurable   92.38           90.02        91.19          88.13    85.08
Visible      82.66           81.61        82.13          82.03
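For reference, the per-class precision, recall, and F1-score in Table 1 relate to true-positive, false-positive, and false-negative detection counts as in the sketch below; the counts themselves are not reported in the paper, so the function is purely illustrative.

```python
# Sketch: detection scores from TP/FP/FN counts at an IoU-matched threshold.
def detection_scores(tp, fp, fn):
    precision = tp / (tp + fp)   # fraction of detections that are correct
    recall = tp / (tp + fn)      # fraction of labeled shrimps detected
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```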

The performance of shrimp body length measurement

The performance of the body length measurement of the shrimps was examined using 150 cropped images (Fig. 6). The proposed approach had a root mean squared error (RMSE) of 5.76% and a standard deviation of 53.22 pixels. These errors may be attributed to the uneven brightness of some images, which made it difficult to segment the shrimps from the background using image processing (Fig. 7).

Figure 6. Performance of shrimp body length measurement.

Figure 7. Result images with uneven background: (a) binary image, and (b) image with bounding box.
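Assuming the RMSE is the root mean square of the length errors normalized by the ground-truth lengths (the paper does not spell out the normalization, so this formulation is an assumption), it could be computed as:

```python
# Sketch: relative RMSE (%) of estimated body lengths against ground truth.
import numpy as np

def rmse_percent(estimated_px, ground_truth_px):
    estimated_px = np.asarray(estimated_px, dtype=float)
    ground_truth_px = np.asarray(ground_truth_px, dtype=float)
    rel_err = (estimated_px - ground_truth_px) / ground_truth_px
    return 100.0 * np.sqrt(np.mean(rel_err ** 2))
```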

Conclusion
This study proposed to detect shrimps in complex images using YOLOv3 and to measure the body lengths of the shrimps using image processing. In the study, shrimp images were acquired using underwater cameras with an infrared light source. A YOLOv3 model was developed to detect and locate shrimps in the images. The body lengths of the shrimps were then estimated using image processing algorithms. The proposed approach achieved a mAP of 85.08% in shrimp detection and localization and an RMSE of 5.76% in shrimp body length estimation.

References
Babakhani, P., & Zarei, P. (2015). Automatic gamma correction based on average of brightness. Advances in Computer Science: An International Journal, 4.
Chan, F. H. Y., Lam, F. K., & Hui, Z. (1998). Adaptive thresholding by variational method. IEEE Transactions on Image
Processing, 7(3), 468-473. doi:10.1109/83.661196
Christiansen, P., Nielsen, L. N., Steen, K. A., Jørgensen, R. N., & Karstoft, H. (2016). DeepAnomaly: Combining Background Subtraction and Deep Learning for Detecting Obstacles and Anomalies in an Agricultural Field. Sensors, 16(11), 1904.
Deng, J., Dong, W., Socher, R., Li, L., Kai, L., & Li, F.-F. (2009, 20-25 June 2009). ImageNet: A large-scale hierarchical
image database. Paper presented at the 2009 IEEE Conference on Computer Vision and Pattern Recognition.
Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal
Covariate Shift. arXiv e-prints. https://ui.adsabs.harvard.edu/abs/2015arXiv150203167I
Liang, Q., Zhu, W., Long, J., Wang, Y., Sun, W., & Wu, W. (2018). A Real-Time Detection Framework for On-Tree Mango Based on SSD Network. Cham: Springer.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2016). Feature Pyramid Networks for Object Detection. arXiv e-prints. https://ui.adsabs.harvard.edu/abs/2016arXiv161203144L
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single Shot MultiBox Detector. In Computer Vision – ECCV 2016. Cham: Springer.
Otsu, N. (1979). A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and
Cybernetics, 9(1), 62-66. doi:10.1109/TSMC.1979.4310076
Rahman, S., Rahman, M. M., Abdullah-Al-Wadud, M., Al-Quaderi, G. D., & Shoyaib, M. (2016). An adaptive gamma correction for image enhancement. EURASIP Journal on Image and Video Processing, 2016(1), 35. doi:10.1186/s13640-016-0138-1
Rahman, Z., Jobson, D. J., & Woodell, G. A. (1996, 19-19 Sept. 1996). Multi-scale retinex for color image enhancement.
Paper presented at the Proceedings of 3rd IEEE International Conference on Image Processing.
Redmon, J. (2013-2016). Darknet: Open Source Neural Networks in C. http://pjreddie.com/darknet/
Redmon, J., & Farhadi, A. (2017, 21-26 July 2017). YOLO9000: Better, Faster, Stronger. Paper presented at the 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR).
Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv e-prints.
https://ui.adsabs.harvard.edu/abs/2018arXiv180402767R
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region
Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137-1149.
doi:10.1109/TPAMI.2016.2577031
Si, Y., Sankaran, S., Knowles, N. R., & Pavek, M. J. (2017). Potato Tuber Length-Width Ratio Assessment Using Image Analysis. American Journal of Potato Research, 94(1), 88-93. doi:10.1007/s12230-016-9545-1
Wang, Z., Walsh, K. B., & Verma, B. (2017). On-Tree Mango Fruit Size Estimation Using RGB-D Images. Sensors, 17(12), 2738.
White, D. J., Svellingen, C., & Strachan, N. J. C. (2006). Automated measurement of species and length of fish by
computer vision. Fisheries Research, 80(2), 203-210. doi:https://doi.org/10.1016/j.fishres.2006.04.009
Wu, K., Otoo, E., & Suzuki, K. (2009). Optimizing two-pass connected-component labeling algorithms. Pattern Analysis and Applications, 12(2), 117-135. doi:10.1007/s10044-008-0109-y
Li, X., Shang, M., Qin, H., & Chen, L. (2015, 19-22 Oct. 2015). Fast accurate fish detection and recognition of underwater images with Fast R-CNN. Paper presented at the OCEANS 2015 - MTS/IEEE Washington.
Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical Evaluation of Rectified Activations in Convolutional Network.
arXiv e-prints. https://ui.adsabs.harvard.edu/abs/2015arXiv150500853X
Yadav, G., Maheshwari, S., & Agarwal, A. (2014, 24-27 Sept. 2014). Contrast limited adaptive histogram equalization
based enhancement for real time video system. Paper presented at the 2014 International Conference on Advances
in Computing, Communications and Informatics (ICACCI).
Zhao, J., & Qu, J. (2019). Healthy and Diseased Tomatoes Detection Based on YOLOv2. Cham: Springer.
Zhou, C., Yang, X., Zhang, B., Lin, K., Xu, D., Guo, Q., & Sun, C. (2017). An adaptive image enhancement method for a
recirculating aquaculture system. Scientific Reports, 7(1), 6243. doi:10.1038/s41598-017-06538-9
