EGU2020 21549 Presentation
05/05/2020 1
INTRODUCTION
CONTEXT
- Self-describing format
- Multidimensional variables
- Native compression (zlib)
EXTERNAL COMPRESSION
[Figure: bar chart comparing Original and Zlib lvl 9; labeled value: 1064]
What can I do?
TYPE OF DATA
Reduce size per value

Name            Size (bytes)  Range
BYTE            1             -128 … 127
UNSIGNED BYTE   1             0 … 255
SHORT           2             -32,768 … 32,767
UNSIGNED SHORT  2             0 … 65,535
INT             4             -2,147,483,648 … 2,147,483,647
UNSIGNED INT    4             0 … 4,294,967,295
INT64           8             -2^63 … 2^63 - 1
UNSIGNED INT64  8             0 … 2^64 - 1
FLOAT           4             ±3.4E38 *
DOUBLE          8             ±1.7E308 *

* Can represent decimals

Example variables:
A - % | Nebulosity (0 – 100)
B - °C | Temperature (-10 – 40)
C - m/s | Wind Speed (0.xx – 400.xx)
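As a rough illustration of the table above, the sketch below picks the smallest listed integer type that covers a variable's value range. The function name `smallest_int_type` and the fallback to FLOAT whenever decimals are needed are assumptions made for this example; they are not taken from the presentation.

```python
# Integer types from the table above: (name, size in bytes, min, max).
INT_TYPES = [
    ("BYTE", 1, -128, 127),
    ("UNSIGNED BYTE", 1, 0, 255),
    ("SHORT", 2, -32768, 32767),
    ("UNSIGNED SHORT", 2, 0, 65535),
    ("INT", 4, -2**31, 2**31 - 1),
    ("UNSIGNED INT", 4, 0, 2**32 - 1),
    ("INT64", 8, -2**63, 2**63 - 1),
    ("UNSIGNED INT64", 8, 0, 2**64 - 1),
]


def smallest_int_type(vmin, vmax, needs_decimals=False):
    """Return the name of the smallest listed type covering [vmin, vmax]."""
    if needs_decimals:
        # Assumption: without any packing, decimals require a FLOAT.
        return "FLOAT"
    for name, _size, lo, hi in INT_TYPES:
        if lo <= vmin and vmax <= hi:
            return name
    raise ValueError("range does not fit any listed integer type")


print(smallest_int_type(0, 100))                       # A, nebulosity in %
print(smallest_int_type(-10, 40))                      # B, temperature in °C
print(smallest_int_type(0, 400, needs_decimals=True))  # C, wind speed in m/s
```

The list is ordered by size, so the first match is also the cheapest one per stored value.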
[Figure: two bar charts comparing Zlib lvl 9, Zlib lvl 2 and Data type. Size (MB): labeled values 1216, 1064, 700. Performance (seconds): labeled values 1339.9, 1030.2, 783.]
img_len = 64
# list_rand: 6000 random (time, lat, lon) start positions
for time, lat, lon in list_rand:
    for var in ['A', 'B', 'C']:
        # read an img_len x img_len window at one time step
        data = NetCDF[var][time, lat:lat + img_len, lon:lon + img_len]
        acumulate_mean(var, data)
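The trade-off between zlib levels compared on this slide can be explored with Python's standard `zlib` module. This is only a sketch: the synthetic sine signal stands in for real meteorological data, and the helper name `measure` is made up for the example. Absolute numbers will differ from the NetCDF measurements in the presentation.

```python
import math
import struct
import time
import zlib

# Synthetic, smooth 16-bit signal standing in for a meteorological field
# (assumption: real data would come from the NetCDF file).
values = [int(1000 * math.sin(i / 50)) for i in range(20000)]
raw = struct.pack("<%dh" % len(values), *values)


def measure(level):
    """Return (compressed size in bytes, seconds to decompress) for one zlib level."""
    compressed = zlib.compress(raw, level)
    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start
    assert restored == raw  # zlib compression is lossless
    return len(compressed), elapsed


for level in (2, 9):
    size, seconds = measure(level)
    print(f"level {level}: {size} of {len(raw)} bytes, {seconds:.6f} s to decompress")
```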
TRANSFORMATION
[Figure: two bar charts comparing Zlib lvl 9, Zlib lvl 2, Data type and Packing. Size (MB): labeled values 1216, 1064, 700, 553. Performance (seconds): labeled values 1339.9, 1030.2, 783, 252.6.]
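The slides do not show how the "Packing" transformation is done. A minimal sketch follows, assuming it refers to the common NetCDF convention of storing floats as small integers together with `scale_factor` and `add_offset` attributes. The function names are hypothetical, and the 0 to 400 m/s range is the one given for variable C.

```python
def packing_params(vmin, vmax, bits=16):
    """Scale and offset mapping [vmin, vmax] onto unsigned integers of `bits` bits."""
    scale_factor = (vmax - vmin) / (2**bits - 1)
    add_offset = vmin
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Float value -> integer that would be stored on disk."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Integer read from disk -> approximate float value."""
    return packed * scale_factor + add_offset


# Variable C from the slides: wind speed between 0 and 400 m/s.
scale, offset = packing_params(0.0, 400.0)
stored = pack(123.45, scale, offset)
print(stored, unpack(stored, scale, offset))
```

With 16 bits the step between representable values is about 0.006 m/s, so a two-decimal wind speed fits in an UNSIGNED SHORT (2 bytes) instead of a FLOAT (4 bytes), at the cost of a small rounding error.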
ACCESS PATTERN
CHUNKING
ITERATE BY VARIABLE
[Figure: bar chart of Performance (seconds) for Zlib lvl 9, Zlib lvl 2, Data type, Packing, Chunk 1x64x64, Chunk 1x64x72 and Iter by var; labeled values 1339.9, 1030.2, 783, 252.6, 10, 9.9, 10.2.]

Not effective for this code.

Before:
    img_len = 64
    # list_rand: 6000 random (time, lat, lon) start positions
    for time, lat, lon in list_rand:
        for var in ['A', 'B', 'C']:
            data = NetCDF[var][time, lat:lat + img_len, lon:lon + img_len]
            acumulate_mean(var, data)

After (iterate by variable):
    img_len = 64
    # list_rand: 6000 random (time, lat, lon) start positions
    for var in ['A', 'B', 'C']:
        for time, lat, lon in list_rand:
            data = NetCDF[var][time, lat:lat + img_len, lon:lon + img_len]
            acumulate_mean(var, data)
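To see why chunk shape matters for this access pattern, the sketch below counts how many chunks a single window read overlaps and how many values must be decompressed to serve it, since chunks are read whole. The function names, the read position and the second chunk shape are hypothetical and chosen only for illustration.

```python
from math import prod


def chunks_touched(start, shape, chunk_shape):
    """Count the chunks overlapped by a read starting at `start` with extent `shape`."""
    per_dim = []
    for s, n, c in zip(start, shape, chunk_shape):
        first = s // c           # index of the first chunk touched in this dimension
        last = (s + n - 1) // c  # index of the last chunk touched in this dimension
        per_dim.append(last - first + 1)
    return prod(per_dim)


def values_decompressed(start, shape, chunk_shape):
    """Values that must be decompressed to serve the read (whole chunks only)."""
    return chunks_touched(start, shape, chunk_shape) * prod(chunk_shape)


read_start, read_shape = (5, 10, 20), (1, 64, 64)  # hypothetical (time, lat, lon) window
for chunk_shape in [(1, 64, 64), (10, 256, 256)]:  # hypothetical chunk shapes
    print(chunk_shape,
          chunks_touched(read_start, read_shape, chunk_shape),
          values_decompressed(read_start, read_shape, chunk_shape))
```

The read asks for only 64 x 64 = 4096 values, so the gap between that figure and the decompressed count gives a feel for the overhead a given chunk shape adds.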
RANDOMNESS
[Figure: bar chart of Performance (seconds) for Zlib lvl 9, Zlib lvl 2, Data type, Packing, Chunk 1x64x64, Chunk 1x64x72, Iter by var and Randomness; labeled values 1339.9, 1030.2, 783, 252.6, 10, 9.9, 10.2, 7.4.]

Before:
    img_len = 64
    # list_rand: 6000 random (time, lat, lon) start positions
    for var in ['A', 'B', 'C']:
        for time, lat, lon in list_rand:
            data = NetCDF[var][time, lat:lat + img_len, lon:lon + img_len]
            acumulate_mean(var, data)

After (ordered access):
    img_len = 64
    # list_rand: 6000 random (time, lat, lon) start positions
    list_seq, permutation = order_list(list_rand)
    for var in ['A', 'B', 'C']:
        for time, lat, lon in list_seq:
            data = NetCDF[var][time, lat:lat + img_len, lon:lon + img_len]
            acumulate_mean(var, data)
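The slide does not show the body of `order_list`. One possible sketch, assuming it sorts the positions by (time, lat, lon) and returns the original index of each sorted item, is given below. `restore_order` is a hypothetical helper for putting per-position results back into the original random order when that order matters.

```python
def order_list(list_rand):
    """Sort (time, lat, lon) tuples; also return each sorted item's original index."""
    permutation = sorted(range(len(list_rand)), key=lambda i: list_rand[i])
    list_seq = [list_rand[i] for i in permutation]
    return list_seq, permutation


def restore_order(results_seq, permutation):
    """Put results computed in sorted order back into the original order."""
    results = [None] * len(permutation)
    for position, original_index in enumerate(permutation):
        results[original_index] = results_seq[position]
    return results


list_rand = [(7, 3, 9), (0, 5, 1), (7, 1, 2), (2, 8, 8)]
list_seq, permutation = order_list(list_rand)
print(list_seq)
print(permutation)
```

For an accumulated mean the order of reads does not change the result, so the permutation is only needed when each read must be matched back to its original request.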
FINAL RESULTS
- Storage
  - Original 1853 MB (100.00 %)
  - Optimized 553 MB (29.84 %)
- Performance
  - Original 1030.2 s
  - Optimized 7.4 s (~139x faster)
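The two summary ratios follow directly from the figures above; a short check, using only those reported numbers:

```python
original_mb, optimized_mb = 1853, 553
original_s, optimized_s = 1030.2, 7.4

size_percent = optimized_mb / original_mb * 100  # optimized size as a share of the original
speedup = original_s / optimized_s               # how many times faster the optimized run is

print(f"{size_percent:.2f} % of the original size")
print(f"about {speedup:.0f}x faster")
```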
CONCLUSIONS
• Find the best data type for the values you will store.
• Bad chunking has a big impact on performance.
• Choosing the best chunking is a complex problem.
• Always try to access the data as sequentially as possible.
• "NetCDF: Performance and Storage Optimization of Meteorological Data" contains more details and a reproducible example.
Thank you for your attention
© IRT AESE ”Saint Exupéry” - All rights reserved Confidential and proprietary document. This document and all information contained
herein is the sole property of IRT AESE “Saint Exupéry”. No intellectual property rights are granted by the delivery of this document
or the disclosure of its content. This document shall not be reproduced or disclosed to a third party without the express written
consent of IRT AESE “Saint Exupéry” . This document and its content shall not be used for any purpose other than that for which it is
supplied. IRT AESE ”Saint Exupéry” and its logo are registered trademarks.