Classification and Regionalization of rainfall over the Peruvian Pacific Coast
Pedro Rau1, Luc Bourrel1, David Labat1, Pablo Melo1, Boris Dewitte2, Frederic Frappart1,
Waldo Lavado3, Oscar Felipe3
1 UMR 5563 GET, Université de Toulouse - CNRS - IRD - OMP - CNES, 14 Avenue Edouard
Belin, 31400 Toulouse, France.
2 UMR 5566 LEGOS, Université de Toulouse - CNRS - IRD - OMP - CNES, 14 Avenue
Edouard Belin, 31400 Toulouse, France.
3 SENAMHI, Jirón Cahuide 785, Lima 11, Peru.
Corresponding author: Pedro Rau (pedro.rau@get.obs-mip.fr)
Abstract
The climate of the Peruvian coast is strongly influenced by the Pacific atmospheric and oceanic
circulation. This region is one of the main economic zones of the country and
concentrates almost 50% of the population. Documenting the heterogeneity of precipitation
regimes is thus key for resource management and for mitigating the risks associated with
extreme weather events.
This study focuses on the definition of homogeneous rainfall regions over the Peruvian
Pacific coast. The approach is a two-step process consisting of a classification based on
k-means clustering followed by an iterative statistical methodology based on the Regional
Vector Methodology (RVM). A network of 145 rainfall stations, evenly distributed both
meridionally and with altitude, is used, which allows deriving a high-resolution (0.5 km)
gridded rainfall data product at an annual time-scale over the period 1964-2011. Nine
coherent regions identified by the iterative methodology are characterized. They exhibit
distinct seasonal rainfall variability that reflects, on the one hand, the meridional
transition of the main climatic influences and, on the other hand, the interaction with the
steep topography.
Overall, the results show the advantages of combining the k-means clustering technique with
the Regional Vector Methodology (RVM) for regionalization purposes.
1. Introduction
Rainfall along the South American coast is characterized by a complex pattern of spatial and
seasonal variability, as part of the climate variability of this continent, which exhibits a
considerable meridional extension and prominent topography (Garreaud et al., 2009). The
Peruvian Pacific coast is located at tropical latitudes, and rainfall is mainly influenced by
oceanic, atmospheric and orographic conditions because of the narrowness of the Pacific
drainage basin.
This region concentrates more than 50 % of the population of Peru and yet is not well
documented in terms of rainfall regionalization. Recent works (Suarez, 2007; Lavado et al.,
2012; Bourrel et al., 2014; Ochoa, 2014) mostly focused on principal stations or major
watersheds, where the main cities are located. This motivates our paper, which treats
rainfall regionalization as the decomposition of a large, complex and narrow area into smaller
homogeneous regions for research and applications in climatology and hydrology in this
important region.
This complex situation leads us to propose a method to determine homogeneous rainfall
regions and to identify and analyze their climatic behavior in the study area. In 1999, a
technical report (BCEOM - SOFI Consult - ORSTOM, 1999) proposed a rainfall
regionalization for the Peruvian Pacific coast based on the Regional Vector Methodology
(Brunet-Moret, 1979). In this report, nine regions were delineated, mainly located on the
northern coast.
Multivariate analysis techniques have proven efficient at delineating homogeneous
regions based on climatic features such as rainfall data. Many authors have used factor
analysis, principal components, clustering techniques or a mixture of them to define
climatic zones or rainfall regions more precisely (Ünal et al., 2003; Raziei et al., 2008), to
classify rainfall stations (Stooksbury & Michaels, 1991; Jackson & Weinand, 1995), or to analyze
rainfall variability or distribution patterns (Sneyers et al., 1989; Ramos, 2001; Muñoz-Diaz &
Rodrigo, 2004; Dezfuli, 2010). Recently, Sönmez and Kömüşcü (2011) proposed a rainfall
reclassification for Turkey based on the k-means methodology. These studies highlight the
benefit of using clustering methods for regionalization purposes, although they differ in
focus and results. They also indicate that minor differences between
methodologies are worthy of consideration when geographical and climatological
interpretation is undertaken (Jackson and Weinand, 1995).
We propose here to use the k-means technique to derive a primary classification of rainfall,
considering that precipitation in this region is influenced by a complex set of parameters
associated with nonlinear processes. First, the region is characterized by a steep topography
that influences the mesoscale atmospheric circulation (Garreaud et al., 2009). Second, the main
climatic influence on rainfall in Peru is associated with the El Niño Southern Oscillation
(ENSO) phenomenon, which is characterized by a strong positive asymmetry (An and Jin,
2004; Boucharel et al., 2011). These phenomena thus relate regionally to rainfall over Peru in
a way that may not be well captured by linear techniques.
We propose a joint method based on k-means cluster analysis and the Regional Vector
Methodology (RVM) to define homogeneous regions using an iterative delineation process
and statistical criteria for merging rainfall data. After a spatial delimitation that refines
the limits of the homogeneous regions using a rainfall co-kriging interpolation, a regional
characterization of rainfall is carried out to describe their principal features (i.e. the annual
and monthly precipitation values, regime, distribution and altitudinal ranges). In the last part,
we also document the interannual variability of the rainfall temporal distribution over the
defined regions.
2. Data
2.1 Study area
The study area comprises the Pacific coastal region of Peru, which covers an area of ~280,500
km². This region is bordered by the Andes mountains to the east (69.8° W) and extends west to
the Pacific Ocean (81.3° W). It borders Ecuador in the north (3.4° S) and Chile in
the south (18.4° S). Its maximum width, perpendicular to the coastline, is 230 km in the
southern part and is reduced to 100 km in the northern part. This area is characterized by a
significant altitudinal gradient ranging from 0 to ~6500 m asl. It includes 53 main
river watersheds that cover nearly 90 % of the region. The rivers generally flow from east
to west, from the Andes towards the Pacific Ocean, with bare and steep slopes that favor
significant flow rises, flooding and erosion during highly rainy episodes (Lavado et al., 2012).
The Peruvian hydroclimatic system is also influenced by the Andes Cordillera, contrasting
oceanic boundary conditions and landmass distribution (Garreaud et al., 2009), which explain
much of its seasonal and interannual rainfall variability. This region shows greater rainfall
variations than the two other main hydrological regions of Peru: the Amazonas and the
endorheic Titicaca drainages (Lavado et al., 2012).
2.2 Rainfall data set
The database includes monthly rainfall records from 139 meteorological stations managed by
SENAMHI (Servicio Nacional de Meteorologia e Hidrologia del Peru) and 6
meteorological stations managed by INAMHI (Instituto Nacional de Meteorologia e
Hidrologia del Ecuador). It quickly proved necessary to extend the area into the foothills of
the northern Andes, which cover bi-national river watersheds between Peru and Ecuador.
Monthly rainfall data cover the 1964-2011 period. Of the 145 stations, 124 are
located in the Pacific coastal region of Peru (see Figure 1), 11 belong to the Peruvian
Atlantic drainage and 4 to the Titicaca drainage. The data over this period were carefully
quality-checked using the Regional Vector Methodology (RVM; Brunet-Moret, 1979).
This method yields a consistent dataset in which 76% of stations have more
than 45 years of continuous records, 20% have between 20 and 45 years of continuous
records and only 4% have between 15 and 20 years of continuous records.
Figure 1. Geographical distribution of stations in the Peruvian Pacific coast, which is delimited by the black
line. Rainfall record length of the stations is shown in graduated color. Only stations with more than 15 years
of records were used in this study. A Digital Terrain Model (SRTM, 90 m) shows the topographical
characteristics and altitudes of the study area.
3. Methods
Figure 2 summarizes the applied method. The methodology comprises
three steps: the first is data preparation, which includes the review,
homogenization and completion of monthly rainfall data; the second is the regionalization
process, including clustering and regional vector analysis; and the last step involves a
detailed characterization of the defined regions.
[Figure 2 flowchart. Upper branch: Peruvian Pacific monthly rainfall data; data preparation,
homogenization and validation; monthly rainfall database (145 stations); clustering process
(k-means); Regional Vector analysis of predefined clusters; validated rainfall regions. Lower
branch: Digital Elevation Model (DEM, 90 m); annual rainfall interpolation (co-kriging);
rainfall spatialization; region boundaries definition by Regional Vector Methodology (RVM);
characterization of rainfall patterns by regions.]
Figure 2. Methodology schema applied for rainfall regionalization of the Peruvian Pacific region
3.1 Data preparation, homogenization and validation
Data preparation was carried out in three steps:
1) The analysis period was chosen to be as long as possible for a significant number of
stations over the Peruvian Pacific coast and the extensions explained in section 2.2. We
also required the selected stations to have continuous records longer
than 15 years.
2) The homogeneity of the datasets was evaluated with the RVM analysis
(Brunet-Moret, 1979) in order to identify inconsistent information due to quality issues
such as station microenvironment, instrumentation, and variations in time and position
(Changnon and Kenneth, 2006). The RVM relies on the principle of pseudo-proportionality:
a rainfall index is calculated from the values of neighboring rainfall stations that
characterize a homogeneous rainfall pattern of a predetermined area. It is based
on the calculation of an extended rainfall vector within the study period. This concept
refers to the calculation of a weighted average of precipitation anomalies for each
station, which overcomes the effects of stations with extreme rainfall values or with
short data records. The regional annual pluviometric
indexes Zi and the extended average rainfall Pj are then found using the least squares
technique, by minimizing the sum in Equation (1),
$$\sum_{i=1}^{N} \sum_{j=1}^{M} \left( \frac{P_{ij}}{P_j} - Z_i \right) \qquad (1)$$
where i is the year index, j the station index, N the number of years, and M the number
of stations. Pij stands for the annual rainfall at station j in year i; Pj is the extended
average rainfall over the period of N years; and finally, Zi is the regional pluviometric
index of year i. The complete set of Zi values over the entire period is known as the
“regional annual pluviometric indexes vector”. Being an iterative process, this method
calculates the vector of each predefined region (RV), then compares the behavior of each
station with that vector, and finally discards the stations that are not consistent with the
regional vector (RV). This process is repeated as many times as necessary. A
“regional vector” (RV) is therefore associated with each defined region, and it represents
the behavior of all the stations which are part of the region. The calculated vector can be
considered a suitable index of the climatic variability in the region.
3) Stations that passed the homogenization process but had missing
monthly data were subjected to a process of information completion, once their spatial
representativeness had proved significant. This procedure was performed using
the values of the rainfall index calculated from the RV and the mean monthly rainfall
of the station concerned. A more detailed description can be found in
Bourrel et al. (2014).
Through these three steps, 145 pluviometric stations were validated. The geographical
location of the 124 Peruvian Pacific coastal stations is depicted in Figure 1, which also
shows the rainfall record length of each station.
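The least-squares idea behind Equation (1) can be illustrated with a minimal sketch. It is not the implementation used in the study: it assumes squared residuals (in line with the least squares technique named in the text), alternates between updating Zi and Pj, and fixes the otherwise free scale by forcing the mean of Zi to 1. The function name `regional_vector` and this normalization are assumptions made for illustration only.

```python
import numpy as np

def regional_vector(P, n_iter=50):
    """Estimate Z_i (regional index per year) and P_j (extended mean per station).

    P is an (N years x M stations) array of annual rainfall, NaN where missing.
    Assumes every year has at least one record and station totals are positive.
    """
    P = np.asarray(P, dtype=float)
    observed = ~np.isnan(P)
    P0 = np.where(observed, P, 0.0)        # missing values contribute nothing
    Pj = np.nanmean(P, axis=0)             # start from the plain station means
    for _ in range(n_iter):
        # Z_i given P_j: mean of P_ij / P_j over the stations observed in year i
        Z = (P0 / Pj).sum(axis=1) / observed.sum(axis=1)
        Z = Z / Z.mean()                   # assumed normalization: mean(Z) = 1
        # P_j given Z_i: least-squares solution of P_ij * (1 / P_j) ~ Z_i
        Pj = (P0 ** 2).sum(axis=0) / (P0 * Z[:, None]).sum(axis=0)
    return Z, Pj

# Hypothetical, perfectly proportional data: every station follows the same index.
Z_true = np.array([0.8, 1.0, 1.2, 1.0])
P_mean = np.array([100.0, 500.0, 1000.0])
Z, Pj = regional_vector(np.outer(Z_true, P_mean))
```

With perfectly proportional data the sketch recovers the index and the station means; real records deviate from proportionality, and those deviations are what the consistency check against the RV examines.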
3.2 Classification and Regionalization Process
3.2.1 K-means clustering technique
K-means cluster analysis is a tool designed to assign objects to a fixed number of groups
(clusters) based on a set of specified variables. It is commonly used for
classifying large amounts of data. For example, Sönmez and Kömüşcü (2011) proposed a
new reclassification of rainfall regions over Turkey using the k-means methodology. Dezfuli
et al. (2010) suggested a rainfall regionalization based on the k-means technique coupled with
Principal Components Analysis. Ramachandra Rao and Srinivas (2006) used the k-means
technique as part of a hybrid clustering test to identify groups of similar catchments based on
flow data. One of the principal advantages of the k-means technique is its performance in
identifying clusters, which allows ranking the obtained clusters as a function of their
representativeness. The process partitions the data into a predefined number k of
clusters. Objects within a cluster must be as similar as possible to
the other members of that cluster and as dissimilar as possible to the objects in the
other clusters. Similarity depends on correlation, average difference or another type of
metric. By definition, each cluster is characterized by its own centroid, with the cluster
members located around it. According to Sönmez and Kömüşcü (2011), a k-means
clustering process comprises three principal steps:
a) k objects are randomly selected from the whole data set, each one representing the
initial centroid of one of the k clusters.
b) Every object in the data set is compared with each centroid using a previously defined
similarity metric.
c) Each object is assigned to the cluster whose centroid it is most similar to. Every time an
object is added to a cluster, the centroid of that cluster is recalculated immediately.
The whole procedure is an iterative process that continues until every object belongs to a
particular cluster. The k-means algorithm assigns objects to groups effectively,
since intracluster similarities are strengthened while intercluster
dissimilarities are maximized.
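The steps above can be sketched in a few lines. This is a minimal illustration rather than the procedure used in the study: it uses the batch (Lloyd) variant, which recomputes centroids once per pass instead of immediately after each assignment, and it assumes Euclidean distance as the dissimilarity. The function name `kmeans` and the toy coordinates are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Batch k-means with Euclidean distance; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step a: pick k distinct objects at random as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step b: distance from every object to every centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step c: assign each object to its nearest centroid
        labels = dist.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid
        updated = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Hypothetical two-variable descriptors for six stations in two obvious groups
X_demo = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels_demo, centroids_demo = kmeans(X_demo, k=2)
```

The result depends on the random initial centroids, which is one reason the number of clusters and the quality of the partition are checked afterwards with a separate index.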
A key part of applying k-means is defining an optimum number of clusters. To
define the partitioning groups successfully, the silhouette value must
be estimated for each candidate number of groups. As stated by Kaufman and Rousseeuw
(1990), the silhouette value is calculated by Equation (2):
$$S(i) = \frac{\min\{b(i,k)\} - a(i)}{\max\{a(i),\ \min\{b(i,k)\}\}} \qquad (2)$$
where a(i) is the average dissimilarity between the ith object and the other objects
of its own cluster, and b(i,k) is the average dissimilarity between the ith object and the
members of the kth cluster, for each cluster other than its own. The silhouette index ranges
from -1 to +1: a value close to +1 means that the object matches its
own cluster well, while a negative value indicates that the object is not located in the
appropriate cluster. A value of 0 means that the object could equally belong to another
cluster. An average silhouette width, the mean of S(i) over all objects, is also computed for
the whole set of k clusters; it can be used to choose the best number of clusters by
taking the value of k for which this average is maximal.
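Equation (2) can be evaluated directly from pairwise distances. The sketch below is illustrative only: it assumes Euclidean distance as the dissimilarity measure and sets S(i) = 0 for an object that is alone in its cluster, a common convention that the text does not state. The function name `silhouette_values` and the sample points are hypothetical.

```python
import numpy as np

def silhouette_values(X, labels):
    """Silhouette S(i) of every object, following Equation (2)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    S = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                       # singleton cluster: S(i) left at 0
        # a(i): mean distance to the other members of the same cluster
        a = D[i, own].sum() / (n_own - 1)
        # min b(i,k): smallest mean distance to the members of another cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        S[i] = (b - a) / max(a, b)
    return S

pts = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_values(pts, [0, 0, 1, 1])   # compact, well-separated clusters
bad = silhouette_values(pts, [0, 1, 0, 1])    # clusters that mix the two groups
```

Averaging the returned values gives the average silhouette width, and counting the negative entries gives the number of badly placed objects.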
3.2.2 Regionalization Analysis
After k-means clustering, regionalization was conducted with the Regional Vector
Methodology (RVM), which is generally oriented to: a) rainfall regionalization processes
(establishment of representative vectors of homogeneous rainfall zones) and b) assessment of
rainfall data quality based on the homogeneity within a predetermined region (Espinoza et al.,
2009). The regionalization process is similar to the process explained in section 3.1. It
depends on the determination of a “mean station” or “vector” from all data involved in the
study area, which is then compared with each pluviometric station (Brunet-Moret, 1979). Before
using the RVM, it is necessary to define the regions whose stations will be validated. There
are different ways to predefine regions: the definition can be based on geographical patterns or
topographical constraints related to isohyets, or on clusters of rainfall stations. Here,
rainfall station clusters are set as predefined regions. Once calculated, the RV is compared
iteratively with the station data in order to discard the stations whose data are not consistent
with the RV, and the process is then repeated. In several cases, the rejection of a given station
could mean that this station belongs to a neighboring region with which it is more consistent.
Therefore, in many cases, stations or areas are regrouped or divided in order to obtain regions
that show homogeneous features. It should be noted that the RV mainly represents the
behavior or climatic regime of a given region. The main statistical criteria for regrouping
stations into homogeneous regions are a standard deviation lower than 0.4 and a
correlation coefficient greater than 0.7 between the RV and the stations. Rainfall database
management and RVM were carried out using the software HYDRACCESS (Vauchel, 2005).
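The two acceptance criteria can be written as a small test applied to each station. This is a hedged sketch, not the software's procedure: the text does not say which quantity the 0.4 standard deviation refers to, so the sketch assumes it is the standard deviation of the differences between the station's annual index (Pij / Pj) and the RV. The function name `consistent_with_rv` and the sample series are hypothetical.

```python
import numpy as np

def consistent_with_rv(station_index, rv, max_std=0.4, min_corr=0.7):
    """Return True if a station's annual index series agrees with the regional vector."""
    s = np.asarray(station_index, dtype=float)
    z = np.asarray(rv, dtype=float)
    ok = ~np.isnan(s) & ~np.isnan(z)           # use only years present in both series
    # Assumption: the 0.4 threshold applies to the spread of (station index - RV)
    diff_std = np.std(s[ok] - z[ok], ddof=1)
    corr = np.corrcoef(s[ok], z[ok])[0, 1]
    return bool(diff_std < max_std and corr > min_corr)

rv_demo = np.array([0.8, 1.0, 1.2, 1.0, 0.9, 1.1])
similar = np.array([0.85, 0.95, 1.25, 1.0, 0.9, 1.05])   # follows the RV closely
opposite = 2.0 - rv_demo                                  # wet when the RV is dry
keep_similar = consistent_with_rv(similar, rv_demo)
keep_opposite = consistent_with_rv(opposite, rv_demo)
```

A station that fails the test in one region would then be tried against the RV of a neighboring region, as described in the text.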
3.2.3 Rainfall data interpolation
In order to delineate the regions, the spatial distribution of rainfall was considered together
with topographic features. Annual rainfall was interpolated incorporating elevation data using a
geostatistical approach. Geostatistical techniques have proven to be quite efficient in data
prediction by minimizing estimation variances, and their use is widespread in the
hydrometeorological field (Dingman et al., 1988). Many authors consider that optimal
interpolation techniques based on geostatistical approaches (e.g. kriging) give better
estimates of rainfall distribution than classical methods such as Inverse Distance Weighted or
Thiessen Polygons (Phillips et al., 1992; Tabios and Salas, 1985). Moreover, one of the
principal differences between classical methods and kriging is that the latter is based on the
so-called semivariogram, which depicts the spatial autocorrelation of the measured sample
points (Tabios and Salas, 1985). Cokriging, a multivariate version of the kriging
technique, takes into account correlated secondary information (e.g. digital elevation models,
DEM) (Goovaerts, 2000). For example, Hevesi et al. (1992a, 1992b) and Daly et al. (1994)
consider that, in mountainous regions, precipitation tends to increase with altitude,
mainly because of the orographic effect. In this research, cokriging was chosen as the
interpolation method, and a DEM with a spatial resolution of 90 m, provided by the NASA-NGA
Shuttle Radar Topographic Mission (SRTM)
(http://srtm.csi.cgiar.org/SELECTION/inputCoord.asp), was used as the secondary variable
(correlated predictor) in the universal co-kriging methodology (Buytaert et al., 2006;
Diodato, 2005), based on a spherical variogram, which is widely used in rainfall interpolation
studies (Goovaerts, 2010; Mair et al., 2011). Cokriging was performed using
the Geostatistical module available in ArcGis 10.2 and reviewed with an R script.
This rainfall interpolation map was used for regional delineation, considering the shape of
the isohyets with a geometrical approach (perpendicular and bisector criteria for limits
traversing isohyets and stations) and a statistical approach (revalidation of the newly defined
areas with RVM, with a proper fit of the stations inside each region).
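For reference, the spherical model mentioned above has a simple closed form, sketched below. The parameter names (`nugget`, `partial_sill`, `range_`) and the numerical values are illustrative assumptions; the fitted parameters of the study are not given in the text, and this sketch does not reproduce the ArcGis or R workflow.

```python
import numpy as np

def spherical_semivariogram(h, nugget, partial_sill, range_):
    """Spherical semivariogram model evaluated at lag distances h.

    gamma(0) = 0; gamma rises as 1.5*(h/a) - 0.5*(h/a)**3 up to the range a,
    then stays at nugget + partial_sill.
    """
    h = np.asarray(h, dtype=float)
    r = np.clip(h / range_, 0.0, 1.0)          # beyond the range the curve is flat
    gamma = nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, gamma, 0.0)

# Hypothetical parameters: lags in km, semivariance in arbitrary squared units
gamma_demo = spherical_semivariogram([0.0, 50.0, 100.0, 200.0],
                                     nugget=1.0, partial_sill=4.0, range_=100.0)
```

Station pairs separated by more than the range are treated as spatially uncorrelated, which is why the range controls how far each station influences the interpolated surface.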
4. Results
4.1 Initial Rainfall Classification
A cluster analysis of the precipitation data was performed by applying the k-means technique to
the 124 rainfall stations previously selected. The optimal number of clusters was
determined from the average silhouette value and the number of negative silhouette values, for
a number of clusters varying from 3 to 10 (Table 1).
Maximum silhouette values are obtained for cluster-three (0.64), cluster-four (0.60) and
cluster-six (0.55), considering a silhouette value greater than 0.50 as indicating a reasonable
structure and a value less than 0.50 as indicating a weak structure, following
Kononenko and Kukar (2007). The number of negative silhouette values is minimal for
cluster-three (6), cluster-four (4) and cluster-six (6). After plotting the cluster groups on a
map showing their spatial distribution, we selected cluster-three and cluster-six;
these two groupings give indications of a rainfall classification according to topographical
and latitudinal variation (Figure 3.a and 3.b). Cluster-four was an intermediate grouping that
corresponds to one sub-region in the north.
Table 1. Results of the K-means analysis for number of clusters varying from 3 to 10.
Number of Clusters            3     4     5     6     7     8     9     10
Average Silhouette Value      0.64  0.60  0.54  0.55  0.54  0.54  0.46  0.45
Negative Silhouette Number    6     4     9     6     8     6     11    9
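The screening of Table 1 can be reproduced with a few lines. The values are copied from the table; the variable names are hypothetical, and the 0.50 threshold is the one attributed to Kononenko and Kukar (2007) in the text.

```python
# Values from Table 1, keyed by number of clusters
avg_silhouette = {3: 0.64, 4: 0.60, 5: 0.54, 6: 0.55, 7: 0.54, 8: 0.54, 9: 0.46, 10: 0.45}
negative_count = {3: 6, 4: 4, 5: 9, 6: 6, 7: 8, 8: 6, 9: 11, 10: 9}

# Partitions with a "reasonable structure" (average silhouette above 0.50)
reasonable = [k for k, s in avg_silhouette.items() if s > 0.50]

# The three partitions with the highest average silhouette value
top3 = sorted(avg_silhouette, key=avg_silhouette.get, reverse=True)[:3]

# Negative silhouette counts of those three candidates
top3_negatives = {k: negative_count[k] for k in top3}
```

The ranking only narrows the candidates; the final choice between them also relies on how the clusters map onto topography and latitude.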
The two cluster groups (cluster-three and cluster-six) exhibit a similar spatial distribution.
Pluviometric stations from both groups follow an altitudinal distribution along the coast,
defining three regions: stations located in lowlands (green triangles), in middle watersheds
(white circles) and in highlands (black points). The cluster-six group presents three additional
regions, two of them closely related to northern precipitation features for the middle
watershed (cluster 4, represented by red triangles) and highlands (cluster 6, represented by
yellow circles). Two stations are considered isolated (cluster 5, represented by blue circles).
Figure 3 a) Spatial distribution of cluster-three group after the k-means process. Silhouette value for each cluster
group is also shown in the graph below. b) Idem for cluster-six group after the k-means process.
Even if the cluster-six group appears less representative than the cluster-three group in terms
of silhouette value, it is considered acceptable because it correctly represents the
behavior of northern precipitation, offering an initial classification of the rainfall regime
along the Peruvian Pacific coast.
4.2 Regionalization
After cluster definition, a Regional Vector analysis was performed over these preliminary
regions as the first step of a regional refining procedure. This iterative process adds stations
to and removes stations from regions according to the criteria described in section 3.2.2 and
the coefficient of variation (CV) of the stations. In Figure 4, Group 1, located in the western
area of the coast (lowlands), presents greater CV values (> 1.8) than the stations
located in middle watersheds and in highlands. The northern region presents higher CV values
in lowlands and in the middle watersheds. Highlands present lower CV values (< 0.8) along
the coast, independently of latitude.
Figure 4. Spatial distribution and range of the coefficient of variation (CV) for the whole network of
pluviometric stations of the Peruvian Pacific Coast.
High CV values in the northern region correspond to strong rainfall variability (> 1000
mm/yr). High CV values are also observed at southern latitudes. They are mostly caused by
small fluctuations around a near-zero annual average. These fluctuations are due to the
large-scale mid-tropospheric subsidence over the southeastern subtropical Pacific Ocean,
enhanced by the coastal upwelling of cold water (Lavado et al., 2012; Garreaud et al., 2002).
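The effect of a near-zero mean on the CV can be seen with a short calculation. The sketch assumes the usual definition, CV = standard deviation / mean of annual rainfall, which the text does not spell out; the function name and the two annual series are hypothetical and are not station data.

```python
import numpy as np

def coefficient_of_variation(annual_rainfall):
    """CV of an annual rainfall series (sample standard deviation over mean)."""
    x = np.asarray(annual_rainfall, dtype=float)
    x = x[~np.isnan(x)]                      # ignore missing years
    return float(np.std(x, ddof=1) / np.mean(x))

# Hypothetical arid coastal station: almost always dry, one year with 20 mm
cv_arid = coefficient_of_variation([0, 0, 0, 0, 20, 0, 0, 0, 0, 0])
# Hypothetical highland station: around 1000 mm/yr with moderate fluctuations
cv_humid = coefficient_of_variation([900, 1000, 1100, 1000, 950, 1050])
```

A single small rainfall total in an otherwise dry record is enough to push the CV well above 1.8, while a wet station with fluctuations of about 100 mm stays far below 0.8.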
Based on the iterative RV reanalysis of the clusters obtained with the k-means
methodology, we identify nine homogeneous rainfall patterns (see Figure 5). Rainfall stations
from clusters 1, 2 and 4, located in the coastal zone and in the northern Andes (see Figure 3.b),
exhibit higher coefficients of variation, in inverse relationship with the proximity of the
highlands. Cluster 1 includes regions 1, 4 and 7, which divide the coastal
zone. Cluster 4 defines region 2; in this case the clustering process successfully assigned each
station and the RV confirmed them as separate from other regions. Clusters 5 and 6 are
regrouped into region 3. Finally, cluster 3 defines regions 5, 6, 8 and 9; in this case the low
variability and the latitudinal extension define these four regions.
Following the scheme proposed in Figure 2, a spatial approach was necessary to
delineate geographical boundaries, thereby regionalizing the classification
obtained by k-means clustering and the regionalization obtained by the statistical approach
combining k-means and RVM. For this step, an interpolated surface of annual rainfall over the
period 1964-2011 was calculated using the co-kriging methodology, considering topographical
features as explained in section 3.2.3. Annual rainfall exhibits a relationship with
altitude and latitude: rainfall is higher at low latitudes, and at southern latitudes in high
altitudes, as shown in Figure 6. Once these rainfall features were known, a spherical
semivariogram model was used for the kriging rainfall interpolation before the adjustment with
the DEM. Applying the methodology described in section 3.2.3, the nine regions were delineated
taking into account the rainfall interpolation map, as shown in Figure 5.
The correlation coefficient between the stations and the regional vector of each region was
calculated separately, and the spatial distribution of these correlation coefficients is shown
in Figure 7. The purpose of this analysis is to emphasize how well the regional vector
represents each region and to identify locally the areas within a region where this vector is
most representative. For regions 4 and 7, the correlation coefficient is between 0.5 and 0.7.
These coefficients are considered acceptable given the drier
conditions, with more than 90% of the records near 0 mm of rainfall throughout the year due
to hydroclimatic features; any value greater than 0 mm then causes strong variability,
which reduces the relationship with the RV. For the northern regions 1 and 2, the mean
correlation is above 0.9, so the RV is a very good representation, and the most representative
areas are shown in red. Regions 3, 5, 6, 8 and 9, located in highlands, have correlations
greater than 0.7, so the RV is a good representation, with the most representative areas shown
in orange.
Figure 5. The nine homogeneous rainfall regions after the regionalization process of clustering and
RVM. The interpolated surface of annual rainfall (isohyets obtained using the cokriging method) is also
shown to illustrate rainfall differences between regions.
[Figure 6 plot: total annual average rainfall (mm/y) versus latitude (degrees) for regions R1 to R9,
grouped into upstream and downstream regions, with two linear fits: y = 52.73x + 1167.3
(R² = 0.5836) and y = 21.802x + 339.73 (R² = 0.5251).]
Figure 6. Relationship between total annual rainfall for nine regions versus latitude, grouped in
upstream and downstream regions
Figure 7. Coefficient of correlation with the regional vector recalculated for each final region. A
global value of correlation is also shown in bold for each region, as well as the spatial distribution of
correlation with the regional vector.
4.3 Regions Characterization
Region 1 extends over the northern lowlands, from 4.2°S to 7.3°S, covering an area of ~20,300
km². Altitudes range between 0 m and 500 m asl. Average
annual rainfall for this region is about 90 mm·yr⁻¹ and it includes drier areas such as the
Sechura desert. Maximum monthly rainfall is observed in March (see Figure 8.a.1), with
precipitation of less than 50 mm·month⁻¹, which represents nearly 90% of the annual rainfall,
showing the unimodal behavior of the rainfall regime. The rest of the year is considered dry,
with precipitation near or equal to zero, corroborating the monthly intermittency of the
rainfall regime on the coast (Garreaud et al., 2002; Lavado et al., 2012).
Region 2 comprises an area of ~27,600 km², characterized by a middle-watershed altitudinal
gradient ranging from 0 m to 1500 m asl and a latitudinal extent from 3.4°S to 7.3°S. A large
part of this area belongs to the foothills of the northern Andes, regardless of the political
border that crosses the bi-national river watersheds of Peru and Ecuador. Six rainfall stations
from INAMHI (Meteorological and Hydrological National Service of Ecuador) are therefore
included in the database. This zone exhibits a monthly intermittent regime similar to that of a
coastal region, mostly influenced by oceanic and continental air masses (Buytaert et al., 2006;
Takahashi, 2004). The annual maximum amount of rainfall is around 370 mm·yr⁻¹. The wettest
period occurs between January and April (JFMA), accumulating nearly 90% of total rainfall.
Northern coastal regions such as regions 1 and 2 are significantly affected by strong events,
represented by two peaks reaching 413 mm·month⁻¹ in March 1983 and 299 mm·month⁻¹ in March 1998
for region 1 (see Figure 8.a.2) and 746 mm·month⁻¹ in March 1983 and 708 mm·month⁻¹ in
March 1998 for region 2 (see Figure 8.b.2). Most of this rainfall variability, also reflected in
higher CV values (see Figure 4), is directly due to the El Niño Southern
Oscillation (ENSO) phenomenon (Wang and Fiedler, 2006), which is one of the main
climate anomalies driving hydroclimatic behavior on the coast of Ecuador and in
northwestern Peru (Lagos et al., 2008; Lavado et al., 2012; Bourrel et al., 2014), with its
associated climate mechanisms during strong events (Horel and Cornejo-Garrido, 1986; Goldberg
et al., 1987; Bendix and Bendix, 2006).
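The seasonal percentages quoted for each region (for example, nearly 90% of annual rainfall in JFMA) correspond to a simple ratio of monthly climatological means. The sketch below shows that calculation; the function name `seasonal_share` and the twelve monthly values are hypothetical and are not taken from the study's data.

```python
import numpy as np

def seasonal_share(monthly_means, months):
    """Fraction of annual rainfall falling in the given months (1 = January)."""
    monthly = np.asarray(monthly_means, dtype=float)
    idx = [m - 1 for m in months]            # convert month numbers to array indices
    return float(monthly[idx].sum() / monthly.sum())

# Hypothetical mean monthly rainfall (mm), January to December
monthly_demo = [80, 100, 120, 60, 10, 5, 0, 0, 0, 5, 10, 10]
jfma_share = seasonal_share(monthly_demo, months=[1, 2, 3, 4])
```

Applying the same ratio to other month sets (JFM, DJFM, DJFMA) gives the figures reported for the remaining regions.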
Region 3 covers the third part of the northern area (~27,200 km²), including Ecuadorian stations.
This area extends from 3.6°S at the border with Ecuador in the north to 8.3°S in the south,
and is bounded entirely by the Amazon basin to the east. This is also the wettest region
(see Figure 8.c.1 and 8.c.2). It corresponds to a zone of high altitudes varying from
1500 to 3500 m asl and shows a homogeneous rainfall regime. Rainfall
amount decreases southward without showing intermittent characteristics.
Rainfall distribution is well defined, with a rainy season from January to April (JFMA) that
represents nearly 70% of the annual rainfall. Mean annual rainfall reaches 1024 mm·yr⁻¹,
about five times the mean annual rainfall of regions 1 and 2. This corroborates the
effects of high altitudes with tropical Amazon influence, leading to an attenuation of the
effects of strong ENSO events such as 1982/1983 and 1997/1998.
Region 4 is the longest region, extending from 7.3°S to 15.5°S between the coastal plain and
the foothills of the western Andes; it borders region 1 to the north and region 7 to the south.
Covering an area of almost 48,600 km2, this region contains some of the principal coastal
cities, such as the capital Lima, and has an altitudinal range from 0 to 1500 m asl. It
corresponds to a zone influenced by the large-scale mid-tropospheric subsidence over the
southeastern subtropical Pacific Ocean, enhanced by the coastal upwelling of cold water
(Garreaud et al., 2002; Lavado et al., 2012). As a result, mean annual rainfall is only
16 mm.yr-1, making this the driest region in the country, with the monthly intermittency
characteristic of coastal regions. The wet period from January to March (JFM) represents
nearly 75% of the annual rainfall. Owing to local conditions, a slight increase in rainfall may
occur in August, but it is not large enough to count as a peak in the annual regime (see
Figure 8.d.1). Drier areas such as the Nazca desert are found in the southern part.
Region 5 covers ~32,500 km2. This area extends
from 7°S at the boundary with regions 2 and 3 in the north to the boundary with region 6 near
11°S in the south, and is bounded by region 3 and the Amazon basin to the east. The mean
annual rainfall reaches 492 mm.yr-1 and the wet period occurs between December and April
(DJFMA), accounting for nearly 80% of total rainfall. No peaks associated with strong El Niño
events were identified, resulting in a homogeneous rainfall pattern (see Figure 8.e.2). The
lower altitudinal limit varies with latitude, from 1000 m asl in the north to 2000 m asl in the
south, while the upper limit reaches 5000 m asl. Because of the narrow shape of the central
area (regions 4, 5 and 6) and its strong altitudinal variation, the RV procedure did not
support an intermediate region of the kind found in the north (regions 1, 2 and 3) and in the
south (regions 7, 8 and 9), although the cluster analysis had proposed one.
Region 6 covers ~30,400 km2 and extends from 11°S at the boundary with region 5 in the north
to 15°S at the boundary with region 9 in the south; it is bounded along its entire eastern side
by the Amazon basin. It is located in highlands ranging from 2000 to 5000 m asl and also shows
a homogeneous rainfall regime. Rainfall distribution is well defined, with a rainy season from
December to March (DJFM) that represents nearly 85% of the annual rainfall (see Figure 8.f.1),
and mean annual rainfall reaches 366 mm.yr-1. No peaks corresponding to strong El Niño events
can be distinguished in the temporal variability of rainfall (see Figure 8.f.2).
Region 7 is the southern coastal region. It extends from approximately 15.5°S to 18.4°S,
covers ~49,300 km2 and ranges in altitude from 0 to 2500 m asl. It borders region 4 to the
north and reaches the Chilean border in the south. As a coastal region, it is characterized by
a low rainfall regime with a rainy season from January to March (JFM) that accounts for 65% of
the annual rainfall. It is one of the driest areas in the country, with a mean annual rainfall
of 23 mm.yr-1 and monthly intermittency (see Figures 8.g.1 and 8.g.2). This region could be
considered an extension of region 4, as it is also influenced by the large-scale
mid-tropospheric subsidence over the southeastern subtropical Pacific Ocean, but it differs in
the rainfall recorded over the last decade of the record: region 7 presents a succession of
peaks, whereas no such peaks are visible in region 4 (see Figure 8).
Region 8 comprises an area of ~25,400 km2 in the middle part of the watersheds, with altitudes
ranging from 2500 to 4000 m asl and latitudes from 14.6°S to 17.8°S. It is bounded mainly by
region 6 in the north and by the Chilean border and the Titicaca basin in the south. Although
much of its area belongs to the foothills of the southern Andes, this zone exhibits a monthly
intermittent regime like that of a coastal region, mostly influenced by oceanic and continental
air masses (Garreaud et al., 2002). However, rainfall is higher than in region 7, reaching
296 mm.yr-1. The wettest period occurs between December and March (DJFM) and accounts for
nearly 90% of total rainfall (see Figure 8.h.1).
Finally, region 9 covers ~30,100 km2. It extends from 14.4°S at the boundary with region 6 in
the north to the border with the Titicaca basin in the south and east, around 17.7°S, and is
also bounded by the Amazon basin to the east. Altitude ranges from 3500 to 5500 m asl. The
mean annual rainfall reaches 594 mm.yr-1 and the wet period occurs between December and March
(DJFM), accounting for nearly 80% of total rainfall. No peaks associated with strong El Niño
events were identified, resulting in a homogeneous rainfall pattern (see Figure 8.i.2).
The major characteristics of rainfall are summarized in Table 2 for each region and
represented as box plots in Figure 9, where outliers are shown as small circles and correspond
to values lying more than 1.5 times the interquartile range (IQR) above the third quartile
(Q3). All regions have observations that exceed Q3 + 1.5(IQR); the northern coastal regions 1
and 2 have a greater number of anomalous values than the other regions, which is also
reflected in Figure 4.
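The outlier rule used for the box plots can be written in a few lines. This is a sketch only: the sample values are hypothetical, the function name `upper_outliers` is an assumption, and the quartile convention (Python's `statistics.quantiles` with the "inclusive" method) may differ slightly from the one used to draw Figure 9.

```python
import statistics


def upper_outliers(values, k=1.5):
    """Return the upper fence Q3 + k*IQR and the values lying above it."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return fence, [v for v in values if v > fence]


# Hypothetical monthly rainfall totals in mm (illustrative, NOT the study data).
monthly_mm = [1, 2, 3, 4, 5, 6, 7, 8, 100]
fence, flagged = upper_outliers(monthly_mm)
print(f"upper fence = {fence} mm, outliers = {flagged}")
```

With a strongly skewed series such as that of a dry coastal region, a single wet month is enough to exceed the fence, which is consistent with the large number of outliers reported for regions 1 and 2.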
[Figure 8: nine pairs of panels, a) to i), for regions R1 to R9. Panels a.1) to i.1) show
monthly rainfall P (mm) from September to August; panels a.2) to i.2) show time series of
rainfall P (mm), with axis ticks from 1965 to 2010.]
Figure 8. Rainfall regime (1964-2011) for the nine identified regions. A rainfall time
series is shown by region,
Figure 9. Boxplot of monthly rainfall for the nine identified regions.
Table 2. Minimum, Maximum and Average of annual rainfall for the nine identified regions.
Region  Annual minimum  Annual maximum  Annual average  CV    Std Dev.  Regime
        rainfall (mm)   rainfall (mm)   rainfall (mm)         (mm)
1       3.2             1345.2          89.7            2.6   233.3     Unimodal
2       17.3            2772.2          366.5           1.5   534.2     Unimodal
3       533.0           1812.9          1023.7          0.3   294.4     Unimodal
4       1.6             62.2            15.5            0.7   11.4      Unimodal
5       174.1           825.8           492.4           0.3   145.8     Unimodal
6       75.0            693.5           365.9           0.4   133.3     Unimodal
7       5.1             54.9            23.2            0.6   13.5      Unimodal
8       23.2            528.8           296.1           0.4   111.8     Unimodal
9       220.5           833.2           594.0           0.2   143.2     Unimodal
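The CV column of Table 2 appears to be the ratio of the standard deviation to the average annual rainfall. This is an inference from the tabulated numbers rather than a definition given in the text. The short check below recomputes it from the other two columns; the dictionary name `TABLE2` and the helper name are illustrative.

```python
# region: (average annual rainfall in mm, standard deviation in mm, CV as printed)
TABLE2 = {
    1: (89.7, 233.3, 2.6),
    2: (366.5, 534.2, 1.5),
    3: (1023.7, 294.4, 0.3),
    4: (15.5, 11.4, 0.7),
    5: (492.4, 145.8, 0.3),
    6: (365.9, 133.3, 0.4),
    7: (23.2, 13.5, 0.6),
    8: (296.1, 111.8, 0.4),
    9: (594.0, 143.2, 0.2),
}


def coefficient_of_variation(mean_mm, std_mm):
    """CV = standard deviation / mean (dimensionless)."""
    return std_mm / mean_mm


recomputed = {
    region: round(coefficient_of_variation(mean, std), 1)
    for region, (mean, std, _printed) in TABLE2.items()
}
for region, (_mean, _std, printed) in TABLE2.items():
    print(f"region {region}: printed CV = {printed}, recomputed CV = {recomputed[region]}")

# Regions ranked from most to least variable, using the unrounded ratio.
ranking = sorted(TABLE2, key=lambda r: TABLE2[r][1] / TABLE2[r][0], reverse=True)
```

Under this reading, regions 1 and 2 stand out with CV above 1, the coastal regions 4 and 7 are intermediate, and the highland regions have the lowest values.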
5. Conclusions
Rainfall over the Peruvian Pacific coast exhibits high variability at both spatial and
temporal scales. A method is proposed that allows nine homogeneous regions to be defined.
The approach is a two-step process consisting of a preliminary cluster analysis (k-means)
followed by a Regional Vector Methodology (RVM) analysis. K-means clustering provides an
initial classification into three regions: lowlands, middle basins and highlands. It also
highlights the more complex situation of the northern area, where three additional regions
were delineated. A regional definition was then proposed based on the results of the RVM
applied to the inferred clusters. Finally, the delineation of the regions and data density
issues were addressed using a rainfall co-kriging interpolation.
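The first step of the process (k-means, with the number of clusters chosen by the silhouette value of Kaufman and Rousseeuw, 1990) can be illustrated with a small self-contained sketch. It is not the implementation used in this study: the two-dimensional points are hypothetical stand-ins for station descriptors, the farthest-point initialization is chosen here only to make the example deterministic, and the RVM step is not reproduced.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b); needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) is taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # smallest mean distance to any other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical station descriptors (illustrative values, NOT the study data).
stations = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
            (3.0, 2.0), (3.2, 2.2), (2.9, 2.1),
            (6.0, 4.0), (6.1, 4.2), (5.9, 3.9)]

scores = {k: mean_silhouette(stations, kmeans(stations, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
best_labels = kmeans(stations, best_k)
print(f"best k = {best_k}, labels = {best_labels}")
```

In practice the clusters obtained this way are only a first guess; in the approach described here they are then refined station by station with the RVM.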
The two northern coastal regions (regions 1 and 2) were very well represented by the
Regional Vector (RV), which reflects the influence of strong El Niño events. The highland
regions (regions 3, 5, 6, 8 and 9) were represented by the RV, which shows homogeneous
rainfall behavior without a strong El Niño influence, consistent with their low coefficients
of variation. In contrast, the coastal lowland regions (regions 4 and 7) were acceptably
represented by the RV, reflecting the drier conditions along the coast caused by upwelling.
The seasonal cycle of monthly rainfall in the southern regions (regions 7, 8 and 9) differs
from that of the other regions, in particular through a shift of at least one month in the
timing of maximum rainfall: rainfall peaks in February for regions 7 and 8 and in January for
region 9, whereas it peaks in March for the other regions. This heterogeneity in temporal and
spatial variability will be discussed in future research that takes a hydroclimatic approach.
Overall, this work provides a regional analysis that can be used in future research on the
relationship between rainfall variability at local scales and aspects of the regional oceanic
and atmospheric circulation in Peru.
6. Acknowledgments
This work was supported by the Peruvian Ministry of Education (MINEDU-PRONABEC
scholarship). The authors would like to thank SENAMHI (Meteorological and Hydrological
Service of Peru) for providing the complete raw rainfall dataset.
References
An S.I, Jin F.F. 2004. Nonlinearity and asymmetry of ENSO. Journal of Climate 17:2399–
2412.
Bendix A, Bendix J. 2006. Heavy rainfall episodes in Ecuador during El Niño events and
associated regional atmospheric circulation and SST patterns. Adv. Geosci 6:43–49.
BCEOM. 1999. Estudio hidrológico-meteorológico en la vertiente del Pacífico del Perú con
fines de evaluación y pronóstico del fenómeno El Niño para prevención y mitigación de
desastres. Asociación BCEOM-Sofi Consult S.A. -ORSTOM, Programa de apoyo a la
emergencia Fenómeno del Niño. Contrato de préstamo n°4250-PE-BIRF, Presidencia de la
Republica, Perú. Volumen I.
Boucharel J, Dewitte B, Du Penhoat Y et al. 2011. ENSO nonlinearity in a warming climate.
Clim Dyn. doi:10.1007/s00382-011-1119-9.
Bourrel L, Rau P, Dewitte B et al. 2014. Low-frequency modulation and trend of the
relationship between ENSO and precipitation along the northern to centre Peruvian Pacific
coast. Hydrological Processes. doi: 10.1002/hyp.10247.
Brunet-Moret Y. 1979. Homogénéisation des précipitations. Cahiers ORSTOM. Serie Hydr
3–4.
Buytaert W, Celleri R, Willems P et al. 2006. Spatial and temporal rainfall variability in
mountainous areas : A case study from the south Ecuadorian Andes. Journal of Hydrology
329:413–421. doi:10.1016/j.jhydrol.2006.02.031.
Changnon S, Kenneth K. 2006. Changes in Instruments and Sites Affecting Historical
Weather Records : A Case Study. J. Atmos. Oceanic Technol 23:825–828.
Daly C, Neilson R.P, Phillips D.L. 1994. A Statistical-Topographic Model for Mapping
Climatological Precipitation over Mountainous Terrain. J. Appl. Meteor 33:140–158.
Dezfuli A.K. 2010. Spatio-temporal variability of seasonal rainfall in western equatorial
Africa. Theor. Appl. Climatol 104(1-2): 57–69. doi:10.1007/s00704-010-0321-8.
Dingman S.L, Seely-Reynolds D.M, Reynolds, R.C. 1988. Application of kriging to
estimating mean annual precipitation in a region of orographic influence. JAWRA Journal of
the American Water Resources Association 24(2): 329–339. doi:10.1111/j.1752-
1688.1988.tb02991.x.
Diodato N. 2005. The influence of topographic co-variables on the spatial variability of
precipitation over small regions of complex terrain. Int. J. Climatol 25:351–363. doi:
10.1002/joc.1131.
Enfield D.B. 1981. Thermally Driven Wind Variability in the Planetary Boundary Layer
Above Lima, Peru. J. Geophys. Res 86:2005-2016.
Espinoza J.C, Ronchail J, Guyot J.L, Cochonneau G et al. 2009. Spatio-temporal rainfall
variability in the Amazon basin countries (Brazil, Peru, Bolivia, Colombia and Ecuador). Int.
J. Climatol 29:1574–1594. doi:10.1002/joc.
Garreaud R, Rutllant J, Fuenzalida H. 2002. Coastal lows along the Subtropical West Coast of
South America: Mean Structure and Evolution. Mon. Wea. Rev 130:75–88.
Garreaud R.D, Vuille M, Compagnucci R, Marengo J. 2009. Present-day South American
climate. Palaeogeography, Palaeoclimatology, Palaeoecology. 281 (3–4):180–195.
Goldberg R.A, Tisnado G, Scofield R.A. 1987. Characteristics of extreme rainfall events in
north-western Peru during the 1982–1983 El Niño period. J. Geophys. Res 92:C14 225–241.
Goovaerts P. 2000. Geostatistical approaches for incorporating elevation into the spatial
interpolation of rainfall. Journal of Hydrology 228:113–129.
Hevesi J, Flint A, Istok J. 1992. Precipitation estimation in mountainous terrain using
multivariate geostatistics. Part II: Isohyetal maps. J. Appl. Meteor. Climatol 31:677-688.
Hevesi J, Istok J, Flint A. 1992. Precipitation estimation in mountainous terrain using
multivariate geostatistics. Part I: structural analysis. J. Appl. Meteor. Climatol 31:661-676.
Horel J.D, Cornejo-Garrido A.G. 1986. Convection along the coast of northern Peru during
1983: Spatial and temporal variation of clouds and rainfall. Mon. Wea. Rev 114:2091–2105.
Jackson I.J, Weinand H. 1995. Classification of tropical rainfall stations: A comparison of
clustering techniques. Int. J. Climatol 15(9):985–994. doi:10.1002/joc.3370150905.
Kaufman L, Rousseeuw P. 1990. Finding Groups in Data: An Introduction to Cluster
Analysis. John Wiley & Sons, Inc, Hoboken.
Kononenko I, Kukar M. 2007. Machine learning and data mining: Introduction to principles
and algorithms. Horwood Publishing, Chichester.
Lavado W.S, Ronchail J, Labat D, Espinoza J.C, Guyot J.L. 2012. Basin-scale analysis of
rainfall and runoff in Peru (1969–2004): Pacific, Titicaca and Amazonas drainages.
Hydrological Sciences Journal 57 (4): 1–18.
Lagos P, Silva Y, Nickl E, Mosquera K. 2008. El Niño – related precipitation variability in
Peru. Adv. Geosci 14:231–237.
Mair A, Fares A. 2011. Comparison of rainfall interpolation methods in a Mountainous
Region of a Tropical Island. Journal of Hydrologic Engineering 16(4): 371–383.
Muñoz-Diaz D, Rodrigo F. 2004. Spatio-temporal patterns of seasonal rainfall in Spain
(1912-2000) using cluster and principal component analysis: comparison. Ann. Geophys
1435–1448.
Ochoa A, Pineda L, Crespo P, Willems P. 2014. Evaluation of TRMM 3B42 precipitation
estimates and WRF retrospective precipitation simulation over the Pacific–Andean region of
Ecuador and Peru. Hydrol. Earth Syst. Sci 18:3179–3193.
Phillips D.L, Dolph J, Marks D. 1992. A comparison of geostatistical procedures for spatial
analysis of precipitation in mountainous terrain. Agricultural and Forest Meteorology.
doi:10.1016/0168-1923(92)90114-J.
Ramachandra Rao, Srinivas V.V. 2006. Regionalization of watersheds by hybrid-cluster
analysis. Journal of Hydrology 318(1-4): 37–56. doi:10.1016/j.jhydrol.2005.06.003.
Ramos M. 2001. Divisive and hierarchical clustering techniques to analyse variability of
rainfall distribution patterns in a Mediterranean region. Atmospheric Research 123–138.
Raziei T, Bordi I, Pereira L.S. 2008. A precipitation-based regionalization for Western Iran
and regional drought variability. Hydrol. Earth Syst. Sci doi:10.5194/hess-12-1309-2008.
Sneyers R, Vandiepenbeeck M, Vanlierde R. 1989. Principal component analysis of Belgian
rainfall. Theor. Appl. Climatol 204:199–204.
Sönmez İ, Kömüşcü A.Ü. 2011. Reclassification of rainfall regions of Turkey by K-means
methodology and their temporal variability in relation to North Atlantic Oscillation (NAO).
Theor. Appl. Climatol 106(3-4):499–510. doi:10.1007/s00704-011-0449-1.
Stooksbury D, Michaels P. 1991. Cluster analysis of southeastern US climate stations. Theor.
Appl. Climatol 150:143–150.
Suarez W. 2007. Le bassin versant du fleuve Santa (Andes du Pérou): dynamique des
écoulements en contexte glacio-pluvio-nival. Dissertation, Université Montpellier II.
Takahashi K. 2004. The atmospheric circulation associated with extreme rainfall events in
Piura, Peru, during the 1997–1998 and 2002 El Niño events. Ann. Geophys 22:3917–3926.
Tabios G.Q, Salas J.D. 1985. A Comparative Analysis of Techniques for Spatial Interpolation
of Precipitation. JAWRA Journal of the American Water Resources Association 21: 365–380.
doi:10.1111/j.1752-1688.1985.tb00147.x.
Ünal Y, Kindap T, Karaca M. 2003. Redefining the climate zones of Turkey using cluster
analysis. Int. J. Climatol 23:1045–1055.
Vauchel P. 2005. Hydraccess: Software for management and processing of
hydro-meteorological data, Version 2.1.4. Free download:
www.mpl.ird.fr/hybam/utils/hydracces.html.
Wang C, Fiedler P. 2006. ENSO variability and the eastern tropical Pacific: A review.
Progress in Oceanography 69:239-266.