APPLIED STATISTICS FOR ENVIRONMENTAL
SCIENCE WITH R
ABBAS F. M. ALKARKHI
WASIN A. A. ALQARAGHULI
Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
© 2020 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying,
recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek
permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright
Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in
research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods,
compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety
of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage
to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN 978-0-12-818622-0
Abbas
To the memory of my parents (deceased)
To my children Atheer, Hibah, and Farah
Wasin
To the memory of my father (deceased)
To my mother
Preface
Applied Statistics for Environmental Science with R is written in an accessible style to introduce statistical
techniques that are useful to students and researchers in environmental science and environmental engineering, and
to help them choose the appropriate technique for analyzing their data and drawing sound conclusions. The R output is
explained step by step, in clear language, so that non-statisticians can understand it and use it in their research.
A step-by-step procedure is employed to perform the analysis and the interpretation of results by matching the
results to the field of study where the data were obtained. The book focuses on the applications of univariate and mul-
tivariate statistical techniques in the field of environmental science. Furthermore, real data obtained from research over
more than fifteen years of work in environmental science were employed to illustrate the concepts and analysis.
The book uses the R statistical software to analyze the data and generate the required results. R is open source,
offers facilities for user feedback, and produces high-resolution plots. Furthermore, online assistance is readily
available from various user communities. R is available over the internet under the GNU General Public License (GPL)
for the Windows, Macintosh, and Linux operating systems.
Finally, we wish to thank our families, friends, and colleagues for their continuous support. We would like to extend
our thanks to the R software community and the R family (R users and contributors to R) for providing the software
for free. This book would not have been possible without the information provided online, which is easy to obtain.
We thank the University of Kuala Lumpur (Unikl MICET) for its support.
Abbas
Wasin
CHAPTER 1
Multivariate Data
LEARNING OBJECTIVES
The concept of univariate statistical analysis covers statistical techniques for testing a data set with one variable.
However, most research projects need to measure several variables for each research unit or individual (sampling units
or experimental units) in one or more samples.
Applied Statistics for Environmental Science with R © 2020 Elsevier Inc. All rights reserved.
https://doi.org/10.1016/B978-0-12-818622-0.00001-0
For example, consider assessing the water quality of a river by monitoring parameters such as pH, dissolved oxygen
(DO), electrical conductivity (EC), turbidity, biological oxygen demand (BOD), chemical oxygen demand (COD), and
total suspended solids (TSS) in order to decide on the pollution status of the river. In this case, seven variables
are measured for each sample, and they are generally highly correlated. If the multivariate data are analyzed one
variable at a time, the relationships between the variables are ignored, and the results give a distorted picture of
how each parameter (variable) truly behaves in the presence of the others. Thus, we should use a method that takes
the correlation between the chosen variables into account, so that the overlapping information carried by correlated
variables can be disentangled and the behavior of the chosen variables understood properly.
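The point about correlated parameters can be illustrated with a minimal sketch in R. It assumes only the first three columns (temperature, pH, and DO) of the ten Juru River samples listed in Table 1.2; the object names `juru` and `corr` are illustrative and not taken from the book.

```r
# First three columns of the ten Juru River samples in Table 1.2.
juru <- data.frame(
  Temperature = c(28.15, 28.20, 28.35, 28.40, 28.30, 28.30, 28.55, 28.75, 29.30, 29.55),
  pH          = c(7.88, 7.92, 7.41, 7.84, 7.86, 7.89, 7.40, 7.41, 7.25, 7.18),
  DO          = c(6.73, 6.64, 5.93, 6.23, 6.29, 6.36, 5.67, 5.26, 4.18, 5.14)
)

# Pearson correlation between every pair of variables.
corr <- cor(juru)
print(round(corr, 2))
```

Off-diagonal entries far from zero indicate that the variables carry overlapping information, which is what a one-variable-at-a-time analysis would overlook.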
Therefore, data sets with several variables can be analyzed with multivariate methods that consider the
relationships between the chosen variables. Multivariate methods are a collection of techniques that serve several
purposes in environmental science and engineering: cluster analysis, for recognizing groups of similar observations
(e.g., individuals, objects); principal components analysis and factor analysis, data reduction methods that reduce
the number of variables to a smaller number of uncorrelated dimensions, called components (factors), without losing
valuable information; discriminant analysis, which separates the data into groups based on the measured variables;
multivariate analysis of variance (MANOVA), which performs statistical hypothesis testing on multivariate data
(several variables); and multivariate multiple regression analysis, which makes predictions based on the
relationships among the variables.
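As a rough orientation, several of the methods listed above have counterparts in base R's stats package. The sketch below is not the book's own code: it assumes a small subset of Table 1.2 (five samples per river, three parameters), and the object names are illustrative.

```r
# Five samples per river and three parameters taken from Table 1.2.
water <- data.frame(
  Location    = factor(rep(c("Juru", "Jejawi"), each = 5)),
  Temperature = c(28.15, 28.20, 28.35, 28.40, 28.30, 27.80, 27.70, 27.85, 27.85, 27.65),
  pH          = c(7.88, 7.92, 7.41, 7.84, 7.86, 7.40, 7.53, 7.45, 7.41, 7.27),
  DO          = c(6.73, 6.64, 5.93, 6.23, 6.29, 4.96, 5.86, 5.52, 5.27, 5.71)
)

# Standardize the variables so that no single unit dominates the analysis.
vars <- scale(water[, c("Temperature", "pH", "DO")])

# Cluster analysis: hierarchical clustering cut into two groups.
clusters <- cutree(hclust(dist(vars)), k = 2)

# Principal components analysis: uncorrelated components.
pca <- prcomp(vars)

# MANOVA: do the rivers differ on the three parameters jointly?
fit <- manova(cbind(Temperature, pH, DO) ~ Location, data = water)

print(table(water$Location, clusters))
print(summary(pca))
print(summary(fit))
```

With only ten rows the output is for illustration only; it shows which function corresponds to which method, not a result to be interpreted.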
We can organize multivariate data for the variables measured on a number of samples (items) in a table. The number
of samples (items) determines the number of rows, and the number of variables determines the number of columns of
the table.
In general, Table 1.1 shows the configuration of n samples (representing the rows) and k variables (representing the
columns) measured for each sample.
Note: The total number of variables measured for each sample (k) is usually smaller than the total number of
samples, n.
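The n × k layout described above maps directly onto a matrix (or data frame) in R. The following is a minimal sketch with n = 4 samples and k = 3 variables; the values are the first four Juru rows of Table 1.2, and the generic column names Y1 to Y3 are used only for illustration.

```r
# Rows are samples, columns are variables (filled row by row).
Y <- matrix(
  c(28.15, 7.88, 6.73,
    28.20, 7.92, 6.64,
    28.35, 7.41, 5.93,
    28.40, 7.84, 6.23),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("Y1", "Y2", "Y3"))
)

n <- nrow(Y)  # number of samples
k <- ncol(Y)  # number of variables

print(Y)
cat("n =", n, " k =", k, "\n")

# The entry for sample i and variable j is Y[i, j].
print(Y[2, "Y3"])
```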
We illustrate the concept of multivariate data with real examples from environmental science, which also make the
associated computations easier to explain. The scenario of each example is described, and its interpretation is
given.
Variable
Sample Y1 Y2 … Yj … Yk
TABLE 1.2 Physiochemical Parameters for the Juru and Jejawi Rivers
Location Temperature pH DO BOD COD TSS EC Turbidity Total phosphate Total nitrate
Juru 28.15 7.88 6.73 10.56 1248.00 473.33 42.45 13.05 0.88 12.45
Juru 28.20 7.92 6.64 10.06 992.50 461.67 42.75 14.11 0.55 15.05
Juru 28.35 7.41 5.93 6.01 1265.00 393.34 29.47 25.95 0.88 15.80
Juru 28.40 7.84 6.23 14.57 1124.00 473.34 42.35 22.00 1.26 12.45
Juru 28.30 7.86 6.29 7.36 1029.50 528.34 40.70 12.36 0.61 12.90
Juru 28.30 7.89 6.36 5.27 775.00 458.33 42.50 17.90 1.01 19.25
Juru 28.55 7.40 5.67 4.81 551.00 356.67 29.51 27.70 1.02 25.55
Juru 28.75 7.41 5.26 4.96 606.00 603.33 28.45 29.35 0.84 17.90
Juru 29.30 7.25 4.18 6.61 730.00 430.00 26.25 35.35 0.73 12.85
Juru 29.55 7.18 5.14 5.11 417.00 445.00 25.79 27.75 0.72 17.15
Jejawi 27.80 7.40 4.96 36.65 82.35 815.00 29.55 1.01 1074.00 10.51
Jejawi 27.70 7.53 5.86 37.20 122.90 858.34 21.80 0.77 947.50 6.31
Jejawi 27.85 7.45 5.52 41.50 46.35 823.34 27.55 1.12 1194.50 7.36
Jejawi 27.85 7.41 5.27 36.75 52.65 821.67 16.45 0.46 988.50 5.87
Jejawi 27.65 7.27 5.71 37.95 99.95 736.67 21.10 1.01 653.00 16.36
Jejawi 27.85 7.36 5.16 34.85 36.30 373.34 11.50 0.73 911.00 11.26
Jejawi 27.85 7.37 5.08 37.05 33.50 385.00 13.35 0.71 632.00 3.75
Jejawi 27.75 7.40 5.23 37.10 36.35 403.34 10.85 0.57 787.00 3.31
Jejawi 27.70 7.38 5.30 38.30 30.00 550.00 13.90 0.75 879.50 3.15
Jejawi 27.65 7.36 5.17 38.40 26.15 363.34 12.65 0.62 1339.00 8.56
In this example, 20 samples (rows) were collected from the Juru and Jejawi Rivers (10 different sites on each river),
and 10 physiochemical parameters (columns) were measured for each sample. It is very difficult for researchers to
judge the pollution status of the rivers, or the behavior of the individual parameters, by inspecting the table
directly: there are many different values, and the fluctuations of each parameter from sample to sample can easily
mislead. Thus, these data should be analyzed with an appropriate multivariate method that meets the objective of the
study, untangles the information, and helps explain the phenomenon. The objectives of this research were to determine
the degree of similarity among the sampling sites, to identify the variables responsible for spatial differences in
river water quality, to determine the unobserved factors that explain the structure of the data set, and to quantify
the effect of possible natural and anthropogenic sources on the chosen water parameters of the rivers.
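Before any multivariate method is applied, a simple descriptive summary by river can already make a table such as Table 1.2 easier to read. The sketch below assumes a subset of the table (five samples per river, three parameters) and uses base R's `aggregate()`; the object names are illustrative.

```r
# Subset of Table 1.2: five samples per river, three parameters.
water <- data.frame(
  Location    = rep(c("Juru", "Jejawi"), each = 5),
  Temperature = c(28.15, 28.20, 28.35, 28.40, 28.30, 27.80, 27.70, 27.85, 27.85, 27.65),
  pH          = c(7.88, 7.92, 7.41, 7.84, 7.86, 7.40, 7.53, 7.45, 7.41, 7.27),
  DO          = c(6.73, 6.64, 5.93, 6.23, 6.29, 4.96, 5.86, 5.52, 5.27, 5.71)
)

# Mean of each parameter within each river.
river_means <- aggregate(cbind(Temperature, pH, DO) ~ Location, data = water, FUN = mean)
print(river_means)
```

Such group means describe each parameter separately; they do not replace the multivariate analysis, which also accounts for the correlations between parameters.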
TABLE 1.3 The Results of Physiochemical and Heavy Metals Obtained From Three Ponds of Landfill Leachate
Parameter
Pond pH EC TDS TSS COD BOD NH3H DO Mg Ca
Collection 7.19 6242.00 4530.00 43.26 944.67 9.57 81.66 6.90 12.58 14.28
Collection 7.36 6062.00 4392.00 29.45 845.33 8.77 61.33 6.69 17.91 13.17
Collection 7.41 6164.00 4476.00 31.52 897.00 9.21 71.00 6.48 14.44 15.54
Collection 7.92 6710.33 4710.00 39.36 810.00 8.66 52.43 6.85 19.99 88.25
Collection 7.84 6690.00 4709.83 36.46 686.67 9.66 55.80 6.85 20.50 81.78
Collection 7.83 6600.33 4654.83 37.00 1365.00 10.33 52.80 7.23 21.18 85.18
Collection 8.46 6893.00 4849.00 47.33 692.00 8.47 110.67 2.34 27.23 24.94
Collection 8.47 6850.00 4860.00 50.67 967.00 9.27 93.00 2.82 19.01 14.38
Collection 8.48 6873.00 4859.00 49.00 833.00 8.87 103.00 2.67 24.66 19.99
Aeration 8.98 2501.00 1792.00 101.67 502.67 76.33 3.27 7.88 10.98 17.04
Aeration 8.97 2438.00 1750.00 89.00 543.67 93.67 8.90 7.62 10.99 9.84
Aeration 9.07 2422.00 1762.00 92.00 526.67 71.00 10.80 8.31 10.98 15.00
Aeration 8.21 2255.00 1594.00 148.83 566.67 4.66 4.30 7.23 18.51 54.30
Aeration 8.38 2092.83 1577.83 151.00 618.33 15.33 3.80 7.37 20.14 54.08
Aeration 8.32 2240.67 1578.83 162.67 573.33 10.00 4.40 7.55 21.77 55.30
Aeration 9.97 1513.00 1065.00 51.00 339.00 9.10 20.08 8.08 7.96 5.25
Aeration 10.18 1497.00 1049.00 42.00 322.00 9.32 23.27 8.01 6.85 4.44
Aeration 10.21 1532.00 1084.00 38.00 324.00 8.95 23.26 8.12 8.53 6.78
Stabilization 8.67 1491.00 1087.00 115.00 521.33 17.33 1.37 6.95 20.82 19.99
Stabilization 8.60 1521.00 1104.00 105.33 504.00 29.67 7.82 6.42 20.50 23.10
Stabilization 8.58 1545.00 1123.00 108.33 532.00 47.33 6.00 5.71 20.48 27.13
Stabilization 8.82 1321.83 932.67 98.00 253.67 13.66 6.60 7.55 11.92 66.96
Stabilization 8.92 1352.88 965.00 89.39 520.00 15.00 6.20 7.23 12.46 65.28
Stabilization 8.93 1314.83 942.83 101.61 395.00 15.00 6.20 7.23 12.34 62.13
Stabilization 9.09 652.00 462.00 57.00 248.00 4.98 2.60 6.74 4.47 1.96
Stabilization 9.19 669.00 476.00 51.00 257.00 5.61 2.59 6.71 4.11 2.31
Stabilization 9.43 680.00 480.00 47.00 273.00 4.92 1.93 6.66 3.87 2.30
TABLE 1.3 The Results of Physiochemical and Heavy Metals Obtained From Three Ponds of Landfill Leachate—cont’d
Parameter
Pond Na Fe Zn Cu Cr Cd Pb As Co Mn
Collection 978.37 0.35 14.46 9.80 31.50 0.31 2.71 32.28 6.34 5.44
Collection 978.37 0.23 17.89 9.03 22.37 0.32 3.35 20.55 5.14 9.55
Collection 978.37 0.62 11.50 9.59 10.70 0.01 2.16 9.75 3.95 16.60
Collection 284.44 0.56 23.80 57.10 18.12 0.85 8.03 11.52 6.26 52.72
Collection 273.47 0.56 24.00 57.40 17.04 1.73 7.60 14.41 7.43 51.82
Collection 283.80 0.56 23.10 56.80 17.58 1.71 7.04 10.09 6.61 50.92
Collection 673.45 1.34 61.01 64.92 42.36 0.04 0.70 25.21 21.30 109.34
Collection 633.35 1.98 49.15 53.45 59.06 0.56 3.91 14.81 18.78 133.59
Collection 665.61 0.79 27.48 58.75 52.27 0.06 2.08 17.07 23.67 122.59
Aeration 980.00 0.25 151.10 0.43 47.82 0.01 2.82 10.72 6.74 22.94
Aeration 975.55 0.73 259.08 2.83 3.67 0.39 2.72 11.12 2.61 32.44
Aeration 970.55 1.21 203.88 5.23 91.97 0.08 2.62 10.92 10.81 41.94
Aeration 525.57 0.87 425.46 69.42 86.18 0.56 7.17 21.66 16.18 4.97
Aeration 516.21 0.93 424.43 66.68 89.81 0.35 7.57 21.36 15.99 5.32
Aeration 517.13 0.85 423.45 63.63 85.56 0.16 7.37 20.68 15.78 5.96
Aeration 676.98 0.77 28.05 13.66 27.88 0.32 2.58 6.73 5.73 8.52
Aeration 630.31 0.94 18.69 15.97 37.11 0.08 4.90 10.12 5.61 10.34
Aeration 665.12 0.57 34.44 16.75 30.65 0.11 3.86 7.66 6.75 9.52
Stabilization 975.37 1.33 117.47 3.90 55.15 0.30 3.80 22.23 15.01 11.50
Stabilization 978.37 0.81 278.47 5.21 70.29 0.01 4.12 18.10 20.84 21.61
Stabilization 973.37 0.29 175.47 6.51 54.57 0.00 2.90 12.15 17.93 30.51
Stabilization 411.60 1.37 229.90 23.93 48.47 0.28 6.45 16.55 5.19 6.44
Stabilization 439.04 1.43 227.39 25.23 49.26 0.61 5.25 15.75 4.75 6.02
Stabilization 494.26 1.23 221.89 25.03 48.83 0.01 5.89 17.35 4.62 6.82
Stabilization 141.00 0.33 149.11 6.42 12.88 0.02 2.48 5.71 1.28 1.63
Stabilization 103.81 0.87 240.77 7.91 8.76 0.14 4.63 3.32 1.54 1.93
Stabilization 81.20 0.51 190.68 6.64 11.00 0.04 3.78 2.37 1.62 1.66
The leachate samples were collected from the collection, aeration, and stabilization ponds in the ATLS leachate
collection system. Samples were taken three times between August 2017 and January 2018, at three sampling points in
each pond. The samples were gathered manually, placed in 500 mL polyethene containers, transported immediately to the
laboratory, and cooled to 4°C to reduce biological and chemical reactions (Japan International Cooperation Agency
(JICA)).
In this example, 27 samples (rows) were collected from the three ponds (collection, aeration, and stabilization), and
20 parameters (columns) were measured for each sample. It is not easy for the scientist to judge the treatment
process of the landfill, or the behavior of the individual parameters, because there are so many values (27 samples
× 20 parameters = 540 values). Thus, the relationships among the chosen parameters should be studied properly to
understand how the parameters fluctuate from one pond to another, and the data should be analyzed with a technique
appropriate to the objectives of the project. The first objective was to assess whether the treatment process of the
landfill leachate worked properly; investigating the relationships among the chosen parameters shows how each
variable behaves in the presence of the others, which helps to identify the source of the variation. The second
objective was to assess the effect of the landfill on the groundwater and surface water in the chosen area (the data
for groundwater and surface water are not presented, to save space). Furthermore, the contribution of each chosen
variable (parameter) to the total variation in the collected data was identified with multivariate methods. This
research may help in estimating the impact of the landfill on groundwater and surface water in the chosen area.
TABLE 1.4 The Results of Inorganic Elements in Particulate Matter in the Air (μg/m³)
Season PM10 Al Zn Fe Cu Ca Na Mn Ni Cd
Summer 37.71 0.01 2.32 0.97 0.00 1.37 7.94 0.05 0.01 0.01
Summer 48.03 0.01 2.05 0.75 0.00 1.56 8.46 0.05 0.01 0.04
Summer 67.87 0.01 3.43 0.44 0.00 1.41 9.98 0.06 0.10 0.05
Summer 39.01 0.01 4.24 0.45 0.00 1.12 12.93 0.01 0.07 0.00
Summer 38.33 0.01 3.51 0.50 0.02 1.60 9.92 0.02 0.00 0.05
Summer 29.70 0.01 2.78 0.58 0.02 1.38 12.13 0.04 0.06 0.02
Summer 53.66 0.01 2.40 0.73 0.02 1.59 7.81 0.23 0.11 0.02
Summer 132.28 0.01 2.34 0.55 0.02 1.59 9.01 0.03 0.00 0.01
Summer 66.31 0.01 2.13 0.45 0.02 1.23 8.76 0.02 0.00 0.05
Summer 69.20 0.01 2.13 0.50 0.03 1.95 8.66 0.03 0.08 0.05
Summer 78.17 0.01 1.96 0.56 0.04 1.35 8.88 0.03 0.02 0.07
Summer 31.63 0.01 2.21 0.42 0.04 1.24 9.02 0.02 0.08 0.05
Summer 66.73 0.01 1.46 0.51 0.03 1.33 8.53 0.02 0.13 0.03
Summer 113.56 0.01 2.07 0.60 0.03 1.51 7.65 0.02 0.06 0.05
Summer 123.40 0.01 1.75 0.57 0.03 1.75 7.45 0.02 0.12 0.04
Summer 72.39 0.01 1.61 0.57 0.03 1.54 8.40 0.02 0.07 0.02
Summer 51.85 0.01 1.16 0.51 0.18 1.72 7.44 0.01 0.09 0.00
Summer 77.59 0.01 1.66 0.81 0.04 2.04 7.89 0.03 0.08 0.03
Summer 30.30 0.01 1.09 0.61 0.03 1.80 6.28 0.02 0.05 0.03
Summer 100.40 0.01 1.82 0.47 0.04 1.70 7.05 0.02 0.05 0.02
Summer 132.98 0.01 1.86 0.52 0.03 1.74 7.69 0.01 0.10 0.03
Summer 126.38 0.01 0.34 0.02 0.00 0.01 1.36 0.01 0.18 0.00
Summer 31.82 0.01 1.41 0.52 0.03 1.93 7.04 0.02 0.04 0.00
Summer 110.53 0.01 1.25 0.67 0.04 2.29 7.39 0.02 0.03 0.00
Summer 38.41 0.01 1.55 0.79 0.12 2.21 7.74 0.02 0.06 0.03
Summer 124.26 0.01 2.24 0.00 0.00 1.56 9.65 0.07 0.06 0.01
Summer 53.62 0.01 1.97 0.01 0.00 1.39 7.07 0.03 0.04 0.03
Summer 23.30 0.01 1.28 0.02 0.00 1.80 6.39 0.03 0.00 0.00
Summer 67.34 0.01 0.91 0.02 0.00 0.80 7.14 0.00 0.08 0.00
Summer 22.58 0.03 1.29 0.02 0.01 0.92 7.21 0.02 0.05 0.02
Summer 54.52 0.03 1.37 0.04 0.01 1.48 7.98 0.01 0.03 0.03
Summer 112.83 0.01 1.88 0.03 0.00 1.46 8.44 0.01 0.02 0.00
Summer 175.28 0.01 1.68 0.03 0.00 1.38 8.13 0.00 0.03 0.00
Summer 47.30 0.01 2.20 0.03 0.00 1.13 11.39 0.01 0.03 0.01
Summer 57.08 0.01 1.69 0.04 0.00 1.04 8.34 0.02 0.07 0.00
Summer 15.16 0.01 1.58 0.04 0.01 1.77 6.69 0.05 0.08 0.02
Summer 272.98 0.01 1.88 0.04 0.01 1.60 7.05 0.04 0.08 0.00
Summer 101.82 0.01 1.86 0.02 0.01 1.79 6.02 0.05 0.13 0.00
Summer 59.90 0.01 1.73 0.02 0.01 1.70 7.25 0.05 0.00 0.02
Summer 31.07 0.01 1.92 0.02 0.02 1.40 4.92 0.05 0.03 0.04
Summer 107.43 0.01 2.03 0.02 0.02 1.48 5.19 0.04 0.06 0.00
Summer 30.51 0.01 2.42 0.03 0.03 2.08 5.70 0.07 0.00 0.04
Summer 84.02 0.01 2.10 0.22 0.01 1.61 5.85 0.03 0.00 0.00
Summer 7.37 0.01 2.63 0.35 0.02 1.24 3.90 0.03 0.00 0.00
Summer 7.65 0.01 1.38 0.34 0.01 1.11 4.20 0.04 0.00 0.00
Summer 15.05 0.01 1.56 0.21 0.02 0.35 2.47 0.03 0.00 0.00
Summer 84.13 0.01 1.89 0.04 0.02 0.41 4.44 0.02 0.00 0.02
Winter 82.87 0.01 1.66 0.15 0.02 0.64 4.79 0.02 0.06 0.00
Winter 138.25 0.01 2.53 0.30 0.03 0.42 3.02 0.02 0.00 0.03
Winter 82.67 0.01 2.66 0.31 0.04 0.03 3.16 0.03 0.00 0.00
Winter 46.08 0.01 3.39 0.52 0.02 0.21 2.34 0.03 0.00 0.00
Winter 15.33 0.01 2.63 0.23 0.02 0.06 4.38 0.03 0.05 0.00
Winter 30.78 0.01 1.98 0.17 0.01 0.40 3.68 0.01 0.00 0.00
Winter 155.00 0.01 3.72 0.15 0.02 1.21 4.24 0.02 0.00 0.00
Winter 61.75 0.01 2.93 0.00 0.03 0.92 4.29 0.01 0.00 0.00
Winter 88.90 0.01 4.91 0.11 0.02 0.93 6.65 0.02 0.00 0.00
Winter 38.47 0.01 3.46 0.15 0.02 0.73 4.25 0.01 0.01 0.00
Winter 24.46 0.01 1.97 0.19 0.01 0.75 1.96 0.01 0.00 0.00
Winter 14.29 0.01 2.76 0.11 0.01 0.67 9.55 0.03 0.00 0.00
Winter 65.52 0.01 2.56 0.32 0.01 0.54 2.70 0.04 0.00 0.00
Winter 63.33 0.01 1.21 0.16 0.02 0.53 3.40 0.04 0.03 0.00
Winter 62.28 0.01 3.52 0.43 0.01 0.80 7.78 0.02 0.11 0.00
Winter 65.87 0.01 3.48 0.25 0.02 0.99 11.06 0.01 0.03 0.00
Winter 122.83 0.01 3.24 0.41 0.03 1.09 11.47 0.01 0.06 0.00
Winter 50.18 0.00 2.40 0.49 0.02 1.16 7.90 0.02 0.11 0.00
Winter 60.19 0.01 3.02 0.38 0.02 0.84 9.51 0.02 0.06 0.01
Winter 28.28 0.01 3.68 0.48 0.06 1.18 12.21 0.01 0.09 0.00
Winter 42.50 0.01 1.86 0.46 0.06 1.34 10.51 0.03 0.07 0.00
Winter 49.68 0.01 2.29 0.32 0.04 1.57 11.02 0.02 0.00 0.04
Winter 39.91 0.01 3.30 0.45 0.05 1.46 8.61 0.03 0.08 0.02
Winter 37.94 0.01 3.57 0.31 0.08 1.59 9.47 0.04 0.00 0.05
Winter 27.97 0.01 3.14 0.09 0.06 1.46 12.08 0.03 0.00 0.05
Winter 45.34 0.01 2.59 0.26 0.06 1.59 11.43 0.03 0.06 0.00
Winter 58.45 0.01 3.46 0.12 0.06 1.66 8.98 0.04 0.10 0.00
Winter 43.30 0.01 1.58 0.09 0.05 1.46 7.59 0.03 0.00 0.00
Table 1.4 presents a large and complex data set. Inspecting the raw table alone does not readily yield the
information needed to make a decision. The objective of the study was to assess the air quality of Penang, Malaysia,
in terms of PM10 and inorganic elements, and to identify whether the main sources of PM10 and the inorganic elements
were crustal or noncrustal. The goals included investigating the relationship between the chosen inorganic elements
and PM10 during the summer and winter monsoons and determining the similarities between the chosen parameters.
TABLE 1.5 The Concentration of Heavy Metals in Sediment (mg/L) for Juru and Jejawi Rivers
Cu Zn Cd Cr Fe Pb Hg Mn
In this study, the researcher wanted to understand the interrelationships between the chosen parameters, to extract
information about the similarities or differences between the sampling sites, to identify the variables (heavy
metals) responsible for the spatial differences between the river estuaries, and to assess the effect of possible
sources (natural and anthropogenic) on the chosen heavy metals of the two river estuaries.
The normal distribution is the most important distribution in statistics because most statistical tests assume
normality; that is, they assume that the data were gathered from a normally distributed population.
A brief explanation of the univariate and multivariate normal distributions is given below.
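The probability density function of the univariate normal distribution is

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right), \qquad -\infty < y < \infty,$$

where
μ is the mean; and
σ² is the variance.
The univariate normal distribution can be written as Y ~ N(μ, σ²).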
The normal distribution (bell-shaped) curve is presented in Fig. 1.1. R statistical software was used to generate
the curve. The commands and built-in functions for creating a normal distribution curve are presented in the
Appendix.
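The book's own commands for Fig. 1.1 are in its Appendix and are not reproduced here. As a stand-in, the following minimal sketch evaluates the standard normal density formula by hand, compares it with R's built-in `dnorm()`, and draws the curve in an interactive session. The function name `normal_density` and the choice μ = 0, σ = 1 are assumptions made for illustration.

```r
mu <- 0      # mean (illustrative choice)
sigma <- 1   # standard deviation (illustrative choice)

# Density of N(mu, sigma^2): 1 / (sigma * sqrt(2 * pi)) * exp(-(y - mu)^2 / (2 * sigma^2)).
normal_density <- function(y, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(y - mu)^2 / (2 * sigma^2))
}

# Grid of y values covering four standard deviations on each side of the mean.
y <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 201)

f_manual  <- normal_density(y, mu, sigma)
f_builtin <- dnorm(y, mean = mu, sd = sigma)

# The hand-written formula and dnorm() should agree up to rounding error.
print(all.equal(f_manual, f_builtin))

# Draw the bell-shaped curve when a graphics device is available interactively.
if (interactive()) {
  plot(y, f_builtin, type = "l", xlab = "y", ylab = "f(y)")
}
```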
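The probability density function of the multivariate normal distribution is

$$f(\mathbf{Y}) = \frac{1}{(2\pi)^{k/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{Y}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{Y}-\boldsymbol{\mu})\right),$$

where
k is the number of variables;
μ is the mean vector;
Σ is the covariance matrix; and
(Y − μ)′Σ⁻¹(Y − μ) is the squared Mahalanobis distance (statistical distance).
The multivariate normal distribution can be denoted as Y ~ N_k(μ, Σ).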
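The multivariate normal density can be evaluated with base R alone, because `mahalanobis()` returns the squared statistical distance that appears in the exponent. This is a minimal sketch, not the book's code: the function name `mvn_density` and the example mean vector and covariance matrix are hypothetical.

```r
# Density of N_k(mu, Sigma) at the point y:
# exp(-d2 / 2) / ((2 * pi)^(k / 2) * sqrt(det(Sigma))),
# where d2 = (y - mu)' Sigma^{-1} (y - mu) is the squared Mahalanobis distance.
mvn_density <- function(y, mu, Sigma) {
  k  <- length(mu)
  d2 <- mahalanobis(y, center = mu, cov = Sigma)
  exp(-d2 / 2) / ((2 * pi)^(k / 2) * sqrt(det(Sigma)))
}

# Hypothetical two-variable example.
mu    <- c(0, 0)
Sigma <- matrix(c(1.0, 0.5,
                  0.5, 2.0), nrow = 2, byrow = TRUE)
y     <- c(1, -1)

print(mahalanobis(y, center = mu, cov = Sigma))  # squared distance of y from mu
print(mvn_density(y, mu, Sigma))                 # density at y
```

A quick sanity check on such a function is the uncorrelated case: with an identity covariance matrix, the joint density should equal the product of the univariate densities from `dnorm()`.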
Note:
• A suitable transformation of the data should be used if one or more of the variables under investigation violate
the normality assumption, for example when the data are highly skewed, contain several outliers (extremely high or
low values), or contain many repeated values.
• If all the individual variables follow a normal distribution, the joint distribution is usually assumed to be
multivariate normal, although normality of each variable alone does not guarantee joint normality.
• In practice, real data never follow a multivariate normal distribution exactly; however, the normal density
can be employed as an approximation of the true population distribution.
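The first note can be sketched in R with simulated data. The example below assumes a right-skewed (lognormal) variable generated with `rlnorm()`, applies a log transformation, and compares the Shapiro-Wilk statistic W before and after; values of W closer to 1 are more consistent with normality. The data are simulated for illustration only and are not from the book.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical right-skewed measurements (lognormal).
x <- rlnorm(100, meanlog = 0, sdlog = 1)

# Shapiro-Wilk normality test on the raw and the log-transformed values.
raw_test <- shapiro.test(x)
log_test <- shapiro.test(log(x))

w_raw <- unname(raw_test$statistic)
w_log <- unname(log_test$statistic)

cat("W (raw data):        ", round(w_raw, 3), "\n")
cat("W (log-transformed): ", round(w_log, 3), "\n")
```

The log transformation is only one option; which transformation is suitable depends on the variable and should be checked, for example with a histogram or a normal Q-Q plot (`qqnorm()`).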
CHAPTER 2
R Statistical Software
LEARNING OBJECTIVES
2.1 INTRODUCTION
R is a programming language and software environment for statistical computing and graphics. It is used by
professionals at various organizations, colleges, survey institutions, and elsewhere. Many statistical packages are
available for statistical analysis; however, researchers, scientists, and others engaged in data analysis, design,
modeling, and producing attractive, high-resolution plots often prefer R. R is offered for free as open-source
software under the terms of the GNU General Public License: “R is an official part of the Free Software Foundation’s
GNU project, and the R Foundation has similar objectives to other open-source software foundations like the Apache
Foundation or the GNOME Foundation.” The R language is similar to the S language and environment, which was developed
at Bell Laboratories. R is employed by millions of researchers around the world, and the number of R users continues
to increase. R has become a substantial and engaging statistical software package for the following reasons:
• R is free, open-source software with many packages. Many sources (mirrors) around the world allow anyone to
download and install it, regardless of position or institution, and whether the user is affiliated with a public or
private organization.
• R offers many built-in functions that make the steps of an analysis simple. We carry out data analysis in R by
writing scripts that specify the required variables and call built-in functions to perform the required computation,
such as the correlation, average, variance, or other statistical values.
• R can produce clear, high-resolution, customized graphs that meet specific standards or convey a particular view
of the data.
• R can easily be used by researchers without programming proficiency.
• R can be downloaded and installed on various operating systems, such as Windows, Linux, and macOS.
• The R library provides many statistical and graphical packages for data analysis, computation, and the generation
of high-quality plots.
• It is easy to interact with the online community around the world, exchange ideas, and receive assistance.
• R code, commands, and functions are available online for free; moreover, many sources offer free instruction in R,
including courses, materials, and answers to questions, to an extent that many other software packages do not.
• RStudio offers additional facilities for running R and is simpler and friendlier to use than R alone.
Applied Statistics for Environmental Science with R © 2020 Elsevier Inc. All rights reserved.
https://doi.org/10.1016/B978-0-12-818622-0.00002-2
The concepts and terms related to the R statistical package are addressed to provide a starting point for readers who
are new to the R language and environment. Beginners are guided through downloading and installing the software
(R and RStudio), understanding some of the ideas and concepts employed in R, and writing simple scripts in R.
We have tried to make the procedures and instructions as simple and comprehensible as possible. Many examples are
provided to lead researchers step by step and to make the procedure interesting.
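To give a flavor of what a simple R script looks like, the following minimal sketch stores two variables and calls built-in functions for the average, variance, and correlation. The values are the first five Juru River measurements of DO and temperature from Table 1.2; the variable names are chosen here for illustration.

```r
# Dissolved oxygen and temperature for five Juru River samples (Table 1.2).
do          <- c(6.73, 6.64, 5.93, 6.23, 6.29)
temperature <- c(28.15, 28.20, 28.35, 28.40, 28.30)

do_mean <- mean(do)              # average
do_var  <- var(do)               # sample variance
r       <- cor(do, temperature)  # Pearson correlation

cat("Mean DO:", do_mean, "\n")
cat("Variance of DO:", do_var, "\n")
cat("Correlation between DO and temperature:", r, "\n")
```

The script can be typed line by line at the R prompt or saved in a file and run as a whole.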
R and its packages can be downloaded and installed in a few simple steps. R provides many packages for carrying out
various statistical methods. After installing R, we can download and install RStudio, which provides a friendlier
and more effective way to work with R.
2.2 INSTALLING R
Suppose R has not yet been installed. The software can be downloaded and installed for free by following the six
steps below.
1. Type https://cran.r-project.org/ into the browser’s address bar and press “Enter”; “The Comprehensive R Archive
Network” will appear, as shown in Fig. 2.1.
2. The screen for “The Comprehensive R Archive Network” offers three choices for downloading R, depending on the
operating system of the computer:
(1) Download R for Linux
(2) Download R for (Mac) OSX
(3) Download R for Windows
If we choose R for Windows, click “install R for the first time” (or “base”), as presented in Fig. 2.2.
3. Click on the available version. The newest version shown on the screen at the time of writing is R-3.5.1, although
other versions may be listed. Click on “Download R-3.5.1 for Windows (62 megabytes, 32/64 bit),” as shown in
Fig. 2.3.
FIG. 2.3 The screen to download R-3.5.1 for Windows (32/64 bit).
4. After the link is clicked, the message “R-3.5.1-win.exe” appears in the bottom-left corner of the browser window
(Fig. 2.4), indicating that the file has started downloading to the computer.
5. Once the download has finished, click on the downloaded file to open another screen, and then click on “Run,” as
shown in Fig. 2.5. Follow the guidance given to finish the installation.
6. R is now installed on the computer and ready to be used. We can start R by double-clicking on the R icon.
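Once the steps above are complete, a quick way to confirm that the installation works is to ask R which version is running. This is a minimal sketch to type at the R prompt; the version reported will depend on what was installed and need not be R-3.5.1.

```r
# Full version string of the running R installation.
print(R.version.string)

# Version as a comparable object, useful for checking a minimum requirement.
installed_version <- getRversion()
print(installed_version)

# Example check: is the running version at least 3.5.1?
print(installed_version >= "3.5.1")
```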
2.2.1 R Material
It is highly useful to have a handbook or manuals to guide readers in using new software, particularly novices who
are new to R. These are provided for free; manuals and notes can be downloaded from online sources such as
https://cran.r-project.org/ or other authorized R sources. The handbooks, manuals, and notes can also be obtained
offline through the Help button in the upper row of the R environment, as presented in Fig. 2.6; the Help button
provides many options, one of which is Manuals (PDF).
We have found that the handbooks and associated notes are helpful and offer clear direction, particularly for novice
readers. Some documents have been produced in various languages, such as Russian, Chinese, and German.
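The same documentation can also be reached from the R prompt. The sketch below uses functions from base R and the utils package; the calls that open a help viewer are wrapped in `interactive()` so that the script also runs without a display, and the function `mean` is used only as an example topic.

```r
# List objects whose names contain "mean" (a way to discover functions).
matches <- apropos("mean")
print(head(matches))

# The calls below open documentation viewers, so they run only in an interactive session.
if (interactive()) {
  help("mean")         # help page for a function; ?mean is equivalent
  help.start()         # HTML index of the manuals that ship with R
  RShowDoc("R-intro")  # the "An Introduction to R" manual
}
```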
2.2.2 R Packages
R offers various statistical packages. Some are built-in (standard or base packages, which are available as soon as
the R installation is finished). Users can download other packages from the “Packages” menu in the upper row of the
R Console by clicking “Install package(s)” and then selecting the site (mirror) from which to download. A list of
available packages is then displayed, from which you can select the package you wish to install.
The search() function displays the packages that are currently attached in the R session, including those attached
automatically when R starts.
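The menu actions described above have command-line equivalents. The following minimal sketch lists the attached packages with `search()`, shows (commented out, because it needs an internet connection) how a contributed package would be installed, and loads a package with `library()`. The package name "vegan" is only an illustrative assumption and is not taken from the text.

```r
# Packages (and environments) currently on the search path.
attached <- search()
print(attached)

# Installing a contributed package needs an internet connection, so the call is
# left commented out; "vegan" is only an illustrative package name.
# install.packages("vegan")

# Check that a package is installed, then attach it for use.
is_installed <- requireNamespace("stats", quietly = TRUE)
print(is_installed)
library(stats)
```

Installation is needed once per R installation, whereas `library()` is called in every session in which the package is used.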