Prediction of Crop Yield Using Regression

International Journal of Soft Computing 12 (2): 96-102, 2017
ISSN: 1816-9503, © Medwell Journals, 2017

Prediction of Crop Yield Using Regression Techniques

Aditya Shastry, H.A. Sanjay and E. Bhanusree
Nitte Meenakshi Institute of Technology, Bangalore, India

Abstract: With the emergence of artificial intelligence and computer science, data mining has received an enormous boost. Recently, data mining algorithms have been used successfully in agriculture for predicting the yield of crops. Crop yield prediction involves predicting the yield of crops from available historic data such as weather parameters, soil parameters and historic crop yield. Regression is a data mining function that predicts a number, and regression techniques are very useful in predicting the yield of crops. In this study, the focus is on the development of regression techniques in the agricultural field. Different regression techniques such as quadratic, pure-quadratic, interactions and polynomial are used for predicting the yields of wheat, maize and cotton crops. Finally, regression models are proposed which are able to accurately predict the yields of cotton, maize and wheat. The best regression model is selected based on Root Mean Squared Error (RMSE), R² and Mean Percentage Prediction Error (MPPE) values.

Key words: Regression, yield, parameters, model, accuracy

INTRODUCTION

Data mining (the analysis step of the "Knowledge Discovery in Databases" process), a field at the intersection of computer science and statistics, is the process that attempts to discover patterns in large data sets. Data mining covers concepts of databases, machine learning, statistics and artificial intelligence. The main objective of data mining is to find useful patterns in existing data.

The actual data mining task is the semi-automatic or automatic analysis of large amounts of data to extract previously unknown, interesting patterns, such as unusual records, dependencies and groups of data records, which help in providing accurate prediction results using predictive modeling. In the past few years, predictive modeling has been practiced in only a few areas in this competitive world. The big data phenomenon is a tool which is used for data analysis in new applications in order to increase the adoption of predictive models. Predictive models assist decision makers in forming the right decisions by making them efficient and effective. The automation of the entire decision making process is also possible in some cases.

An agricultural system is very complex, since it deals with large amounts of data that come from a number of factors. Crop yield prediction is significant for farmers and agriculture related industries. In this research, regression models for predicting the yield of crops like cotton, wheat and maize depending on soil, weather and crop parameters are proposed. A comparison is made between the different regression models based on the RMSE, R² and MPPE metrics.

Literature review: In this section, research works related to data mining techniques for crop yield prediction are discussed.

In Zaefizadeh et al. (2011), 40 genotypes were planted in Ardabil. Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) were employed for prediction of grain yield. In the ANN, 15 neurons
with one hidden layer were utilized. The activation function and learning method used were the hyperbolic tangent function and error propagation, respectively. Experimental results established that MLR outperformed ANN.

Corresponding Author: Aditya Shastry, Nitte Meenakshi Institute of Technology, Bangalore, India

Sanchez et al. (2014) compare several techniques (linear and nonlinear) for prediction of crop yield. The comparison is made using the best attribute subset found in the training dataset for each technique, which was identified using percentage split validation and a complete algorithm. To search for the optimal attribute subset in the training datasets, the algorithm uses the oldest samples to build the models. The test dataset is composed of unseen samples on which the performance is measured. The most widely used data-driven techniques for prediction of crop yield, such as stepwise linear regression, multiple linear regression, regression trees and neural networks, were assessed. The experiments demonstrate that attribute selection using a complete algorithm substantially improves the performance of all the assessed techniques. ANN and M5 obtained the best predictions and, between them, ANN achieved the lower RRSE, the higher R correlation and the lower RMAE value. Nevertheless, none of the techniques was able to obtain the optimal subset with the training data for all eight crops. The best method was ANN, which obtained three attribute subsets equal to the optimal, while the other two subsets were very close to it. Thus, an attribute subset that can be used permanently in all years for all crops is difficult to select. Results obtained from machine-learning methods cannot be directly applied to a different set of crop databases, due to their high dependency on the data.
The strategy presented in that paper can be extended to a larger number of techniques and crop datasets. Future research focused on finding the best minimal subset of attributes which provides good yield predictions in other irrigation zones should be considered.

Zhang et al. (2010) consider the linear regression model, taking into account that Ordinary Least Squares (OLS) estimation is a widely used technique for prediction of crop yield. Here, the autoregressive model performed better than OLS, with a higher R². The research concluded that NDVI and precipitation, excluding temperature, contributed more to the corn yield in Iowa.

Zaw and Naing (2009) consider the Multiple Polynomial Regression (MPR) model in order to predict rainfall in the region of Myanmar. The authors created a prediction model based on 15 predictors using second-order MPR. As a result of several experiments, the predicted rainfall amount is close to the actual values. An SMR prediction model was created with four predictors. The model results were used in areas such as crop planning and yield prediction, water management and reservoir control. The fundamental aim of developing the prediction model is to help in water management and farm management. All possible subsets of predictors were examined in the implementation of multiple polynomial regression, which uses only 2005 test data. The authors demonstrate that MPR performs better than MLR.

Qaddoum and Hines (2012) present an extension to conventional regression neural networks.
Conformal Prediction (CP) frameworks were built for predicting the tomato yield in a greenhouse, taking into consideration Vapor Pressure Deficit (VPD), CO2, radiation and temperature. About 60,000 records were used in this process.

Ramasubramanian discusses the different forecasting approaches in agriculture, including regression models, time series models and probabilistic models. Among the regression models, three are covered: multiple linear regression models for forecasting crop yields, weather indices based multiple linear regression models for crop pest counts, and logistic regression models for forewarning qualitative response variables. Among the time series models, the two used are the exponential smoothing model for production of crops and auto regressive integrated moving average models. Among the probabilistic models, the Markov chain model is utilized for crop yield forecasting.

MATERIALS AND METHODS

Experimental data with parameters: The data sources, along with the input parameters considered for the prediction of wheat, maize and cotton yields, are elaborated in this section. To perform the analysis, it is important to gather the datasets from various sources. The data has to be in a specific form, which has to be altered again based upon the attributes. The following datasets have been considered as input to the regression models. The datasets gathered are for three crops: wheat, cotton and maize. Each dataset comprises a number of records with different attributes which influence the crop yield. The maize dataset was collected by Carberry and Abrecht (1991) and has 78 records. The wheat dataset was collected by Asseng et al. (1998) and has a total of 50 records. The cotton dataset was collected by Hearn (1994) and has 123 records. The attributes play a very important role in crop yield prediction, but not all attributes give proper predicted values. The attributes which influence the crop yield the most are chosen.
Attribute selection plays an essential part because, depending upon the chosen attributes, the yield of the crop can be predicted more accurately. Only some specific attributes help in computing the yield output. Table 1 lists the attributes that affect the yield of the respective crops the most.

Table 1: Attributes for crop yield prediction
Maize: ESW (extractable soil water), biomass, rainfall, radiation
Wheat: biomass, grain protein, grain size, ESW
Cotton: dw_total, NUptake, bolls_sc, lai_max, ESW, rainfall, radiation, maximum temperature, minimum temperature

Regression model design for crop yield prediction: In this section, the general design followed for developing the regression models for crop yield prediction is discussed. Regression analysis is mainly used for prediction purposes, as it expresses the predicted (dependent) variable as a function of the independent variables. In certain cases, it gives the relationships between independent and dependent variables (Alan, 1993). The steps to develop a regression model for prediction of crop yield are listed in the following algorithm.

Algorithm: Steps for crop yield prediction using regression
Input: Experimental dataset of weather data, crop data and soil data
Output: Predicted crop yield for the experimental data
Method:
Step 1: Gather, format and organize the information: Initially, raw information is gathered to work with the model. The information must be altered, sorted and shaped into the necessary organization in such a way that suitable results are obtained. While reorganizing, more information can be added.
Step 2: Separate the information into testing and training sets: The data needs to be partitioned into two sets. The training set will have the greater share of the information in order to train the model to make the prediction. Approximately 70% of the samples are selected as the training set. The testing set uses the remaining samples to test the prediction.
Step 3: Apply regression on the training set: The model to be used depends upon how complex the problem is, and the structure can be chosen according to the need. While altering the structure, modeling and training can be repeated. Calculate the RMSE, R² statistic and MPPE values for the different models.
Step 4: Apply the trained regression model on the test set and again calculate the RMSE, R² statistic and MPPE values. Compare the values across the different regression models. The model which gives the lowest RMSE and MPPE values and the highest R² statistic value is considered to be the best model for crop yield prediction.

Figure 1 shows the flow chart of the regression methodology used for predicting the crop yield.

Fig. 1: Regression methodology for crop yield prediction

Regression model for maize yield prediction: Here, the maize dataset consists of a total of 78 records, of which 55 records are used as the training set and the remaining 23 records as the test set. Quadratic, pure quadratic, interactions, linear and polynomial models are developed for maize yield prediction. A quadratic regression is the procedure of finding the equation of the parabola that best fits a set of data. The pure quadratic regression model mainly has square terms of each attribute. In the interactions regression model, the dependent variable relies on the interactions between the independent variables. In linear regression, the prediction of the dependent attribute is done using single independent attributes and not collectively.
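The term sets behind the quadratic, pure quadratic, interactions and linear models described above can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' implementation: the function names `design_row` and `split_records`, the model labels, the reading of the quadratic model as the full second-order model (linear, pairwise interaction and squared terms) and the random shuffle used for the split are all assumptions, since the paper does not state how records were assigned to the training and test sets.

```python
from itertools import combinations
import random

MODELS = ("linear", "interactions", "purequadratic", "quadratic")

def design_row(x, model):
    """Expand one record x (a list of attribute values) into regression terms.

    Term order: intercept, linear terms, pairwise products, squared terms.
    The model labels are hypothetical names chosen for this sketch.
    """
    if model not in MODELS:
        raise ValueError("unknown model: " + str(model))
    row = [1.0] + list(x)  # intercept plus one term per attribute
    if model in ("interactions", "quadratic"):
        # products of every pair of distinct attributes
        row += [a * b for a, b in combinations(x, 2)]
    if model in ("purequadratic", "quadratic"):
        # square of each attribute
        row += [a * a for a in x]
    return row

def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records and cut it into training and test sets.

    Random assignment is an assumption; the source only gives the set sizes.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

With four attributes, as in the maize data, the linear model has 5 terms, the pure quadratic 9, the interactions 11 and the full quadratic 15. Calling `split_records(records, 55)` on the 78 maize records gives a 55/23 partition of the size reported for that crop.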
In the polynomial model, the independent attributes are varied up to an nth-degree polynomial to predict the outcome. Polynomial models are built by varying their degrees. Table 2 shows the different regression models with the corresponding equations for maize yield prediction. Here x1 is ESW, x2 is biomass, x3 is rainfall, x4 is radiation and y is maize yield. Among all these models, the pure quadratic model was able to predict the maize yield most accurately. The p-values for x1 to x4 were measured for each of the models. It was found that x2 had the lowest p-value. From this, it can be inferred that biomass (x2) contributes more to the maize yield than the other attributes.

Table 2: Regression models for maize yield prediction (equations for the quadratic, pure quadratic, interactions, linear and polynomial models; coefficients illegible in this copy)

Regression model for wheat yield prediction: Here, the wheat dataset consists of a total of 50 records, of which 35 records are used as the training set and the remaining 15 records as the test set. Quadratic, pure quadratic, interactions, linear, polynomial and Generalized Linear Regression (GLM) models are built. Table 3 shows the regression models with the corresponding equations for wheat yield prediction. Here x1 is biomass, x2 is grain protein, x3 is grain size, x4 is ESW and y is wheat yield.

Table 3: Regression models for wheat yield prediction (equations for the quadratic, pure quadratic, interactions, linear, polynomial and GLM models; coefficients illegible in this copy)

GLM models are used when response variables have arbitrary distributions (rather than simply normal distributions). Since the response variable here is the wheat yield, which can vary arbitrarily, this model is more suited than the other regression models. After measuring the p-values, it was found that biomass contributed more to the wheat yield. Also, GLM performed better than the other models.

Regression model for cotton yield prediction: Here, the cotton dataset consists of a total of 123 records, of which 84 records are used as the training set and the remaining 39 records as the test set. Quadratic, pure quadratic, interactions, linear, polynomial and Stepwise Linear Regression (SLM) models are built. Table 4 shows the regression models with the corresponding equations for cotton yield prediction. Here x1 is dw_total, x2 is NUptake, x3 is bolls_sc, x4 is lai_max, x5 is ESW, x6 is rainfall, x7 is radiation, x8 is maximum temperature, x9 is minimum temperature and y is the cotton yield. In SLM, independent variables are included in the regression model step by step. The variables which contribute significantly to the outcome are retained in the model.

Table 4: Regression models for cotton yield prediction (equations for the quadratic, pure quadratic, interactions, linear, polynomial and stepwise models; coefficients illegible in this copy)

Figure 5 shows the RMSE values for maize yield prediction. It clearly shows that the pure quadratic model has a lower RMSE value than the other models.

Prediction results for cotton yield prediction: Figure 6 shows the results based on the R² statistic and MPPE for cotton yield prediction using the linear, pure quadratic, interactions, quadratic, polynomial and proposed SLR regression models. As can be observed, the R² value for the SLR model is higher, while its MPPE is lower, than for the rest of the models. From this, it can be inferred that the SLR model predicts the cotton yield more accurately than the other models.

Fig. 6: Accuracy graph for cotton yield prediction based on R² and MPPE

Figure 7 shows the RMSE values for cotton yield prediction. It clearly shows that the proposed SLR regression model has a lower RMSE value than the other models.

Fig. 7: Accuracy graph for cotton yield prediction based on RMSE

CONCLUSION

This work demonstrated that regression techniques can be utilized for yield prediction for the area with satisfactory results. India is one of the leading crop-producing nations in Asia, and the consumption of wheat, maize and cotton is widespread in many parts of the country. Thus, an effort was made to predict the yield of these crops in India. In order to predict the yield, a regression model is utilized as a prediction tool and some of the important factors in yield production are selected.
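The three comparison metrics used throughout the study can be written out explicitly. The sketch below is an illustration under stated assumptions, not the authors' code: this text gives no formula for MPPE, so it is taken here as the mean absolute percentage deviation of the predictions from the observed yields; the function names are hypothetical, and ranking candidate models by test-set RMSE alone is a simplification of the stated criterion (lowest RMSE and MPPE, highest R²).

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted yields."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mppe(actual, predicted):
    """Mean percentage prediction error (assumed definition, see lead-in)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

def best_model(actual, predictions_by_model):
    """Name of the model with the lowest RMSE on the given observations.

    predictions_by_model maps a model name to its list of predictions.
    Using RMSE alone is a simplification made for this sketch.
    """
    return min(
        predictions_by_model,
        key=lambda name: rmse(actual, predictions_by_model[name]),
    )
```

In practice the three metrics may disagree on the ranking of two models, in which case a tie-breaking rule would have to be chosen; the source does not say how such cases were handled.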
The information gathered, along with the different attributes, is used as the input variables for the regression model. In this way, the best regression model is found for the yield. Every model is run several times to examine the possible values of the root mean squared error and R² statistic. By utilizing the best regression model for the study, the forecast of production of wheat, maize and cotton is done for chosen years. The outcomes demonstrate that the proposed regression model is a suitable method for predicting yield production. The results of the different models are compared based upon the root mean squared error, R² statistic and percentage prediction error. The model which gives the lower root mean squared error and percentage prediction error and the higher R² statistic value is considered to be the best model for crop yield prediction.

RECOMMENDATIONS

In future, this research can be extended by applying different prediction techniques such as Support Vector Regression (SVR), neural networks, fuzzy logic, etc. for predicting the yield of various crops. Further, the correlation between predictor variables may be found, which gives the importance of each variable for prediction.

REFERENCES

Alan, O.S., 1993. An introduction to regression analysis. Master Thesis, University of Chicago Law School, Chicago, Illinois.

Asseng, S., G.C. Anderson, F.X. Dunin, I.R.P. Fillery and P.J. Dolling et al., 1997. Use of the APSIM wheat model to predict yield, drainage and NO3-leaching for a deep sand. Crop Pasture Sci., 49: 363-378.

Carberry, P.S. and D.G. Abrecht, 1991. Tailoring Crop Models to the Semi-Arid Tropics. In: Climatic Risk in Crop Production: Models and Management in the Semi-Arid Tropics and Sub-Tropics, Muchow, R.C. and S.A. Bellamy (Eds.). CAB International, Wallingford, England, pp: 157-182.

Hearn, A.B., 1994. OZCOT: A simulation model for cotton crop management. Agric. Syst., 44: 257-299.

Nancy, R.Z., 2010. Topic: Multiple Linear Regression. Stanford University, California.

Qaddoum, K. and E.L. Hines, 2012. Reliable yield prediction with regression neural networks. Proceedings of the 12th WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Intelligence, August 21-23, 2012, WSEAS Press, Istanbul, Turkey, pp: 1 192.

Sanchez, A.G., J.F. Solis and W.O. Bustamante, 2014. Attribute selection impact on linear and nonlinear regression models for crop yield prediction. Sci. World J., 2014: 1-10.

Zaefizadeh, M., A. Jalil, M. Khayatnezhad, R. Gholamin and T. Mokhtari, 2011. Comparison of Multiple Linear Regressions (MLR) and Artificial Neural Network (ANN) in predicting the yield using its components in the hulless barley. Adv. Environ. Biol., 10: 109-114.

Zaw, W.T. and T.T. Naing, 2009. Modeling of rainfall prediction over Myanmar using polynomial regression. Proceedings of the International Conference on Computer Engineering and Technology, January 22-24, 2009, IEEE, New York, USA., ISBN: 978-0-7695-3521-0, pp: 316-320.

Zhang, L., L. Lei and D. Yan, 2010. Comparison of two regression models for predicting crop yield. Proceedings of the IEEE International Symposium on Geoscience and Remote Sensing (IGARSS), July 25-30, 2010, IEEE, New York, USA., ISBN: 978-1-4244-9565-8, pp: 1521-1524.
