SIMULTANEOUS HYPOTHESIS TESTING OF PARAMETRIC AND NONPARAMETRIC COMPONENT PARAMETERS IN THE MIXED TRUNCATED SPLINE AND KERNEL SEMIPARAMETRIC REGRESSION MODEL
(Case Study: Total Fertility Rate by Regency/City in East Java)

1Arip Ramadan, 2Ismaini Zain and 3I Nyoman Budiantara
Department of Statistics, Faculty of Mathematics, Computing and Data Science
Institut Teknologi Sepuluh Nopember (ITS) Surabaya
Jl. Arief Rahman Hakim, Surabaya 60111 Indonesia
e-mail: 1aripramadan@gmail.com, 2ismainizain@gmail.com and 3i_nyoman_b@statistika.its.ac.id

Abstract: Hypothesis testing is used to determine the relationship between response variables and predictor variables. In this study, simultaneous hypothesis testing is carried out for the parameters of a model in which the response variable has different relationship patterns with different predictor variables. An appropriate analysis for this problem is semiparametric regression with a mixed truncated spline and kernel estimator. The method is then applied to Total Fertility Rate data for East Java Province in 2015. The best model obtained is the mixed truncated spline and kernel semiparametric model with a linear spline and a combination of knot points, based on the optimum GCV value of 0.003964 and a coefficient of determination of 97.04%. The simultaneous hypothesis test, with an F statistic of 101.85619, shows that the predictor variables have a significant influence on the response variable.

Keywords: Mixed Truncated Spline and Kernel Estimator, Total Fertility Rate, Simultaneous Hypothesis Test of Parametric and Nonparametric Parameters.

I. INTRODUCTION
Regression analysis is a method used to determine the pattern of the relationship between predictor variables and the response variable. There are several approaches in regression analysis, namely parametric regression, nonparametric regression and semiparametric regression. Parametric regression is used if the form of the regression curve is known, for example linear, quadratic, cubic, exponential and so on [1]. If the data pattern tends to follow a linear, quadratic or cubic model, then the corresponding regression approach is linear, quadratic or cubic parametric regression [2]. If the pattern of the relationship between the predictor variables and the response variable is unknown, or there is no complete prior information about the form of the data pattern, then nonparametric regression is the recommended approach [3]. Semiparametric regression is a combination of parametric regression and nonparametric regression. A semiparametric regression model is used when the response variable has a known relationship pattern with one or several predictor variables, while the form of its relationship with the other predictor variables is unknown. Cases are often found in which each predictor variable has a different data pattern. If such data are estimated with only one form of estimator, the resulting estimator does not suit the data pattern. As a result, the estimated regression model is inexact and tends to have large errors. Estimating the regression curve in accordance with the data pattern gives better results [4]. Therefore, a mixed estimator is needed in the regression model to accommodate the differences in data patterns.
Studies of semiparametric regression with mixed estimators have not been widely carried out. [5] estimated semiparametric models with a mixed truncated spline and kernel estimator, [6] investigated interval estimates of the parametric and spline component parameters in mixed spline and kernel semiparametric regression models, whereas [7] estimated mixed semiparametric models of truncated spline and Fourier series. These studies are limited to finding point estimates of the regression curve, and no further inference has been made regarding simultaneous hypothesis testing.
Hypothesis testing has been applied in various scientific fields, one of which is demography. Demographic research, including research on the Total Fertility Rate (TFR), is a suitable field for the application of mixed truncated spline and kernel semiparametric regression. This is because the relationship between the response variable and some predictor variables tends to have an unknown pattern, while some other predictor variables form a linear relationship.
The birth rate is related to the future population. Population is the most important factor in sustaining the development of an area because it is both a subject and an object of development. As the subject of development, the population plays a role in achieving economic and social development that can increase social welfare, while as the object of development, the population is the party that receives the results of the development of a region. According to the report of the latest Indonesia Demographic and Health Survey in 2012, the national TFR increased from 2.41 in 2008 to 2.6 in 2012. Based on the report, only 10 provinces experienced a decline in fertility rate, while the remainder saw an increase. The increase in TFR experienced by the other provinces ranged from 31 percent to 63 percent. East Java Province is a province that has experienced a significant decrease in
TFR since the introduction of the family planning policy; its TFR had reached a level below 2.1 in 2002. However, the TFR of East Java Province in 2012 showed a significant increase relative to 2002, which contributed to the renewed rise of Indonesia's TFR. The TFR of East Java increased by more than 20 percent, from 2.1 in 2002 to 2.6 in 2012.
Many variables can affect TFR, including unmet need, the age-specific fertility rate (ASFR), the human development index (HDI) and the infant mortality rate (IMR). Using these four suspected variables, an analysis is conducted to obtain the test statistic for simultaneous hypothesis testing and to obtain the mixed truncated spline and kernel semiparametric regression model for district/city TFR data in East Java Province.
In mixed truncated spline and kernel semiparametric regression analysis, the number of knots and the location of the optimum knot points are determined using the Generalized Cross Validation (GCV) method.

II. LITERATURE REVIEW
A. Regression Analysis
Regression analysis is a method used to explain the relationship between predictor variables and response variables. One of the objectives of regression analysis is to estimate or predict the value of the response variable when the value of the predictor variable is given [5]. The relationship between the response and the predictor for n observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, is as follows.
\[ y_i = f(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1} \]
where $y_i$ is the response at the i-th observation, $x_i$ is the predictor variable at the i-th observation, $\varepsilon_i$ is the error or residual at the i-th observation, which is an independent random variable with zero mean and constant variance $\sigma^2$, and $f(x_i)$ is the regression curve at the point $x_i$ [6].
In regression analysis there are three approaches, namely parametric regression, nonparametric regression and semiparametric regression. If the form of the relationship between the response variable and the predictor variables is known, the parametric regression approach is used. If the form of the relationship is not known, the nonparametric regression approach is used. If the regression curve consists of parametric components and nonparametric components, the semiparametric regression approach is used [7].

B. Mixed Truncated Spline and Kernel Semiparametric Regression
Semiparametric regression is a combination of parametric components and nonparametric components [8]. In some cases, the relationship between the response variable and one predictor variable is linear, but the relationship with the other predictor variables is unknown. Variables with known data patterns, or with prior information about their data patterns, are classified as parametric components, while variables with unknown data patterns are classified as nonparametric components.
Suppose that paired data $(x_{1i}, x_{2i}, \ldots, x_{pi}, t_{1i}, t_{2i}, \ldots, t_{qi}, z_{1i}, z_{2i}, \ldots, z_{ri}, y_i)$ are given. The semiparametric model is then formulated as follows.
\[ y_i = f(x_{1i}, x_{2i}, \ldots, x_{pi}, t_{1i}, t_{2i}, \ldots, t_{qi}, z_{1i}, z_{2i}, \ldots, z_{ri}) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{2} \]
With the curve assumed to be additive, this gives
\[ y_i = f(x_i) + g(t_i) + h(z_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{3} \]
where $y_i$ is the response variable, $f(x_i)$ is the parametric component, $g(t_i)$ is the nonparametric spline component, $h(z_i)$ is the nonparametric kernel component, and $\varepsilon_i$ is a random error assumed to be independent and identically normally distributed with zero mean and variance $\sigma^2$. The component $f(x_i)$ in equation (3) is approximated with a linear function, as in model (1), and can be written in the following matrix form.
\[ \tilde{y} = \mathbf{X}\tilde{\beta} + \tilde{\varepsilon} \tag{4} \]
The regression curve $g(t_i)$ is then approximated with the following spline function.
\[ y_i = g(t_i) + \varepsilon_i \tag{5} \]
which can be written in the following matrix form.
\[ \tilde{y} = \mathbf{G}(k)\tilde{\theta} + \tilde{\varepsilon} \tag{6} \]
The regression curve $h(z_i)$ is approximated with the following kernel function.
\[ y_i = h(z_i) + \varepsilon_i \tag{7} \]
which can be written in the following matrix form.
\[ \hat{\tilde{h}}(z_i) = \mathbf{D}(\alpha)\tilde{y} \tag{8} \]
The semiparametric regression model in equation (3) can be presented in matrix form as follows.
\[ \tilde{y} = \mathbf{X}\tilde{\beta} + \mathbf{G}(k)\tilde{\theta} + \mathbf{D}(\alpha)\tilde{y} + \tilde{\varepsilon} \tag{9} \]
where $\tilde{y}$ is the $n \times 1$ vector of responses, $\mathbf{X}$ is the $n \times (p+1)$ parametric component matrix, $\tilde{\beta}$ is a $(p+1) \times 1$ vector, $\mathbf{G}$ is the $n \times (r+m)$ spline component matrix, $\tilde{\theta}$ is an $(r+m) \times 1$ vector, $\mathbf{D}(\alpha)$ is an $n \times n$ matrix and $\tilde{\varepsilon}$ is the $n \times 1$ vector of random errors.

C. Selection of Optimal Knot Points
In nonparametric and semiparametric regression with the spline approach, an important step in obtaining the spline estimator is the selection of optimal knot points. According to [9] and [10], Generalized Cross Validation (GCV) is a good and widely used method for selecting optimum knot points because of its advantages. Meanwhile, the kernel estimator depends on bandwidth selection. The bandwidth $\alpha$ is a smoothing parameter that controls the smoothness of the estimated curve. Knots or bandwidths that are too small produce an under-smoothed curve that is very rough and fluctuating, whereas knots or bandwidths that are too large produce an over-smoothed curve that is very smooth but does not follow the data pattern [11]. The optimum knot points k and bandwidth $\alpha$ are selected using the GCV criterion defined in equation (10).
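As a rough illustration of the two nonparametric building blocks in equation (9), the sketch below constructs one row of a linear truncated spline basis and a kernel smoother matrix D(α). It assumes a Gaussian kernel with Nadaraya-Watson weights, which appears to be the form used in the fitted models reported for the TFR data; the function names and the toy values are hypothetical and are not taken from the study.

```python
import math

def truncated_linear_basis(t, knots):
    """One row of a linear truncated spline basis: [t, (t - K1)_+, ..., (t - Km)_+]."""
    return [t] + [max(t - k, 0.0) for k in knots]

def gaussian_kernel(u):
    """Standard Gaussian kernel density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_smoother_matrix(z, alpha):
    """n x n matrix D(alpha) of Nadaraya-Watson weights (assumed form)."""
    n = len(z)
    D = []
    for i in range(n):
        # Unnormalised kernel weights of every observation j around point z[i]
        raw = [gaussian_kernel((z[i] - z[j]) / alpha) / alpha for j in range(n)]
        total = sum(raw)  # always positive because the j == i term is included
        D.append([w / total for w in raw])
    return D

# Toy values (hypothetical, for illustration only)
t_row = truncated_linear_basis(5.0, knots=[3.0, 6.0])
D = kernel_smoother_matrix([1.0, 2.0, 4.0], alpha=1.0)
```

Each row of D sums to one, so D(α)y is a locally weighted average of the responses; a smaller α concentrates the weight on nearby observations.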
The GCV function for knot points k and bandwidth $\alpha$ is
\[ GCV(k, \alpha) = \frac{MSE(k, \alpha)}{\left( n^{-1}\, \mathrm{trace}\left[ \mathbf{I} - \mathbf{M}(k, \alpha) \right] \right)^2} \tag{10} \]
where $MSE(k, \alpha)$ is the mean square error of the mixed truncated spline and kernel model, obtained from the following equation.
\[ MSE(k, \alpha) = n^{-1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{11} \]

D. Model Parameter Testing
Model parameter testing is carried out to determine whether the predictor variables affect the response variable. In this study, the model parameters are tested after obtaining the mixed truncated spline and kernel semiparametric regression model with optimal knot points based on the minimum GCV value.
The test performed in this study is a simultaneous test, that is, a test of the significance of all parameters in the model together. The hypotheses are as follows.
\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_p = \theta_1 = \theta_2 = \cdots = \theta_q = 0 \]
\[ H_1: \text{at least one } \beta_i \neq 0 \text{ or } \theta_j \neq 0 \]
Test statistic:
\[ F_{count} = \frac{MS_{regression}}{MS_{residual}} \tag{12} \]
$H_0$ is rejected if $F_{count} > F_{p;\, n-1-p}$, where p is the number of parameters in the regression model and n is the number of observations. The value of the F test statistic is calculated from the analysis of variance (ANOVA) in Table 1.

Table 1 Analysis of Variance (ANOVA)
Source of Variation   Degrees of freedom (df)   Sum of Squares (SS)               Mean Square (MS)                 F_count
Regression            p-1                       $\hat{\beta}' X' Y - n\bar{Y}^2$   SS_regression / df_regression    MS_regression / MS_residual
Residual              n-p                       $Y'Y - \hat{\beta}' X' Y$          SS_residual / df_residual
Total                 n-1                       $Y'Y - n\bar{Y}^2$

E. Residual Assumption Testing
In the linear truncated spline semiparametric regression model, the random error is assumed to be independent and normally distributed with zero mean and variance $\sigma^2$ [12]. Therefore, before analysing and drawing conclusions from the modeling results, the residual assumptions are tested first. The residual assumption tests performed are the independence test, the identical test and the normality test.

F. Coefficient of Determination
The coefficient of determination ($R^2$) is a measure of the accuracy of the regression model, or of the magnitude of the contribution of the predictors to the response. It is given by
\[ R^2 = \frac{SSR}{SST} = \frac{(\hat{Y} - \bar{Y})'(\hat{Y} - \bar{Y})}{(Y - \bar{Y})'(Y - \bar{Y})} \tag{13} \]
The value of $R^2$ is never negative and satisfies $0 \le R^2 \le 1$ [13].

G. Total Fertility Rate (TFR)
TFR is the average number of children born to a woman from the beginning of childbearing age to the end of her reproductive period [14]. Two assumptions should be noted:
a. No woman dies before the end of her reproductive period.
b. The age-specific fertility rate does not change over the given time period [15].
TFR describes the fertility history of a number of hypothetical women during their reproductive period. This is consistent with the mortality history in a cross-sectional life table. In practice, TFR is calculated by summing the age-specific fertility rates of women; if five-year age groups are used, it is assumed that the fertility rate at each single age equals the average fertility rate of its five-year age group.
The weakness of the TFR calculation is its assumption that no woman dies during the fertile period, that all are married, and that all bear children following the ASFR pattern, even though this does not accord with reality.

III. RESEARCH METHODOLOGY
A. Overview of the Study Area
This study uses district/city data in East Java Province. East Java Province consists of 38 regions covering 29 regencies and 9 cities, listed in Table 2.

Table 2 List of Regional Names in East Java Province
No  Regency/City    No  Regency/City   No  Regency/City
1   Pacitan         14  Pasuruan       27  Sampang
2   Ponorogo        15  Sidoarjo       28  Pamekasan
3   Trenggalek      16  Mojokerto      29  Sumenep
4   Tulungagung     17  Jombang        30  Kota Kediri
5   Blitar          18  Nganjuk        31  Kota Blitar
6   Kediri          19  Madiun         32  Kota Malang
7   Malang          20  Magetan        33  Kota Probolinggo
8   Lumajang        21  Ngawi          34  Kota Pasuruan
9   Jember          22  Bojonegoro     35  Kota Mojokerto
10  Banyuwangi      23  Tuban          36  Kota Madiun
11  Bondowoso       24  Lamongan       37  Kota Surabaya
12  Situbondo       25  Gresik         38  Kota Batu
13  Probolinggo     26  Bangkalan

B. Data Source
This study uses secondary data for 2015 from publications of the National Population and Family Planning Agency of East Java Province and the Central Bureau of Statistics of East Java Province, with observation units covering 38 districts/cities.
C. Research Variables
The variables used in this study are the response variable, namely TFR by district/city in East Java Province in 2015, and the predictor variables that are thought to affect TFR. These variables are described in Table 3.

Table 3 Operational Definition of Variables
Y   TFR (Total Fertility Rate): the average number of children born to a woman from the beginning of childbearing age to the end of her reproductive life.
X1  Unmet Need: the percentage indicating unmet family planning needs, that is, the proportion of women of reproductive age who are married or cohabiting (sexually active), who do not wish to have more children or who want to postpone the next birth by at least 2 years, but who do not use contraceptive tools or methods, in a district/city.
X2  ASFR (Age Specific Fertility Rate): the number of births per 1000 women in a particular age group between 15 and 49 years.
X3  Human Development Index: a comparative measure of life expectancy, literacy, education and living standards for all countries around the world.
X4  Infant Mortality Rate: the number of infant deaths in a given year per 1000 live births in the same year.

D. Data Structure
The units of observation used in this study are the 38 districts/cities in East Java Province, with four predictor variables. The research data structure is as follows.

Table 4 Structure of Research Data
No   Response   Parametric component       Nonparametric component
     y          x1     x2     ...  xp      t1     t2     ...  tq      z1     z2     ...  zr
1    y1         x1;1   x2;1   ...  xp;1    t1;1   t2;1   ...  tq;1    z1;1   z2;1   ...  zr;1
2    y2         x1;2   x2;2   ...  xp;2    t1;2   t2;2   ...  tq;2    z1;2   z2;2   ...  zr;2
...
38   y38        x1;38  x2;38  ...  xp;38   t1;38  t2;38  ...  tq;38   z1;38  z2;38  ...  zr;38

E. Research Steps
The first objective of this study is to examine simultaneous hypothesis testing in mixed truncated spline and kernel semiparametric regression models. The stages carried out to achieve this first objective are as follows.
1. Given a response $y_i$ with parametric component predictor variables $x_{1i}, x_{2i}, \ldots, x_{pi}$, while the predictor variables $t_{1i}, t_{2i}, \ldots, t_{qi}$ and $z_{1i}, z_{2i}, \ldots, z_{ri}$ are nonparametric components. The relationship between the response and the predictors is assumed to follow an additive semiparametric regression model.
\[ y_i = \mu(x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi}, z_{1i}, \ldots, z_{ri}) + \varepsilon_i = \sum_{j=1}^{p} f_j(x_{ji}) + \sum_{s=1}^{q} g_s(t_{si}) + \sum_{k=1}^{r} h_k(z_{ki}) + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad \varepsilon_i \sim IIDN(0, \sigma^2) \]
2. Approximate the regression curves with the following functions.
   a. Approximate $f(x_i)$ using a linear function.
   b. Approximate $g_s(t_{si})$ using the truncated spline function.
   c. Approximate $h_k(z_{ki})$ using the kernel function.
3. Write the mixed truncated spline and kernel semiparametric regression model in matrix form.
\[ \tilde{y} = \mathbf{X}\tilde{\beta} + \mathbf{G}(k)\tilde{\theta} + \mathbf{D}(\alpha)\tilde{y} + \tilde{\varepsilon}, \quad \tilde{\varepsilon} \sim N(0, \sigma^2 \mathbf{I}) \]
   where $\tilde{f}(x) = \mathbf{X}\tilde{\beta}$, $\tilde{g}(t) = \mathbf{G}(k)\tilde{\theta}$, and $\tilde{h}(z) = \mathbf{D}(\alpha)\tilde{y}$.
4. Estimate the regression curves $\tilde{f}$, $\tilde{g}$ and $\tilde{h}$.
5. Formulate the simultaneous hypothesis test for the parameters of the mixed truncated spline and kernel semiparametric regression model as follows.
\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_p = \theta_1 = \theta_2 = \cdots = \theta_{r+m} = 0 \]
\[ H_1: \text{at least one } \beta_j \neq 0,\ j = 1, 2, \ldots, p \ \text{ or } \ \theta_s \neq 0,\ s = 1, 2, \ldots, r+m \]
6. Determine the parameter space under $H_0$, denoted $\omega$, with $\omega = \{(\beta_0, \sigma^2)\}$.
7. Find the likelihood function under the space $\omega$.
8. Maximize the likelihood function under $\omega$ to obtain $L(\hat{\omega})$.
9. Find the likelihood function under the space $\Omega$.
10. Maximize the likelihood function under $\Omega$ to obtain $L(\hat{\Omega})$.
11. Form the likelihood ratio
\[ \Lambda(x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi}, y) = \frac{L(\hat{\omega})}{L(\hat{\Omega})}, \quad 0 < \Lambda \le 1 \]
12. Derive the test statistic based on step (11).
13. Obtain the distribution of the test statistic.
14. Determine the rejection region of the hypothesis $H_0$ through $\Lambda(x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi}, y) < k$, for a constant k.
The second objective of this study is to model TFR using mixed truncated spline and kernel semiparametric regression and then to test hypotheses on the model parameters. The steps taken are:
1. Create a plot of the response variable against each predictor variable.
2. Create a plot of the response variable against each predictor variable.
3. Model the response variable and the predictor variables using the mixed truncated spline and kernel estimator in
semiparametric regression.
4. Model the data with mixed truncated spline and kernel semiparametric regression using one, two, three, and a combination of knots.
5. Select the optimal knot points and bandwidth $\alpha$ based on the GCV method.
6. Test the significance of the parameters simultaneously and partially.
7. Test the assumptions that the residuals are independent, identical and normally distributed.

IV. RESULTS AND DISCUSSION
A. Mixed Truncated Spline and Kernel Semiparametric Regression Model
Given paired data $(x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi}, z_{1i}, \ldots, z_{ri}, y_i)$, suppose the relationship in the paired data follows the mixed truncated spline and kernel semiparametric regression model
\[ y_i = \mu(x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi}, z_{1i}, \ldots, z_{ri}) + \varepsilon_i, \quad i = 1, 2, \ldots, n \]
With the regression curve $\mu$ assumed to be additive, the model can be written as
\[ y_i = f_1(x_{1i}) + \cdots + f_p(x_{pi}) + g_1(t_{1i}) + \cdots + g_q(t_{qi}) + h_1(z_{1i}) + \cdots + h_r(z_{ri}) + \varepsilon_i = \sum_{j=1}^{p} f_j(x_{ji}) + \sum_{s=1}^{q} g_s(t_{si}) + \sum_{k=1}^{r} h_k(z_{ki}) + \varepsilon_i \tag{14} \]
where $y_i$ is the response variable, $f_j(x_{ji})$ is the parametric component, $g_s(t_{si})$ is the truncated spline nonparametric component, $h_k(z_{ki})$ is the kernel nonparametric component, and $\varepsilon_i$ is a random error assumed to be independent, identical and normally distributed.

B. Estimation of the Mixed Truncated Spline and Kernel Semiparametric Regression Curve
The estimate of the mixture of the linear component, the linear spline and the kernel in semiparametric regression is obtained as
\[ \hat{\mu}(x, t, z) = \hat{f}(x) + \hat{g}(t) + \hat{h}(z) = \mathbf{X}\mathbf{A}(k, \alpha)\tilde{y} + \mathbf{G}\mathbf{B}(k, \alpha)\tilde{y} + \mathbf{D}(\alpha)\tilde{y} = \left( \mathbf{X}\mathbf{A}(k, \alpha) + \mathbf{G}\mathbf{B}(k, \alpha) + \mathbf{D}(\alpha) \right)\tilde{y} = \mathbf{M}(k, \alpha)\tilde{y} \tag{15} \]
where
\[ \mathbf{M}(k, \alpha) = \mathbf{X}\mathbf{A}(k, \alpha) + \mathbf{G}\mathbf{B}(k, \alpha) + \mathbf{D}(\alpha) \tag{16} \]
The optimal knot points and the optimal bandwidth $\alpha$ are then selected using the GCV method. The GCV value of the mixed truncated spline and kernel model is obtained as
\[ GCV(k, \alpha) = \frac{MSE(k, \alpha)}{\left( n^{-1}\, \mathrm{tr}\left[ \mathbf{I} - \mathbf{M}(k, \alpha) \right] \right)^2} \tag{17} \]
where
\[ MSE(k, \alpha) = n^{-1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{18} \]

C. Formulation of the Hypothesis Test
To find out whether the model parameters have an effect, a hypothesis test can be carried out. The test begins by specifying the hypotheses, namely
\[ H_0: c_u^T \tilde{\gamma} = \tau \quad \text{versus} \quad H_1: c_u^T \tilde{\gamma} \neq \tau \tag{19} \]
where $c_u^T = (0 \ \ 0 \ \cdots \ 0 \ \ 1 \ \ 0 \ \cdots \ 0)$ is a vector of size $1 \times (p + 1 + (m + r)q)$ whose elements are zero except for the h-th element, which equals 1, $\tilde{\gamma}^T = (\beta_0, \ldots, \beta_p, \theta_{11}, \ldots, \theta_{m1}, \theta_{(m+1)1}, \ldots, \theta_{1q}, \ldots, \theta_{mq}, \theta_{(m+1)q}, \ldots, \theta_{(m+r)q})$, so that $\tilde{\gamma}$ is of size $(p + 1 + (m + r)q) \times 1$, and $\tau = 0$.
The test statistic, denoted F, is found to be
\[ F = \frac{Q_1 / d_1}{Q_2 / d_2} \tag{20} \]
\[ F = \frac{(Q_1 / \sigma^2) / (1 + m + r)}{(Q_2 / \sigma^2) / (n - 1 - m - r)} = \frac{Q_1 / (1 + m + r)}{Q_2 / (n - 1 - m - r)} \sim F_{(1 + m + r,\; n - 1 - m - r)} \]
that is, $F = \dfrac{Q_1 / d_1}{Q_2 / d_2} \sim F_{(d_1, d_2)}$.
The critical region for testing $H_0: c_u^T \tilde{\gamma} = \tau$ versus $H_1: c_u^T \tilde{\gamma} \neq \tau$ is
\[ c = \left\{ (x_{1i}, \ldots, x_{pi}, t_{1i}, \ldots, t_{qi});\ F > k^* \right\}, \quad \text{where } k^* = \frac{d_2}{d_1}\left( k^{-2/n} - 1 \right) \]

D. Application to TFR Data in East Java Province
The mixed truncated spline and kernel semiparametric regression model and its estimation, discussed above, are applied to the 2015 TFR data for East Java Province. The response variable is TFR (Y), and the four predictor variables are Unmet Need ($X_1$), Age Specific Fertility Rate ($T_1$), Human Development Index ($T_2$), and Infant Mortality Rate ($Z_1$). The observation units are the 38 districts/cities.
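The GCV criterion in (17)-(18) and the F ratio in (20) are simple to compute once the fitted values, the trace of M(k, α) and the quadratic forms Q1 and Q2 are available. The following is a minimal sketch under that assumption; the function names and the numbers are illustrative only and do not come from the TFR data.

```python
def gcv(y, y_hat, trace_m):
    """GCV = MSE / (n^-1 * trace(I - M))^2, using trace(I - M) = n - trace(M)."""
    n = len(y)
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    denom = ((n - trace_m) / n) ** 2
    return mse / denom

def f_ratio(q1, d1, q2, d2):
    """F = (Q1 / d1) / (Q2 / d2), the ratio of two scaled quadratic forms."""
    return (q1 / d1) / (q2 / d2)

# Toy inputs (hypothetical): four observations and a smoother with trace 2
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
gcv_value = gcv(y, y_hat, trace_m=2.0)        # MSE = 0.25, denominator = 0.25
f_value = f_ratio(q1=6.0, d1=2, q2=3.0, d2=6)
```

In a knot and bandwidth search, `gcv` would be evaluated for each candidate pair (k, α) and the candidate with the smallest value kept.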
1. Descriptive Analysis
The results of descriptive statistics can be used to initialise the knot points in the next stage of the analysis.

Table 4 Descriptive Statistics
Variable            Mean    Minimum   Maximum   Standard Deviation   Range
TFR (Y)             2.06    1.52      2.45      0.21                 0.93
HDI (T2)            69.11   58.18     80.05     5.40                 21.87
Unmet Need (X1)     16.01   6.87      31.77     5.01                 24.90
ASFR (T1)           36.21   6.40      87.00     18.79                80.60
IMR (Z1)            30.92   17.27     60.51     12.09                43.24

Table 4 shows the characteristics of each variable, both the response variable and the predictor variables. TFR in East Java Province in 2015 had a mean of 2.06, that is, about 2 children per woman on average, with a standard deviation of 0.21. The highest TFR, 2.45, is in Sampang Regency, while the lowest TFR, 1.52, is in Sumenep Regency. The range of TFR across the 38 districts/cities of East Java Province is 0.93.
The following describes the characteristics of each predictor variable, namely Unmet Need ($X_1$), Age Specific Fertility Rate ($T_1$), the Human Development Index ($T_2$) and the Infant Mortality Rate ($Z_1$).
a. The average Unmet Need in East Java Province in 2015 was 16.01 with a standard deviation of 5.01. The highest Unmet Need was in Bangkalan Regency at 31.77 and the lowest was in Bondowoso Regency at 6.87, giving a range of 24.90.
b. The average ASFR in East Java Province in 2015 was 36.21 with a standard deviation of 18.79. The highest ASFR was in Bondowoso Regency at 87.00 and the lowest was in Malang City at 6.40.
c. The average Human Development Index in East Java Province in 2015 was 69.11 with a standard deviation of 5.4. The highest Human Development Index was in Malang City at 80.05 and the lowest was in Sampang Regency at 58.18, giving a range of 21.87.
d. The average IMR in East Java Province in 2015 was 30.92, or about 31 deaths per 1000 live births, with a standard deviation of 12.09. The highest IMR was in Probolinggo Regency at 60.51 (about 61 deaths) and the lowest was in Blitar City at 17.27 (about 17 deaths), giving a range of 43.24.

2. TFR Modeling Using Mixed Truncated Spline and Kernel Semiparametric Regression
The initial step in the semiparametric regression modeling process is to make a scatter plot of the response variable against each predictor variable. The scatter plots show the form of the relationship pattern between the response variable and each predictor variable, and are given in Figure 1.

[Figure 1. Plot of the Response Variable against the Predictor Variables. Four scatter plots of Y against X1 (UN, Unmet Need), T1 (ASFR), T2 (IPM, Human Development Index) and Z1 (AKB, Infant Mortality Rate).]

A summary of the parametric and nonparametric components is presented in Table 5.

Table 5. Parametric and Nonparametric Components
Notation   Variable     Component
X1         Unmet Need   Parametric
T1         ASFR         Nonparametric
T2         HDI          Nonparametric
Z1         IMR          Nonparametric

3. Modeling TFR in East Java Province Using Mixed Truncated Spline and Kernel Semiparametric Regression with One Knot
The linear truncated spline semiparametric regression model with one parametric component variable and three nonparametric component variables with one knot point is as follows, where $K_{sl}$ denotes the l-th knot of the variable $t_s$.
\[ y_i = \beta_0 + \beta_1 x_i + \theta_{11} t_{1i} + \theta_{21} (t_{1i} - K_{11})_+^1 + \theta_{12} t_{2i} + \theta_{22} (t_{2i} - K_{21})_+^1 + \sum_{i=1}^{n} \left[ \frac{ \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) }{ \sum_{i=1}^{n} \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) } \right] y_i + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{21} \]
The GCV values produced using mixed truncated spline and kernel semiparametric regression with one knot are presented in Table 6.

Table 6. Comparison of GCV Values Using One Knot Point
Spline knot t1 (K11)   Spline knot t2 (K21)   Kernel bandwidth α   GCV        R² (%)
56.3                   71.72                  0.0485               0.004164   95.06
60.13                  72.76                  0.0494               0.004268   94.69
52.46                  70.68                  0.048                0.004301   94.99
63.97                  73.8                   0.0501               0.00445    94.28

Based on Table 6, the minimum GCV value is 0.004164. The knot point for the variable $t_1$ is 56.3 ($K_{11}$) and for $t_2$ is 71.72 ($K_{21}$), while the bandwidth is $\alpha = 0.0485$.
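The selection rule applied to Table 6 amounts to taking the candidate with the smallest GCV. A minimal sketch, with the four rows of Table 6 entered by hand (the tuple layout and the variable names are assumptions made for illustration):

```python
# (knot for t1, knot for t2, bandwidth alpha, GCV, R^2 in percent), copied from Table 6
candidates = [
    (56.30, 71.72, 0.0485, 0.004164, 95.06),
    (60.13, 72.76, 0.0494, 0.004268, 94.69),
    (52.46, 70.68, 0.0480, 0.004301, 94.99),
    (63.97, 73.80, 0.0501, 0.004450, 94.28),
]

# Keep the candidate whose GCV (position 3 in each tuple) is smallest
best = min(candidates, key=lambda row: row[3])
k_t1, k_t2, alpha, gcv_min, r2 = best
print(f"knots: t1={k_t1}, t2={k_t2}; alpha={alpha}; GCV={gcv_min}")
```

The same rule is applied to the two-knot, three-knot and combination candidates, and the overall minimum across those tables determines the final model.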
4. Modeling TFR in East Java Province with Truncated Spline Components with Two Knot Points
The truncated spline semiparametric regression model using two knot points with one parametric component predictor and three nonparametric components is as follows.
\[ y_i = \beta_0 + \beta_1 x_i + \theta_{11} t_{1i} + \theta_{21} (t_{1i} - K_{11})_+^1 + \theta_{31} (t_{1i} - K_{12})_+^1 + \theta_{12} t_{2i} + \theta_{22} (t_{2i} - K_{21})_+^1 + \theta_{32} (t_{2i} - K_{22})_+^1 + \sum_{i=1}^{n} \left[ \frac{ \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) }{ \sum_{i=1}^{n} \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) } \right] y_i + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{22} \]
The GCV values produced using mixed truncated spline and kernel semiparametric regression with two knots are presented in Table 7.

Table 7. Comparison of GCV Values Using Two Knot Points
Spline knots t1 (K11, K12)   Spline knots t2 (K21, K22)   Kernel bandwidth α   GCV        R² (%)
68.4, 80.8                   75, 78.37                    0.04472              0.004185   96.32
18.8, 56                     61.54, 71.64                 0.04909              0.004207   95.32
68.4, 74.6                   75, 76.69                    0.04553              0.004303   95.91
12.6, 56                     59.86, 71.64                 0.05001              0.004346   95.09

Based on Table 7, the minimum GCV value is 0.004185. The knot points for the variable $t_1$ are 68.4 ($K_{11}$) and 80.8 ($K_{12}$), and for $t_2$ are 75 ($K_{21}$) and 78.37 ($K_{22}$), while the bandwidth is $\alpha = 0.04472$.

5. Modeling TFR in East Java Province with Truncated Spline Components with Three Knot Points
The truncated spline semiparametric regression model using three knot points with one parametric component predictor and three nonparametric components is as follows.
\[ y_i = \beta_0 + \beta_1 x_i + \theta_{11} t_{1i} + \theta_{21} (t_{1i} - K_{11})_+^1 + \theta_{31} (t_{1i} - K_{12})_+^1 + \theta_{41} (t_{1i} - K_{13})_+^1 + \theta_{12} t_{2i} + \theta_{22} (t_{2i} - K_{21})_+^1 + \theta_{32} (t_{2i} - K_{22})_+^1 + \theta_{42} (t_{2i} - K_{23})_+^1 + \sum_{i=1}^{n} \left[ \frac{ \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) }{ \sum_{i=1}^{n} \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) } \right] y_i + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{23} \]
The GCV values produced using mixed truncated spline and kernel semiparametric regression with three knots are presented in Table 8.

Table 8. Comparison of GCV Values Using Three Knot Points
Spline knots t1 (K11, K12, K13)   Spline knots t2 (K21, K22, K23)   Kernel bandwidth α   GCV        R² (%)
57.69, 72.35, 79.67               72.1, 76.07, 78.06                0.04277              0.004115   97.05
65.02, 72.35, 79.67               74.09, 76.07, 78.06               0.04357              0.00423    96.66
50.36, 72.35, 79.67               70.11, 76.07, 78.06               0.04242              0.004295   97.05
43.04, 72.35, 79.67               68.12, 76.07, 78.06               0.04213              0.004412   97.12

Based on Table 8, the minimum GCV value is 0.004115. The knot points for the variable $t_1$ are 57.69 ($K_{11}$), 72.35 ($K_{12}$) and 79.67 ($K_{13}$), and for $t_2$ are 72.1 ($K_{21}$), 76.07 ($K_{22}$) and 78.06 ($K_{23}$), while the bandwidth is $\alpha = 0.04277$.

6. Modeling TFR in East Java Province with Truncated Spline Components with a Combination of Knot Points
The combination of knots is selected by combining the optimum knots obtained previously from the 1-knot, 2-knot and 3-knot calculations. The GCV is then calculated for each combination, and the model with the minimum GCV among the combinations is chosen. The truncated spline semiparametric regression model using a combination of knot points with one parametric component predictor and three nonparametric components is as follows.
\[ y_i = \beta_0 + \beta_1 x_i + \theta_{11} t_{1i} + \theta_{21} (t_{1i} - K_{11})_+^1 + \theta_{31} (t_{1i} - K_{12})_+^1 + \theta_{12} t_{2i} + \theta_{22} (t_{2i} - K_{21})_+^1 + \theta_{32} (t_{2i} - K_{22})_+^1 + \theta_{42} (t_{2i} - K_{23})_+^1 + \sum_{i=1}^{n} \left[ \frac{ \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) }{ \sum_{i=1}^{n} \frac{1}{\alpha} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{z_1 - z_{1i}}{\alpha} \right)^2 \right) } \right] y_i + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{24} \]
The GCV values produced using mixed truncated spline and kernel semiparametric regression with a combination of knots are presented in Table 9.
Based on Table 9, the minimum GCV value is 0.003964, obtained with the knot combination (2,3). The knot points for the variable $t_1$ are 68.4 ($K_{11}$) and 80.8 ($K_{12}$), and for $t_2$ are 72.1 ($K_{21}$), 76.074 ($K_{22}$) and 78.062 ($K_{23}$), while the bandwidth is $\alpha = 0.0427$.
Tabel 9 Comparison of GCV Value Using Combination of
Knot Point
8

Spline knots on 𝑡1     Spline knots on 𝑡2     Kernel bandwidth   GCV      Combination   R² (%)
(𝐾11, 𝐾12, 𝐾13)        (𝐾21, 𝐾22, 𝐾23)        (𝛼)
68.4, 80.8             72.1, 76.07, 78.06     0.0427             0.0039   2,3           97.0
56.3                   72.1, 76.07, 78.06     0.04275            0.0040   1,3           96.9
68.4, 80.8             71.72                  0.04848            0.0040   2,1           95.1
57.69, 72.34, 79.67    71.72                  0.04865            0.004    3,1           95.1

A. Selection of the Best Model
Based on the GCV values calculated for each number of knot points, the best model is selected by comparing the minimum GCV value produced by each model, as shown in Table 10.

Table 10. GCV Minimum Value at Each Model
Knot                   GCV        R-Square
1 Knot Point           0.004164   95.06 %
2 Knot Points          0.004185   96.32 %
3 Knot Points          0.004115   97.05 %
Combination of Knot    0.003964   97.04 %*

Table 10 shows that the minimum GCV value is found in the combination of knot points, so the model chosen is the spline model with a combination of knot points. After obtaining the minimum GCV value for the linear truncated spline model, the next step is to calculate the estimate of the model. The estimated linear truncated spline model with this combination of knot points is as follows.

\[
\begin{aligned}
\hat{y}_i ={}& 0.17519302433 + \hat{\beta}_1 x_{1i} + \hat{\gamma}_{11} t_{1i} + \hat{\gamma}_{21}(t_{1i}-68.4)_+^{1} + \hat{\gamma}_{31}(t_{1i}-80.80)_+^{1} \\
&+ \hat{\gamma}_{12} t_{2i} + \hat{\gamma}_{22}(t_{2i}-72.1)_+^{1} + \hat{\gamma}_{32}(t_{2i}-76.07)_+^{1} + \hat{\gamma}_{42}(t_{2i}-78.062)_+^{1} \\
&+ \sum_{i=1}^{n} \frac{\frac{1}{0.0427}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{z_1-z_{1i}}{0.0427}\right)^{2}\right)}{\sum_{j=1}^{n}\frac{1}{0.0427}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{z_1-z_{1j}}{0.0427}\right)^{2}\right)}\, y_i
\end{aligned}
\tag{25}
\]

where the estimated coefficients have magnitudes

\[
\begin{aligned}
&|\hat{\beta}_1| = 0.00012335311, \quad |\hat{\gamma}_{11}| = 0.00079110874, \quad |\hat{\gamma}_{21}| = 0.00166672135, \quad |\hat{\gamma}_{31}| = 0.00055557378, \\
&|\hat{\gamma}_{12}| = 0.00329040416, \quad |\hat{\gamma}_{22}| = 0.01911809653, \quad |\hat{\gamma}_{32}| = 0.08784561457, \quad |\hat{\gamma}_{42}| = 0.19710550155.
\end{aligned}
\]

The linear truncated spline regression model with this combination of knot points has an R² of 97.04%. This means that the model explains 97.04% of the variation in TFR.

B. Parameter Significance Tests Simultaneously
The simultaneous test of the significance of the parameters uses the following hypotheses:

H0 : \beta_p = 0 and \gamma_j = 0 for all p and j
H1 : there is at least one \beta_p \neq 0 or \gamma_j \neq 0

Table 11. ANOVA of the Mixed Truncated Spline and Kernel Semiparametric Regression
Source of Variation    df    SS      MS         F count
Regression             8     1.515   0.168358   101.85619
Error                  29    0.046   0.001652
Total                  37    1.561

The ANOVA results of the simultaneous hypothesis test are presented in Table 11. Table 11 shows that the F count of 101.85 is greater than F(0.05,8,29), which is 2.28, and that the p-value of 0.00 is smaller than α (0.05), so the decision is to reject H0. It can therefore be concluded that there is at least one significant predictor variable in the model. The semiparametric mixed truncated spline and kernel regression model with a combination of knots has a coefficient of determination (R²) of 97.04 percent. This shows that 97.04 percent of the variation of the response variable is explained by the five predictor variables, while the remaining 2.96 percent is explained by other factors.

V. CONCLUSION
Based on the results and discussion, the following conclusions can be drawn.
1. Estimation of Parameters of Semiparametric Spline Truncated Regression Intervals
a. The test statistic obtained follows the F-distribution with 1 and n − (p + 1 + (m + r)q) degrees of freedom:

\[
F = \frac{Q_1/(1)}{Q_2/\bigl(n-(p+1+(m+r)q)\bigr)} \sim F_{1,\,n-(p+1+(m+r)q)}
\]

2. Application to the TFR data of East Java Province in 2015 gave the following results.
a. The best model obtained is the model that uses a combination of knot points.
b. The four predictor variables analyzed simultaneously influence the response variable.
c. The coefficient of determination (R²) obtained is 97.04 percent.
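As a quick arithmetic check on the simultaneous test, the F statistic and the coefficient of determination can be recomputed from the entries printed in Table 11. The sketch below is only a cross-check under the assumption that F is the ratio of the two printed mean squares and that R² is the regression sum of squares divided by the total sum of squares. The variable and function names are illustrative.

```python
# Values as printed in Table 11 (rounded in the source).
ss_regression = 1.515
ss_error = 0.046
ss_total = 1.561
ms_regression = 0.168358
ms_error = 0.001652

# Critical value F(0.05, 8, 29) as quoted in the text.
f_critical = 2.28


def f_statistic(ms_reg, ms_err):
    """F statistic as the ratio of regression to error mean squares."""
    return ms_reg / ms_err


def r_squared(ss_reg, ss_tot):
    """Share of the total variation explained by the regression."""
    return ss_reg / ss_tot


f_count = f_statistic(ms_regression, ms_error)
r2 = r_squared(ss_regression, ss_total)
reject_h0 = f_count > f_critical

print(f"F = {f_count:.2f}, reject H0: {reject_h0}")
print(f"R^2 = {100 * r2:.2f} %")
```

Because the table entries are rounded, the recomputed values agree with the reported F count and R² only up to rounding, but the decision to reject H0 does not depend on those last digits.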
REFERENCES
[1] Hardle, W., (1994). Applied Nonparametric Regression. Cambridge
University Press. New York.
[2] Hardle, W., (1990), “Applied Nonparametric Regression”, Cambridge
University Press, New York.
[3] Budiantara, I.N., (2009), "Spline Dalam Regresi Nonparametrik dan
Semiparametrik : Sebuah Pemodelan Statistika Masa Kini dan Masa
Mendatang", Pidato Pengukuhan untuk Jabatan Guru Besar, Institut
Teknologi Sepuluh Nopember, ITS Press, Surabaya.
[4] Merdekawati, I.P., and Budiantara, I.N., (2013), “Pemodelan Regresi
Spline Truncated Multivariabel pada Faktor-Faktor yang
mempengaruhi Kemiskinan di Kabupaten/Kota Provinsi Jawa
Tengah”, Jurnal Sains dan Seni POMITS, Vol. 2, No. 1, pp. 19-24.
[5] Draper, N. R., and Smith, H., (1992), Analisis Regresi Terapan, PT.
Gramedia Pustaka Utama, Jakarta.
[6] Eubank, R., (1999), Nonparametric Regression and Spline Smoothing,
Marcel Dekker, New York.
[7] Budiantara, I.N., (2005), “Regresi Spline Linier”, Makalah
Pembicara Utama pada Seminar Nasional Matematika, FMIPA
Universitas Diponegoro, Semarang.
[8] Budiantara, I.N., (2009), "Spline Dalam Regresi Nonparametrik dan
Semiparametrik : Sebuah Pemodelan Statistika Masa Kini dan Masa
Mendatang", Pidato Pengukuhan untuk Jabatan Guru Besar, Institut
Teknologi Sepuluh Nopember, ITS Press, Surabaya.
[9] Wahba, G., (1990), Spline Models for Observational Data.
Philadelphia: Society for Industrial and Applied Mathematics.
[10] Wang, Y., (1998), “Spline Smoothing Models with Correlated Error”,
Journal of the Royal Statistical Society, Series B, 5r0, 341-348.
[11] Hardle, W., (1994). Applied Nonparametric Regression. Cambridge
University Press. New York.
[12] Wahba, G., (1990), Spline Models for Observational Data.
Philadelphia: Society for Industrial and Applied Mathematics.
[13] Gujarati, D. N., and Porter, D. C. (2015). Dasar-Dasar Ekonometrika
Volume 1, 2nd Edition. Jakarta:Penerbit Salemba Empat.
[14] BKKBN. 2017., Buku Saku Parameter Kependudukan Jawa Timur
Tahun 2017. Jawa Timur: BKKBN
[15] Mantra, I.B., 2006. Demografi Umum. Edisi 2. Penerbit Pustaka
Pelajar: Yogyakarta.
