Analysis of Fishing Data - Application of Count Regression

You might also like

Download as ppsx, pdf, or txt
Download as ppsx, pdf, or txt
You are on page 1of 28

ANALYSIS OF FISHING DATA

APPLICATION OF COUNT REGRESSION

Anirban Ray Semester VI Roll No. 446 (2014-17)


CONTEXT OF THE DISSERTATION
WHAT USUALLY HAPPENS …

• FORECASTING IS REQUIRED IN MANY PRACTICAL SITUATIONS


• SUPPOSE Y IS THE RESPONSE VARIABLE ABOUT WHICH WE ARE INTERESTED
• AND X1, X2, …, XP ARE SEVERAL EXPLANATORY VARIABLES, POSSIBLY AFFECTING Y

• OFTEN HAPPENS THAT WE WISH TO PREDICT Y BASED ON X 1, X2, …, XP


• DATA IS COLLECTED AND ENTERED IN A SOFTWARE
• METHOD OF ORDINARY LEAST SQUARES IS PERFORMED AND COEFFICIENTS ARE
INTERPRETED
• CONCLUSION IS MADE – “FIT IS MORE OR LESS SATISFACTORY”
… IS NOT ALWAYS CORRECT

• O. L. S. GIVES BEST RESULTS WHEN DATA COMES FROM A NORMAL POPULATION


• (ASSUMING ABSENCE OF HETEROSCEDASTICITY, AUTOCORRELATION AND
MULTICOLLINEARITY)
• PRACTICALLY THAT RARELY HOLDS GOOD
• CONTINUOUS DATA CAN BE ASSUMED TO HAVE COME FROM GAUSSIAN DISTRIBUTION
• BY VIRTUE OF CENTRAL LIMIT THEOREM OR BY MAKING SOME TRANSFORMATIONS
• UNFORTUNATELY NOT SO EASY FOR DISCRETE DATA
• ALTERNATIVE METHODS MUST BE APPLIED FOR SUCH CASES
GENERALIZED LINEAR MODELS
WHAT IS G. L. M.

• BROAD CLASS OF LINEAR REGRESSION MODELS


• GENERALLY CONSISTS OF THREE COMPONENTS
• SYSTEMATIC COMPONENT IS A LINEAR PREDICTOR OF THE EXPLANATORY
VARIABLES
• RANDOM COMPONENT REFERS TO THE PROBABILITY DISTRIBUTION OF THE
RESPONSE
• LINK FUNCTION RELATES THE OTHER TWO COMPONENTS
• LINK IS A ONE-TO-ONE FUNCTION OF EXPECTATION OF THE RESPONSE VARIABLE

Source : Analysis of Discrete Data


SCOPES OF G. L. M.

• APPLICABLE FOR INDEPENDENT LINK APPLICATION


OBSERVATIONS FROM EXPONENTIAL IDENTITY OLS, ANOVA, ANOCOVA
FAMILY POISSON, NEGATIVE
LOG
• PROVIDES ESTIMATES BASED ON BINOMIAL
METHOD OF MAXIMUM LIKELIHOOD INVERSE GAMMA
LOGIT MULTINOMIAL, LOGISTIC
• EXAMPLES OF LINK FUNCTIONS
HANDLING REAL LIFE DATA
WHAT TO DO …

• TO FIT SUITABLE THEORETICAL DISTRIBUTIONS TO THE DATA


• TO TEST WHETHER THE FIT IS SATISFACTORY OR NOT
• TO FIT A REGRESSION MODEL BASED ON THE FIT (IN CASE IT IS
SATISFACTORY)
• TO COMPARE FITTED REGRESSION MODELS, IF THERE ARE MORE THAN ONE
• TO INTERPRET THE COEFFICIENTS OF THE BEST FITTED MODEL
GOODNESS OF FIT TEST

• TO
  TEST WHETHER A FIT IS GOOD OR NOT, PEARSON'S CHI-SQUARED
STATISTIC CAN BE USED
• , ASYMPTOTICALLY (PROVIDED THE FIT IS GOOD)

• IF UNKNOWN, NEED TO BE ESTIMATED BY USING THE FITTED MODEL


• IN CASE THERE ARE INDEPENDENT PARAMETERS,
• CHI-SQUARE APPROXIMATION WORKS IF SAMPLE SIZE IS LARGER THAN 50
AND
AKAIKE INFORMATION CRITERION

• AS
  A MEASURE OF RELATIVE QUALITY OF TWO STATISTICAL MODELS FOR A
GIVEN A GIVEN SET OF DATA, A. I. C. CAN BE USED

• AS LOG-LIKELIHOOD OF A MODEL INCREASES, IT EXPLAINS THE DATA BETTER


• THUS THE MODEL WITH LOWER VALUE OF A. I. C. IS THE BETTER ONE
• THIS CAN BE USED ONLY WHEN FITTING IS DONE BY METHOD OF MAXIMUM
LIKELIHOOD
Source : Wikipedia
DATA UNDER STUDY
DESCRIPTION OF THE PROBLEM

• FISHING IS ALLOWED IN A STATE


PARK count NUMBER OF FISHES CAUGHT
persons NUMBER OF PERSONS
• WILDLIFE BIOLOGISTS WANT TO
child NUMBER OF CHILDREN
MODEL HOW MANY FISH ARE BEING
camper WHETHER BROUGHT CAMPER
CAUGHT OR NOT
• DATA COLLECTED ON 250 GROUPS
• FOLLOWING OBSERVATIONS ARE
NOTED FOR ALL GROUPS

Source : INSTITUTE FOR DIGITAL RESEARCH AND EDUCATION


PLAUSIBLE MODELS

• BEING THE MOST COMMON COUNT MODEL, POISSON MODEL CAN BE USED
PRIMARILY, PROVIDED MEAN AND VARIANCE OF NUMBER OF FISHES CAUGHT
ARE APPROXIMATELY EQUAL
• IF MEAN IS SIGNIFICANTLY LESS THAN VARIANCE, NEGATIVE BINOMIAL COULD
BE AN APPROPRIATE MODEL
• IF ZEROES (EITHER ALL OR SOME) OCCUR DUE TO SOME INDEPENDENT
PROCESS FROM THE COUNT PROCESS, HURDLE OR ZERO-INFLATED MODELS
CAN GIVE BETTER RESULTS
INITIAL APPROACHES

DEFINING DESCRIPTIVE
VARIABLES MEASURES COLUMN DIAGRAM
   
OBSERVATIONS

• WIDE VARIATION IN NUMBER OF FISHES CAUGHT, WITH AVERAGE BEING


QUITE SMALL
• POISSON MODEL CAN BE SUSPECTED TO BE VERY POOR
• SIGNIFICANT PRESENCE OF OVER-DISPERSION
• NEGATIVE BINOMIAL WOULD PROBABLY GIVE BETTER FIT
• TOO MANY ZEROES IN THE AVAILABLE DATA - NO FISHING DATA IS NOT
RECORDED
• ZEROES ARE GENERATED BY TWO INDEPENDENT BINARY AND COUNT
PROCESSES
• ZERO-INFLATED POISSON MODEL COULD BE AN ALTERNATIVE
FITTING OF DIFFERENT MODELS
POISSON FITTING

•  P. M. F. OF DISTRIBUTION -
• M. L. E. OF IS THE SAMPLE MEAN
• , AN EXTREMELY UNLIKELY VALUE
• POISSON MODEL PROVIDES A POOR FIT FOR FISHING DATA
NEGATIVE BINOMIAL FITTING

•  P. M. F. OF DISTRIBUTION -
• M. L. E. OF IS THE SAMPLE MEAN
• SOLVING SCORE EQUATION , M. L. E. OF IS 5.438.
• , WHICH IS A LIKELY VALUE AT 1% LEVEL OF SIGNIFICANCE
• NEGATIVE BINOMIAL PROVIDES A SATISFACTORY FIT FOR OUR DATA
NEGATIVE BINOMIAL REGRESSION

• WE
  PROCEED TO FIT A REGRESSION MODEL WITH LOGARITHMIC LINK
FUNCTION
• MODEL :
• ARE UNKNOWN PARAMETERS, IS THE DISTURBANCE TERM
• WE ESTIMATE BY THE METHOD OF MAXIMUM LIKELIHOOD
• FITTED MODEL :
ZERO-INFLATED POISSON DISTRIBUTION

•  ZEROES OCCUR DUE TO TWO INDEPENDENT PROCESSES


• BY A BINARY PROCESS, WHEN A GROUP DO NOT FISH AT ALL [probability of
non-fishing being ]
• BY A POISSON PROCESS, WHEN A GROUP FAILS TO CATCH FISHES [with
parameter ]
• P. M. F. OF DISTRIBUTION -
ZERO-INFLATED POISSON MODEL

• M.
  L. E. OF IS OBTAINED AS 7.628 BY SOLVING SCORE EQUATION
• M. L. E. OF IS OBTAINED AS 0.568 FROM THE EQUATION
• BASED ON THESE ESTIMATES, WE PROCEED TO FIT A REGRESSION MODEL
• WITH LOGIT LINK FOR BINARY PROCESS AND LOGARITHMIC LINK FOR COUNT PROCESS
• FITTED MODEL
COMPARISON OF THE TWO REGRESSION
MODELS
• LOG-LIKELIHOOD OF NEGATIVE BINOMIAL MODEL COMES OUT TO BE
(-)405.222
• AND THAT OF ZERO-INFLATED POISSON MODEL IS (-)752.132
• THUS A. I. C. OF THE FORMER MODEL IS APPROXIMATELY 820
• AND THAT OF THE LATTER BECOMES MORE OR LESS 1521
• NOTE THAT, INCLUSION OF ZERO-INFLATION WORSENS EFFICACY OF THE
REGRESSION MODEL
• THUS WE CHOOSE NEGATIVE BINOMIAL AS A PREFERABLE MODEL OVER THE
ALTERNATIVE
INSIGHT INTO THE FIT
SIGNIFICANCE OF THE COEFFICIENTS

• TO
  TEST WHETHER ALL THE FACTORS AFFECT THE RESPONSE, WE WISH TO
TEST
• APPROPRIATE TEST STATISTIC FOR EACH TEST ARE THE STANDARDIZED
ESTIMATES OF THAT PARAMETER
• WE OBTAIN THE VALUES OF THE FOUR TEST STATISTICS AS 9.273, (-)9.623,
(-)4.918 AND (-)3.362
• AND THE CRITICAL POINT IS OBTAINED AS 1.960, AT 5% LEVEL OF
SIGNIFICANCE
• HENCE ALL THE COEFFICIENTS CAN BE CONSIDERED TO BE SIGNIFICANTLY
NONZERO
INTERPRETATION OF THE COEFFICIENTS

• TO
  UNDERSTAND HOW THE NUMBER OF FISHES CAUGHT IS AFFECTED BY
THE GROUP COMPOSITION, WE NEED TO UNDERSTAND THE IMPLICATIONS OF
THE VALUES OF THE ESTIMATES OF THE COEFFICIENTS
• AND CAN BE INTERPRETED AS THE INCREMENT IN THE LOGARITHM OF THE
AVERAGE NUMBER OF FISHES CAUGHT DUE TO UNIT INCREASE IN NUMBER
OF PERSONS AND CHILDREN IN A GROUP, WHEN OTHERS ARE HELD FIXED
• THUS PRESENCE OF AN ADDITIONAL PERSON IN THE GROUP WILL INCREASE
EXPECTED RESPONSE BY 189%, WHEREAS AN ADDITIONAL CHILD WILL
DECREASE IT BY 83%.
… CONTINUED

•  SINCE AND ARE RELATED, WE CANNOT INTERPRET AND SEPERATELY


• CAN BE THOUGHT OF AS THE PROPORTIONAL INCREASE IN AVERAGE
NUMBER OF FISHES CAUGHT FROM A GROUP WITHOUT ANY CAMPER TO A
GROUP OF IDENTICAL COMPOSITION WITH A CAMPER
• THUS WE CAN CONCLUDE THAT BRINGING CAMPER WILL INCREASE
EXPECTED NUMBER OF FISHES CAUGHT BY 85%
CONCLUSION

• COMBINING ALL OUR RESULTS, WE CAN CONCLUDE THAT A LARGER GROUP


WILL CATCH MORE FISHES, IF THERE ARE LESS NUMBER OR CHILDREN AND
THEY HAVE BROUGHT A CAMPER
• IF ONE WOULD HAVE APPLIED O. L. S. METHOD, THE FITTED MODEL WOULD
HAVE BEEN SOMETHING DIFFERENT AND WOULD LEAD TO
MISINTERPRETATIONS
• THUS WE SHOULD USE COUNT REGRESSION MODEL IN CASE OF DISCRETE
DATA, INSTEAD OF BLINDLY APPLYING THE METHOD OF LEAST SQUARES

You might also like