Advanced Econometrics - Part II - Chapter 6: Models for count data




Advanced Econometrics - Part II
Chapter 6: Models for count data
Nam T. Hoang, UNE Business School, University of New England

Chapter 6
MODELS FOR COUNT DATA

A count variable is a variable that takes on non-negative integer values:
• There is no natural upper bound.
• The outcome will be zero for at least some members of the population.

$Y$ is a count variable and $X$ is a vector of explanatory variables. It is better to model $E(Y \mid X)$ directly and to choose a functional form that ensures positivity for any value of $X$ and any parameter value. When $Y$ has no upper bound, the most popular choice is the exponential function:

$$E(Y \mid X) = \exp(X\beta)$$

I. POISSON REGRESSION MODEL:
• The basic Poisson regression model assumes that $Y$ given $X = (X_1, X_2, \ldots, X_k)$ has a Poisson distribution.
• The Poisson regression model specifies that each $Y_i$ is drawn from a Poisson distribution with parameter $\lambda_i$, which is related to the regressors $X_i$:

$$\text{Prob}(Y = Y_i \mid X_i) = \frac{e^{-\lambda_i}\lambda_i^{Y_i}}{Y_i!}, \qquad Y_i! = 1 \times 2 \times \cdots \times Y_i$$

$\lambda_i$ and $X_i$ are related as:

$$\ln\lambda_i = X_i\beta \quad \text{or} \quad \lambda_i = e^{X_i\beta}$$

The expected number of events is given by (Poisson distribution properties):

$$E[Y_i \mid X_i] = Var[Y_i \mid X_i] = \lambda_i = e^{X_i\beta}$$

So:

$$\frac{\partial E[Y_i \mid X_i]}{\partial X_i} = \lambda_i\beta$$

• With the parameter estimates in hand, this vector can be computed using any data vector desired.
• In principle, the Poisson model is simply a non-linear regression, but it is easier to estimate the parameters with maximum likelihood techniques.

The log-likelihood function is:

$$\ln L = \sum_{i=1}^{n}\left[-\lambda_i + Y_i X_i\beta - \ln(Y_i!)\right] = \sum_{i=1}^{n}\left[-e^{X_i\beta} + Y_i X_i\beta - \ln(Y_i!)\right]$$

The likelihood equations are:

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n}(Y_i - \lambda_i)X_i = \sum_{i=1}^{n}\left(Y_i - e^{X_i\beta}\right)X_i = 0$$

The Hessian is:

$$\frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} = -\sum_{i=1}^{n}\lambda_i X_i X_i'$$

(in the gradient and the Hessian, $X_i$ is read as the $k \times 1$ column of regressors for observation $i$). The Hessian is negative definite for all $X$ and $\beta$, provided the regressors are not perfectly collinear. The Newton-Raphson method is a simple algorithm for this model and usually converges rapidly.
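The Newton-Raphson iteration built from the likelihood equations and the Hessian above can be sketched as follows. This is a minimal illustration, not the notes' own implementation: the function name `fit_poisson_newton`, the zero starting value, the tolerance, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def fit_poisson_newton(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for the Poisson log-likelihood.

    X: (n, k) array whose i-th row holds the regressors of observation i.
    y: (n,) array of non-negative integer counts.
    Returns the estimate of beta and the number of iterations used.
    """
    beta = np.zeros(X.shape[1])  # starting value (an arbitrary choice)
    for iteration in range(1, max_iter + 1):
        lam = np.exp(X @ beta)                # lambda_i = exp(X_i beta)
        score = X.T @ (y - lam)               # sum_i (Y_i - lambda_i) X_i
        hessian = -(X * lam[:, None]).T @ X   # -sum_i lambda_i X_i X_i'
        step = np.linalg.solve(hessian, score)
        beta = beta - step                    # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    return beta, iteration


# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat, n_iter = fit_poisson_newton(X, y)
score_at_optimum = X.T @ (y - np.exp(X @ beta_hat))
print("beta_hat:", beta_hat, "iterations:", n_iter)
print("score at beta_hat:", score_at_optimum)
```

The printed score should be numerically zero, which is the likelihood equation evaluated at the estimate. Because the log-likelihood is globally concave, a crude starting value is usually enough, although a step-halving safeguard would be a sensible addition for badly scaled regressors.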
At convergence, $\left[\sum_{i=1}^{n}\hat\lambda_i X_i X_i'\right]^{-1}$ is an estimator of the asymptotic covariance matrix for $\hat\beta$, where $\hat\lambda_i = \exp(X_i\hat\beta)$.

$\hat\lambda_i = \exp(X_i\hat\beta)$ is the prediction for observation $i$. By the delta method, the estimated variance of $\hat\lambda_i$ is $\hat\lambda_i^2 X_i' V X_i$, where $V$ is the estimated asymptotic covariance matrix for $\hat\beta$:

$$V = \left[\sum_{i=1}^{n}\hat\lambda_i X_i X_i'\right]^{-1}$$

II. GOODNESS OF FIT:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left[\dfrac{Y_i - \hat\lambda_i}{\sqrt{\hat\lambda_i}}\right]^2}{\sum_{i=1}^{n}\left[\dfrac{Y_i - \bar Y}{\sqrt{\bar Y}}\right]^2}$$

This measure compares the fit of the model with that of a model containing only a constant term.

Note: $Y_i$ is an integer, while the prediction $\hat\lambda_i = e^{X_i\hat\beta}$ is continuous.

III. OVERDISPERSION:
The Poisson model has been criticized because of its implicit assumption that the variance of $Y_i$ equals its mean. Many extensions of the Poisson model that relax this assumption have been proposed.

Test for overdispersion:

$$H_0: Var[Y_i] = E[Y_i]$$
$$H_A: Var[Y_i] = E[Y_i] + \alpha\, g(E[Y_i])$$

The dependent variable of the test regression is:

$$Z_i = \frac{(Y_i - \hat\lambda_i)^2 - Y_i}{\sqrt{2}\,\hat\lambda_i}$$

Most count models with overdispersion (variance exceeding the mean) specify the overdispersion to be of the form:

$$Var[Y_i \mid X_i] = E[Y_i] + \alpha\, g(E[Y_i])$$

where $\alpha$ is an unknown parameter and $g(\cdot)$ is a known function, most commonly $g(\mu) = \mu^2$ or $g(\mu) = \mu$.

The test of $H_0: \alpha = 0$ against $H_A: \alpha \neq 0$ (or $\alpha > 0$) can be carried out by running the regression:

$$\frac{(Y_i - \hat\lambda_i)^2 - Y_i}{\hat\lambda_i} = \alpha\,\frac{g(\hat\lambda_i)}{\hat\lambda_i} + u_i$$

where $u_i$ is an error term. The reported t-statistic for $\alpha$ is asymptotically normal under $H_0: \alpha = 0$ (Cameron & Trivedi 1990). This test can also be used for underdispersion ($\alpha < 0$), in which case the conditional variance is less than the conditional mean.

Suppose now that the conditional mean and variance $\mu_i$ of the Poisson distribution is random rather than a completely deterministic function of the regressors $X_i$. Let:

$$\mu_i = \lambda_i u_i \quad\Rightarrow\quad \ln\mu_i = \ln\lambda_i + \ln u_i = X_i\beta + \varepsilon_i$$

The distribution of $Y_i$ conditional on $X_i$ and $u_i$ (that is, on $\varepsilon_i$) remains Poisson with conditional mean and variance $\mu_i$:

$$f(Y_i \mid X_i, u_i) = \text{Prob}(Y = Y_i \mid X_i, u_i) = \frac{e^{-\lambda_i u_i}(\lambda_i u_i)^{Y_i}}{Y_i!}$$

$$\text{Prob}(Y = Y_i \mid X_i) = \int_0^\infty \frac{e^{-\lambda_i u_i}(\lambda_i u_i)^{Y_i}}{Y_i!}\, g(u_i)\, du_i$$

where $g(u_i)$ is the density function of $u_i$. The choice of $g(u_i)$ defines the unconditional distribution. For mathematical convenience, a gamma distribution is usually assumed for $u_i = e^{\varepsilon_i}$, with $E(u_i) = 1$ so that $E(\mu_i \mid \lambda_i) = \lambda_i$:

$$g(u_i) = \frac{\theta^\theta}{\Gamma(\theta)}\, e^{-\theta u_i} u_i^{\theta - 1}$$

The density function for $Y_i$ is then:

$$\text{Prob}(Y = Y_i \mid X_i) = \frac{\Gamma(\theta + Y_i)}{\Gamma(Y_i + 1)\Gamma(\theta)}\, r_i^{Y_i}(1 - r_i)^\theta, \qquad r_i = \frac{\lambda_i}{\lambda_i + \theta}$$

The steps leading to this result are given in Section IV.

IV. NEGATIVE BINOMIAL REGRESSION MODEL:
• The assumed equality of the conditional mean and variance is the major shortcoming of the Poisson model.
• We generalize the Poisson model by introducing an individual unobserved effect into the conditional mean.
• Suppose now that the conditional mean and variance $\mu_i$ of the Poisson distribution is random rather than a completely deterministic function of $X_i$. Because of unobserved heterogeneity, different observations may have different $\mu_i$, and part of this difference is due to a random (unobserved) component $u_i$, not only to $X_i$. $\mu_i$ is just a parameter of the distribution: we want $E(\mu_i) = e^{X_i\beta}$, not $\mu_i = e^{X_i\beta}$.

Let:

$$\mu_i = \lambda_i u_i, \qquad \lambda_i = e^{X_i\beta}$$
$$\ln\mu_i = \ln\lambda_i + \ln u_i = X_i\beta + \varepsilon_i$$

The disturbance $\varepsilon_i$ reflects the cross-sectional heterogeneity that normally characterizes microeconomic data.

The distribution of $Y_i$ conditional on $X_i$ and $u_i$ remains Poisson with conditional mean and variance $\mu_i$:

$$f(Y_i \mid X_i, u_i) = \text{Prob}(Y = Y_i \mid X_i, u_i) = \frac{e^{-\lambda_i u_i}(\lambda_i u_i)^{Y_i}}{Y_i!}$$

The unconditional distribution $\text{Prob}(Y = Y_i \mid X_i)$ is the expected value over $u_i$ of $f(Y_i \mid X_i, u_i)$:

$$\text{Prob}(Y = Y_i \mid X_i) = \int_0^\infty \frac{e^{-\lambda_i u_i}(\lambda_i u_i)^{Y_i}}{Y_i!}\, g(u_i)\, du_i$$

where $g(u_i)$ is the density function of $u_i$. The problem is the choice of $g(u_i)$. For mathematical convenience, a gamma distribution is usually assumed for $u_i$, with $E(u_i) = 1$ so that $E(\mu_i \mid \lambda_i) = \lambda_i$:

$$g(u_i) = \frac{\theta^\theta}{\Gamma(\theta)}\, e^{-\theta u_i} u_i^{\theta - 1}$$

Then:

$$\text{Prob}(Y = Y_i \mid X_i) = \int_0^\infty \frac{e^{-\lambda_i u_i}(\lambda_i u_i)^{Y_i}}{Y_i!} \cdot \frac{\theta^\theta u_i^{\theta - 1} e^{-\theta u_i}}{\Gamma(\theta)}\, du_i$$

$$= \frac{\theta^\theta \lambda_i^{Y_i}}{\Gamma(Y_i + 1)\Gamma(\theta)} \int_0^\infty e^{-(\lambda_i + \theta)u_i}\, u_i^{\theta + Y_i - 1}\, du_i$$

$$= \frac{\theta^\theta \lambda_i^{Y_i}\,\Gamma(\theta + Y_i)}{\Gamma(Y_i + 1)\Gamma(\theta)(\lambda_i + \theta)^{\theta + Y_i}}$$

$$= \frac{\Gamma(\theta + Y_i)}{\Gamma(Y_i + 1)\Gamma(\theta)}\, r_i^{Y_i}(1 - r_i)^\theta, \qquad r_i = \frac{\lambda_i}{\lambda_i + \theta}$$

This is the form of the negative binomial distribution. The distribution has conditional mean $\lambda_i$ and conditional variance $\lambda_i\left(1 + \frac{1}{\theta}\lambda_i\right)$:

$$E[Y_i \mid X_i] = \lambda_i = e^{X_i\beta}$$
$$Var[Y_i \mid X_i] = \lambda_i\left(1 + \frac{1}{\theta}\lambda_i\right)$$

Note (gamma function):

$$\Gamma(P) = \int_0^\infty t^{P - 1} e^{-t}\, dt$$

We have $\Gamma(P) = (P - 1)\Gamma(P - 1)$, so $\Gamma(P) = (P - 1)!$ if $P$ is a positive integer. The gamma function is a generalization of the factorial function to non-integer values.

Note (gamma distribution):

$$f(x) = \frac{\lambda^P}{\Gamma(P)}\, e^{-\lambda x} x^{P - 1}, \qquad x \geq 0,\ \lambda > 0,\ P > 0$$

with $E(x) = \dfrac{P}{\lambda}$ and $V(x) = \dfrac{P}{\lambda^2}$. Hence $E(x) = 1$ requires $\lambda = P$, which is why both parameters of $g(u_i)$ equal $\theta$; the variance of $u_i$ is then $1/\theta$.

Since $E(\mu_i \mid \lambda_i) = \lambda_i$ when $E(u_i) = 1$, the interpretation of the conditional mean is the same as in the Poisson model.

This negative binomial model can be estimated by maximum likelihood without much difficulty. A test of the Poisson distribution is often carried out by testing the hypothesis $\alpha = \dfrac{1}{\theta} = 0$ using the Wald test.

The ratio of the variance to the mean is now $1 + \dfrac{\lambda_i}{\theta} > 1$, and it differs across observations.
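The negative binomial probabilities with $r_i = \lambda_i/(\lambda_i + \theta)$, together with the stated mean $\lambda_i$ and variance $\lambda_i(1 + \lambda_i/\theta)$, can be checked numerically. The sketch below is only an illustration: the function name `neg_binomial_pmf`, the values of `lam` and `theta`, and the truncation of the support at 500 are assumptions chosen for the example.

```python
import math


def neg_binomial_pmf(y, lam, theta):
    """Gamma(theta+y) / (Gamma(y+1) Gamma(theta)) * r^y * (1-r)^theta,
    with r = lam / (lam + theta), evaluated on the log scale for stability."""
    r = lam / (lam + theta)
    log_p = (math.lgamma(theta + y) - math.lgamma(y + 1) - math.lgamma(theta)
             + y * math.log(r) + theta * math.log(1.0 - r))
    return math.exp(log_p)


lam, theta = 2.5, 1.5    # hypothetical parameter values
support = range(500)     # the tail beyond 500 is negligible for these values

probs = [neg_binomial_pmf(y, lam, theta) for y in support]
total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print("sum of probabilities:", total)
print("mean:", mean, "vs lambda:", lam)
print("variance:", variance, "vs lambda*(1 + lambda/theta):", lam * (1 + lam / theta))
```

Working with `math.lgamma` rather than the gamma function itself avoids overflow for large counts, which matters once the same expression is summed into a log-likelihood.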
The log-likelihood:

$$\ln L = \sum_{i=1}^{n}\left[\ln\Gamma(\theta + Y_i) - \ln\Gamma(Y_i + 1) - \ln\Gamma(\theta) + Y_i\ln r_i + \theta\ln(1 - r_i)\right]$$

with $r_i = \dfrac{\lambda_i}{\lambda_i + \theta}$ and $\lambda_i = e^{X_i\beta}$. It can be estimated by MLE easily.

Application: doctor visits analysed with count data models, using the regressors (1, Age, Education, Income, Kids, Insurance).

V. TOO MANY ZEROS DATA:
In many data sets there is a large number of zero counts. Assuming a Poisson or negative binomial distribution is then a misspecification. Alternatives are hurdle and zero-inflated Poisson models.
• A binary probability model determines whether a zero or a nonzero outcome occurs.
• A truncated Poisson distribution then describes the positive outcomes.

$$\text{Prob}(Y_i = 0 \mid X_i) = e^{-\theta}$$

$$\text{Prob}(Y_i = j \mid X_i) = \frac{(1 - e^{-\theta})\, e^{-\lambda_i}\lambda_i^{j}}{j!\,(1 - e^{-\lambda_i})}, \qquad j = 1, 2, \ldots$$

With a binary regime indicator $Z_i$:

$$\text{Prob}(Z_i = 1 \mid W_i) = F(W_i, \gamma)$$

$$\text{Prob}(Y_i = j \mid X_i, Z_i = 1) = \frac{e^{-\lambda_i}\lambda_i^{j}}{j!}$$

$$E(Y_i \mid X_i) = F \times 0 + (1 - F) \times E[Y_i^* \mid X_i, Y_i^* > 0] = (1 - F)\,\frac{\lambda_i}{1 - e^{-\lambda_i}}$$

where $Y^*$ denotes the outcome of the Poisson process in regime 2.

$$\text{Prob}(Y_i = 0 \mid X_i) = \text{Prob}(\text{regime 1}) + \text{Prob}(Y_i = 0 \mid X_i, \text{regime 2}) \times \text{Prob}(\text{regime 2})$$

$$\text{Prob}(Y_i = j \mid X_i) = \text{Prob}(Y_i = j \mid X_i, \text{regime 2}) \times \text{Prob}(\text{regime 2}), \qquad j = 1, 2, \ldots$$
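The two sets of probabilities in this section can be compared numerically. The sketch below is an illustration under stated assumptions: the function names, the parameter values, and the symbol `p_regime2` for Prob(regime 2) are introduced here and do not appear in the notes. The hurdle mean check reads $F$ in the mean expression as the zero probability $e^{-\theta}$, which is an interpretation rather than something the notes state.

```python
import math


def poisson_pmf(j, lam):
    """Poisson probability e^{-lam} lam^j / j!, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def hurdle_pmf(j, lam, theta):
    """P(0) = e^{-theta}; positive counts follow a truncated Poisson."""
    if j == 0:
        return math.exp(-theta)
    return (1 - math.exp(-theta)) * poisson_pmf(j, lam) / (1 - math.exp(-lam))


def two_regime_pmf(j, lam, p_regime2):
    """Regime 1 always yields zero; regime 2 is an ordinary Poisson."""
    if j == 0:
        return (1 - p_regime2) + p_regime2 * poisson_pmf(0, lam)
    return p_regime2 * poisson_pmf(j, lam)


lam, theta, p_regime2 = 2.0, 0.7, 0.6   # hypothetical values
support = range(200)                     # the tail beyond 200 is negligible

hurdle_total = sum(hurdle_pmf(j, lam, theta) for j in support)
hurdle_mean = sum(j * hurdle_pmf(j, lam, theta) for j in support)
regime_total = sum(two_regime_pmf(j, lam, p_regime2) for j in support)
regime_mean = sum(j * two_regime_pmf(j, lam, p_regime2) for j in support)

# Closed forms to compare against.
hurdle_mean_formula = (1 - math.exp(-theta)) * lam / (1 - math.exp(-lam))
regime_mean_formula = p_regime2 * lam

print("hurdle: total", hurdle_total, "mean", hurdle_mean, "formula", hurdle_mean_formula)
print("two-regime: total", regime_total, "mean", regime_mean, "formula", regime_mean_formula)
```

Both sets of probabilities sum to one. Under the reading above, the hurdle mean agrees with $(1 - F)\lambda_i/(1 - e^{-\lambda_i})$, whereas the two-regime mean is Prob(regime 2) times $\lambda_i$, so the two formulations are distinct models and not reparameterizations of each other.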
