GenericLikelihoodModel to fit a binomial distribution: output std err is nan
Since there are no packages in Python that fit discrete distributions directly, I tried to use statsmodels.base.model.GenericLikelihoodModel to fit a binomial distribution. However, in some cases, e.g., n=5, p=0.5, every term in the estimation summary is nan except the point estimates.
If I change n from 5 to 10, it works well.
My code is:

import numpy as np
from statsmodels.base.model import GenericLikelihoodModel
from scipy.stats import binom

# note: the class must not itself be named binom, or it shadows scipy.stats.binom
class Binom(GenericLikelihoodModel):
    def loglike(self, params):
        return np.log(binom.pmf(self.endog, *params)).sum()

n, p = 5, 0.5
params = (n, p)
x = binom.rvs(n, p, size=1000, random_state=1)
res = Binom(x).fit(start_params=params)
res.df_model = len(params)
res.df_resid = len(x) - len(params)
print(res.summary())
Related output:

          coef    std err          z      P>|z|      [95.0% Conf. Int.]
par0    5.0000        nan        nan        nan         nan         nan
par1    0.4972        nan        nan        nan         nan         nan
error messages:
//anaconda/lib/python3.6/site-packages/ipykernel/main.py:23: runtimewarning: divide 0 encountered in log
//anaconda/lib/python3.6/site-packages/statsmodels/tools/numdiff.py:329: runtimewarning: invalid value encountered in double_scalars - f(*((x - ee[i,:] - ee[j,:],)+args), **kwargs),)
//anaconda/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:875: runtimewarning: invalid value encountered in greater return (self.a < x) & (x < self.b)
//anaconda/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:875: runtimewarning: invalid value encountered in less return (self.a < x) & (x < self.b)
//anaconda/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1814: runtimewarning: invalid value encountered in less_equal cond2 = cond0 & (x <= self.a)
I found the reason why the code fails. In statsmodels, the std err is calculated with the formula
np.sqrt(np.diag(self.cov_params()))
where cov_params is based on the Hessian matrix evaluated at the point estimates. In the fitting process, the Hessian matrix is approximated numerically by
hess[i, j] = (f(*((x + ee[i, :] + ee[j, :],) + args), **kwargs)
              - f(*((x + ee[i, :] - ee[j, :],) + args), **kwargs)
              - (f(*((x - ee[i, :] + ee[j, :],) + args), **kwargs)
                 - f(*((x - ee[i, :] - ee[j, :],) + args), **kwargs),)
              )/(4.*hess[i, j])
where f is the loglike function, x is the vector of point estimates, and ee holds the finite-difference steps.
Combined with the customized loglike function, one can see that this involves the calculation of log(0) = -inf (explained below), so hess[i, j] involves sums and differences of log(0), leading to nan. That is why the final std err and the other metrics become nan.
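The nan propagation is easy to reproduce in isolation: np.log(0) yields -inf, and the finite differences in the Hessian then subtract one infinity from another, which is nan. A minimal sketch of just that mechanism:

```python
import numpy as np

# suppress the same RuntimeWarnings that appear in the traceback above
with np.errstate(divide="ignore", invalid="ignore"):
    ll = np.log(0.0)   # -inf, as in loglike when binom.pmf(...) == 0
    diff = ll - ll     # the Hessian's finite differences subtract such terms

print(ll, diff)        # -inf nan
```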
The 0 in log(0) is obtained through binom.pmf(x, n, p) = 0 for x > n. In practice, the sample size is 1000. Given binom.pmf(5, 5, 0.5) = 0.03125 and binom.pmf(10, 10, 0.5) ≈ 0.00097, one can see that when sampling with n=10, p=0.5 there may be no 10 in the sample at all, so the log(0) case is never encountered. However, if the sample size is increased to 1000000, the same error occurs, as expected.
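These numbers can be checked directly; the last line below estimates the chance that a size-1000 sample from Binomial(10, 0.5) contains no 10 at all, which is why the n=10 run often avoids log(0):

```python
from scipy.stats import binom

p_max5 = binom.pmf(5, 5, 0.5)       # probability of drawing the maximum, n=5
p_max10 = binom.pmf(10, 10, 0.5)    # much smaller for n=10
# probability that none of 1000 independent draws from Binomial(10, 0.5) equals 10
p_no_ten = (1 - p_max10) ** 1000

print(p_max5, p_max10, p_no_ten)
```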
So the question becomes: is there a way, either by modifying the customized loglike function or by overriding the Hessian functions in statsmodels, to calculate the std err in this case?
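One possible workaround along the first line (modifying the custom loglike) is to floor the pmf before taking the log, so that perturbed evaluations with x > n return a very negative but finite value instead of -inf. A minimal sketch, assuming a floor of 1e-300 is acceptable; the class name ClippedBinom is made up here, and this does not address the theoretical issue with n sitting on the boundary of the support:

```python
import numpy as np
from scipy.stats import binom
from statsmodels.base.model import GenericLikelihoodModel

class ClippedBinom(GenericLikelihoodModel):
    def loglike(self, params):
        pmf = binom.pmf(self.endog, *params)
        # floor the pmf so log(0) never produces -inf
        return np.log(np.clip(pmf, 1e-300, 1.0)).sum()

x = binom.rvs(5, 0.5, size=1000, random_state=1)
# derivative-free Nelder-Mead avoids differentiating through the clipped region
res = ClippedBinom(x).fit(start_params=(5, 0.5), method="nm", disp=0)
print(res.params)
```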
In general, there are theoretical and computational problems with maximum likelihood estimation when the support of the distribution depends on a parameter.
In this case n defines the upper bound of the distribution, and as far as I remember from inference statistics, the standard errors are not valid in this case.
In the usual binomial case we assume that the number of trials n is known and estimate the probability or proportion. In statsmodels and other packages this can be estimated using GLM with family Binomial, specifying the number of successes and failures as the dependent variable. By default the logit link is used, which requires transforming the parameter to obtain the probability; this is done by the predict method in this case.