Regression model

Binary logit/probit model

$$\begin{aligned} P(y=1) &= P(\sum \beta x + \epsilon > 0) \\ &= P(\epsilon > -\sum \beta x) \\ &= 1 - F(-\sum \beta x) \\ &= F(\sum \beta x) \end{aligned}$$

logit CDF[1]: $F(x)=\frac{e^x}{1+e^x}$
probit CDF: $F(x)=\Phi(x)=\int_{-\infty}^{x} \frac{1}{\sqrt{2 \pi}} e^{-\frac{z^2}{2}} dz$
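As a minimal sketch, the two CDFs can be written with the Python standard library only, and the symmetry step $1 - F(-z) = F(z)$ from the derivation can be checked numerically. The coefficient and regressor values are hypothetical and serve only as an illustration.

```python
import math

def logit_cdf(x: float) -> float:
    """Logistic CDF e^x / (1 + e^x), written in a numerically stable form."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def probit_cdf(x: float) -> float:
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical coefficients and regressors (not from the text).
beta = [0.5, -1.2]
x = [2.0, 0.3]
index = sum(b * v for b, v in zip(beta, x))  # the linear index sum(beta * x)

for cdf in (logit_cdf, probit_cdf):
    # Both distributions are symmetric around 0, so 1 - F(-z) equals F(z).
    assert abs((1.0 - cdf(-index)) - cdf(index)) < 1e-12
    print(cdf.__name__, "P(y=1) =", round(cdf(index), 4))
```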

marginal effect[2]


Logit: the odds are $\frac{P}{1-P}=e^{\sum \beta x}$.
The odds ratio for a one-unit increase in a variable $a$ is $\frac{odds(a+1)}{odds(a)}=e^{\beta_a}$, where $\beta_a$ is the coefficient of $a$.


Probit: $\Phi^{-1}(P) = \sum \beta x$, so
$\Phi^{-1}_{a=1} - \Phi^{-1}_{a=0} = \beta_a$.
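The two interpretations above can be checked numerically. The following is a small sketch with hypothetical coefficients (`beta_a`, `beta_other`, `x_other` are illustrative names, not from the text): in the logit model the odds ratio for a one-unit change in $a$ equals $e^{\beta}$ for that variable, and in the probit model the difference of the inverse normal CDF values equals its coefficient.

```python
import math
from statistics import NormalDist

# Hypothetical coefficients and a fixed value for the other regressor.
beta_a, beta_other, x_other = 0.7, -0.4, 1.5

def index(a: float) -> float:
    """Linear index sum(beta * x) as a function of the variable a."""
    return beta_a * a + beta_other * x_other

def logit_prob(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    return p / (1.0 - p)

# Logit: ratio of the odds at a=1 and a=0.
odds_ratio = odds(logit_prob(index(1))) / odds(logit_prob(index(0)))
assert math.isclose(odds_ratio, math.exp(beta_a))

# Probit: difference of the inverse-CDF values at a=1 and a=0.
normal = NormalDist()
p1, p0 = normal.cdf(index(1)), normal.cdf(index(0))
probit_diff = normal.inv_cdf(p1) - normal.inv_cdf(p0)
assert math.isclose(probit_diff, beta_a, abs_tol=1e-9)
```

The odds ratio does not depend on `x_other`, which is why it is a convenient summary of a logit coefficient.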

Ordered logit/probit model

The dependent variable takes ordered values (e.g., worst, worse, normal, good, very good).

$$\begin{aligned} P(y=1) &= P(y^* \leq 0) = P(\varepsilon \leq -\sum \beta x) = F(-\sum \beta x) \\ P(y=2) &= P(0 < y^* \leq \mu_2) = P(-\sum \beta x < \varepsilon \leq \mu_2 - \sum \beta x) = F(\mu_2 - \sum \beta x) - F(-\sum \beta x) \end{aligned}$$

where $y^* = \sum \beta x + \varepsilon$ is the latent variable.
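The category probabilities can be sketched as differences of the CDF evaluated at consecutive boundaries. The code below assumes a logistic $F$, a first boundary fixed at 0 as in the formula for $P(y=1)$, and hypothetical values for the index and the remaining boundaries.

```python
import math

def logistic_cdf(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def ordered_probs(index: float, boundaries: list[float]) -> list[float]:
    """Return [P(y=1), ..., P(y=J)] for increasing boundaries [0, mu_2, ...].

    P(y <= i) = F(mu_i - index); each category probability is the
    difference between two consecutive cumulative probabilities.
    """
    cumulative = [logistic_cdf(mu - index) for mu in boundaries] + [1.0]
    probs = [cumulative[0]]
    for i in range(1, len(cumulative)):
        probs.append(cumulative[i] - cumulative[i - 1])
    return probs

# Hypothetical index sum(beta * x) and boundaries for five ordered categories.
index = 0.8
boundaries = [0.0, 1.0, 2.0, 3.5]
probs = ordered_probs(index, boundaries)
print([round(p, 4) for p in probs])
```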

marginal effect


Logit: the odds are $\frac{P(y=1)}{P(y \neq 1)}$ and the odds ratio is $\frac{odds(a=1)}{odds(a=0)}=e^{-\beta_a}$ (the exponent is negative because $P(y=1)=F(-\sum \beta x)$).


Probit: $\Phi^{-1}(P(y=1)) = -\sum \beta x$, so
$\Phi^{-1}_{a=1} - \Phi^{-1}_{a=0} = -\beta_a$.

boundary value


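Logit: the cumulative odds are $\frac{P(y\leq i)}{P(y > i)}=e^{\mu_i - \sum \beta x}$, so the odds ratio is $\frac{odds(a=1)}{odds(a=0)}=e^{-\beta_a}$ at every boundary $\mu_i$.


Probit: $\Phi^{-1}(P(y \leq i)) - \Phi^{-1}(P(y \leq i-1)) = \mu_i - \mu_{i-1}$, which equals $\mu_i$ for $i=2$ because the first boundary is fixed at 0.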
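A short numerical check of the boundary relations, assuming $P(y \leq i) = F(\mu_i - \sum \beta x)$ with the first boundary fixed at 0. The boundary values and the index are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical boundaries (mu_1 = 0 < mu_2 < mu_3) and linear index.
mu = [0.0, 1.1, 2.3]
index = 0.8

def logistic_cdf(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Ordered logit: the cumulative odds P(y <= i) / P(y > i) equal exp(mu_i - index).
for mu_i in mu:
    p_le = logistic_cdf(mu_i - index)
    assert math.isclose(p_le / (1.0 - p_le), math.exp(mu_i - index))

# Ordered probit: differences of inverse-CDF values recover the boundary gaps.
normal = NormalDist()
cumulative = [normal.cdf(mu_i - index) for mu_i in mu]
gaps = [
    normal.inv_cdf(cumulative[i]) - normal.inv_cdf(cumulative[i - 1])
    for i in range(1, len(mu))
]
print([round(g, 6) for g in gaps])
```

The first gap equals the second boundary itself because the first boundary is 0.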

Multinomial logit model

Used when the dependent variable has three or more unordered categories.

$$\frac{p_j}{p_j + p_J} = F(\sum_{k} \beta_{jk} x_k)$$

where $p_J$ is the probability of the reference category $J$ and $p_j$ is the probability of category $j$, so the left-hand side is the probability of $j$ when the choice is restricted to $j$ and $J$. With the logistic $F$ this gives

$$\frac{p_j}{p_J}=e^{\sum_k \beta_{jk} x_k}$$

$$\sum_{j \neq J} \frac{p_j}{p_J} = \frac{1-p_J}{p_J}=\frac{1}{p_J}-1=\sum_{j \neq J} e^{\sum_k \beta_{jk}x_k}$$

$$p_J=\frac{1}{1 + \sum_{j \neq J} e^{\sum_k \beta_{jk} x_k}}$$

$$p_j=\frac{e^{\sum_k \beta_{jk} x_k}}{1 + \sum_{l \neq J} e^{\sum_k \beta_{lk} x_k}}$$
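The closed-form probabilities can be sketched in a few lines. The function name `multinomial_probs` and all coefficient values are hypothetical; the reference category is placed last and has its coefficients normalized to zero.

```python
import math

def multinomial_probs(betas: list[list[float]], x: list[float]) -> list[float]:
    """Return [p_1, ..., p_{J-1}, p_J], with the reference category J last.

    betas[j] holds the coefficients beta_jk of non-reference category j.
    """
    scores = [math.exp(sum(b * v for b, v in zip(beta_j, x))) for beta_j in betas]
    denom = 1.0 + sum(scores)  # the 1 is exp(0), the reference category's score
    return [s / denom for s in scores] + [1.0 / denom]

# Hypothetical coefficients for two non-reference categories and two regressors.
betas = [[0.2, -0.5], [1.0, 0.3]]
x = [1.0, 2.0]
probs = multinomial_probs(betas, x)

# p_j / p_J equals exp(sum_k beta_jk * x_k) for each non-reference category.
for beta_j, p_j in zip(betas, probs[:-1]):
    log_ratio = sum(b * v for b, v in zip(beta_j, x))
    assert math.isclose(p_j / probs[-1], math.exp(log_ratio))
print([round(p, 4) for p in probs])
```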

Nested logit/probit model

The choice alternatives have a hierarchical (nested) structure.

graph TD;
  A[buy car] --> B[used]
  A --> B'[new]
  C[no buy car] --> D[used]
  C --> D'[new]

graph LR;
  A[F] --> A1[F_1]
  A --> A2[1-F_1]
  A1 --> B1[F_k12]
  A1 --> B2[1-F_k12]
  A2 --> B3[F_k22]
  A2 --> B4[1-F_k22]
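One way to read the second tree is that each leaf probability is the product of the branch probabilities along its path. The sketch below makes that reading explicit under a simplifying assumption: every split is an independent binary logit (a sequential logit), which is not necessarily the full nested logit with inclusive values. All index values and leaf labels are hypothetical.

```python
import math

def logistic_cdf(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical linear indices for the three binary splits in the tree.
upper_index = 0.4          # first split: F_1 vs. 1 - F_1
lower_index_first = -0.3   # split below the F_1 branch
lower_index_second = 0.9   # split below the 1 - F_1 branch

f_1 = logistic_cdf(upper_index)
f_12 = logistic_cdf(lower_index_first)
f_22 = logistic_cdf(lower_index_second)

# Each leaf probability multiplies the probabilities along its path.
leaves = {
    "branch 1, option 1": f_1 * f_12,
    "branch 1, option 2": f_1 * (1.0 - f_12),
    "branch 2, option 1": (1.0 - f_1) * f_22,
    "branch 2, option 2": (1.0 - f_1) * (1.0 - f_22),
}
for name, prob in leaves.items():
    print(name, round(prob, 4))
```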

marginal effect


Logit: the odds are $\frac{F_{ki}}{1-F_{ki}}$.
The odds ratio is $\frac{odds(a=1)}{odds(a=0)}=e^{\beta_a}$.


Probit: $\Phi^{-1}(P(y = 1)) = \sum \beta x$, so
$\Phi^{-1}_{a=1} - \Phi^{-1}_{a=0} = \beta_a$.

Conditional model

Used when the independent variables take different values for each alternative of the dependent variable. Example:
independent variables: cost and time (by Seoul, Gyunggi)
dependent variable: bus, train, car


$e^{\beta_a}$: when variable $a$ increases by 1, the factor by which the odds of the comparison alternative (bus) relative to the reference alternative (car) change.

marginal effect

$$\begin{aligned}\frac{\partial P_j}{\partial z_{jk}} &= \frac{\partial}{\partial z_{jk}} \frac{e^{\sum_m \alpha_m z_{jm}}}{Q} \quad (Q=\sum_l e^{\sum_m \alpha_m z_{lm}})\\ &=\frac{\alpha_k e^{\sum_m \alpha_m z_{jm}} Q - \alpha_k e^{2 \sum_m \alpha_m z_{jm}}}{Q^2} \\ &=\frac{\alpha_k e^{\sum_m \alpha_m z_{jm}}}{Q} \left(\frac{Q - e^{\sum_m \alpha_m z_{jm}}}{Q}\right) \\ &= \alpha_k P_j(1- P_j) \quad (P_j =\frac{e^{\sum_m \alpha_m z_{jm}}}{Q}) \end{aligned}$$
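The result $\alpha_k P_j (1 - P_j)$ can be verified against a finite-difference derivative. This is a sketch with hypothetical coefficients and attribute values for three alternatives (bus, train, car) and two attributes (cost, time).

```python
import math

def choice_probs(alpha: list[float], z: list[list[float]]) -> list[float]:
    """Conditional-logit probabilities; z[j][k] is attribute k of alternative j."""
    scores = [math.exp(sum(a * v for a, v in zip(alpha, z_j))) for z_j in z]
    q = sum(scores)
    return [s / q for s in scores]

# Hypothetical coefficients (cost, time) and attributes of bus, train, car.
alpha = [-0.05, -0.02]
z = [[10.0, 40.0], [15.0, 30.0], [30.0, 25.0]]

j, k, h = 0, 0, 1e-6  # alternative, attribute, finite-difference step
p = choice_probs(alpha, z)

# Central difference of P_j with respect to z_jk.
z_up = [row[:] for row in z]
z_dn = [row[:] for row in z]
z_up[j][k] += h
z_dn[j][k] -= h
numeric = (choice_probs(alpha, z_up)[j] - choice_probs(alpha, z_dn)[j]) / (2 * h)

analytic = alpha[k] * p[j] * (1.0 - p[j])
assert abs(numeric - analytic) < 1e-8
print(round(analytic, 6))
```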

Ranked logit model

Used when the dependent variable is a ranking of the alternatives.

Marginal effect

$$e^{\beta_a} = \frac{\lambda}{\lambda_0} \quad \lambda \text{: hazard function}$$


$$P(u_{r_1} > u_{r_2} > \ldots)=\prod_{j=1}^{J-1} \frac{e^{V_j}}{\sum_{k=j}^{J} e^{V_k}}$$
where the alternatives are indexed in rank order and $V_j = \beta_j x_i + \alpha z_j + \theta w_{ij}$


$\exp(\sum_{i \neq \alpha} \beta_i x_i + \alpha \sum z_j)$ where $\alpha$ is means, $z_j$

Example probability

The probability that a is ranked first, b second, and c third:
$$P(a > b > c) =\frac{\exp(V_a)}{\exp(V_a)+\exp(V_b)+\exp(V_c)} \cdot \frac{\exp(V_b)}{\exp(V_b)+\exp(V_c)}\cdot \frac{\exp(V_c)}{\exp(V_c)}$$
The last factor equals 1, because only one alternative remains.
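The ranking probability can be sketched as a product of successive logit choices over the alternatives that remain. The function name `ranking_prob` and the utility values are hypothetical. Summing over all possible rankings should give 1, which serves as a sanity check.

```python
import math
from itertools import permutations

def ranking_prob(utilities_in_rank_order: list[float]) -> float:
    """Probability of observing exactly this ranking (best alternative first).

    At each step the top-ranked remaining alternative is chosen by a
    logit over the alternatives that have not been ranked yet.
    """
    prob = 1.0
    for pos in range(len(utilities_in_rank_order) - 1):
        remaining = utilities_in_rank_order[pos:]
        prob *= math.exp(remaining[0]) / sum(math.exp(v) for v in remaining)
    return prob

# Hypothetical systematic utilities V_a, V_b, V_c.
v = {"a": 1.0, "b": 0.5, "c": -0.2}
p_abc = ranking_prob([v["a"], v["b"], v["c"]])

# The probabilities of all 3! rankings sum to 1.
total = sum(ranking_prob([v[name] for name in order]) for order in permutations(v))
assert math.isclose(total, 1.0)
print(round(p_abc, 4))
```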

  1. cumulative distribution function ↩︎

  2. the derivative of the probability with respect to an independent variable ↩︎