Can ordered probit models and logistic models be used with time series data?

First, the terms "logistic regression", "logistic model", "logistic regression model", and "logit model" are generally used interchangeably to refer to the same model; the only difference is one of form: logistic regression estimates the probability directly, while the logit model applies a logit transformation to the probability. SPSS, however, appears to call a model built only from categorical independent variables a "logit model", and a model containing both categorical and continuous independent variables a "logistic regression model". Whether the model is binary or multinomial depends on the number of categories of the dependent variable; the multinomial model is an extension of the binary one.
Second, when the dependent variable is nominal, logit and probit do not differ in any essential way and can generally be used interchangeably. The difference lies in the assumed distribution: the former assumes the random term follows a logistic distribution, the latter a normal distribution. The two distribution functions are in fact very similar and produce similar values; the main difference is that the logistic distribution has somewhat heavier tails than the normal. If the dependent variable is ordinal, however, an ordered probit model should be used for the regression. Ordered probit can be viewed as an extension of probit.
Probit and logit models are both for discrete dependent variables. They differ in the error assumption: probit assumes normally distributed errors, while logit assumes a type I extreme value (Gumbel) distribution. Logit is applied more often because that assumption yields choice probabilities in closed form, which are easy to work with; its drawback is the independence of irrelevant alternatives (IIA) property, the classic red bus/blue bus problem. Probit does not have the IIA problem, but it lacks a closed form. Another advantage of probit is that it can model correlation among shocks explicitly: when two discrete choices share a common source of shock (say, related raw-material costs), this should be reflected in the error structure, and probit's normal error term makes that straightforward, requiring only one additional covariance parameter to be estimated.
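The distributional difference described above can be seen directly by comparing the two link functions. The sketch below is illustrative only; it rescales the logistic distribution to unit variance so its shape can be compared fairly with the standard normal.

```python
import math

def logistic_cdf(z):
    """CDF of the logistic distribution: the link behind the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """CDF of the standard normal distribution: the link behind the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The standard logistic has variance pi^2/3, so rescale its argument to unit
# variance before comparing shapes with the standard normal.
scale = math.pi / math.sqrt(3.0)

for z in (0.0, 1.0, 2.0, 4.0):
    print(f"z={z:3.1f}  logit={logistic_cdf(z * scale):.5f}  probit={normal_cdf(z):.5f}")
```

At z = 0 both give 0.5; far out in the tail (z = 4) the logistic CDF is visibly further from 1 than the normal CDF, which is the "heavier tails" point made above.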
Logit and Probit Models (Chapter 5: Discrete Dependent Variable Models)
Logit, nested logit, and probit models are used to model a relationship between a dependent variable Y and one or more independent variables X. The dependent variable, Y, is a discrete variable that represents a choice, or category, from a set of mutually exclusive choices or categories. For instance, an analyst may wish to model the choice of automobile purchase (from a set of vehicle classes), the choice of travel mode (walk, transit, rail, auto, etc.), the manner of an automobile collision (rollover, rear-end, sideswipe, etc.), or residential location choice (high-density, suburban, exurban, etc.). The independent variables are presumed to affect the choice or category of the choice maker, and represent a priori beliefs about the causal or associative elements important in the choice or classification process. In the case of ordinal variables, an ordered logit or probit model can be applied to take advantage of the additional information provided by the ordinal over the nominal scale (not discussed here).

Examples. An analyst wants to model:
1. The effect of household member characteristics, transportation network characteristics, and alternative mode characteristics on choice of bus, walk, auto, carpool, single-occupant auto, rail, or bicycle.
2. The effect of consumer characteristics on choice of vehicle purchase: sport utility vehicle, van, auto, light pickup truck, or motorcycle.
3. The effect of traveler characteristics and employment characteristics on airline choice: Delta, United Airlines, Southwest, etc.
4. The effect of involved vehicle types, pre-crash conditions, and environmental factors on vehicle crash outcome: property damage only, mild injury, severe injury, fatality.

Assumptions:
1. The observations on the dependent variable Y are assumed to have been randomly sampled from the population of interest (even for stratified samples or choice-based samples).
2. Y is caused by or associated with the X's, and the X's are determined by influences (variables) 'outside' of the model.
3. There is uncertainty in the relation between Y and the X's, as reflected by a scattering of observations around the functional relationship.
4. The error terms must be assessed to determine whether a selected model is appropriate.

The discrete dependent variable Y is the observed choice or classification, such as brand selection, transportation mode selection, etc. For grouped data, where choices are observed for homogeneous experimental units or observed multiple times per experimental unit, the dependent variable is the proportion of choices observed. The model also requires one or more continuous and/or discrete variables X, which describe the attributes of the choice maker or event and/or various attributes of the choices thought to be causal or influential in the decision or classification process.

The hypothesized model addresses:
the functional form of the relation between Y and the X's;
the strength of association between Y and the X's (individual X's and the collective set of X's);
the proportion of choice or classification uncertainty explained by the hypothesized relation; and
confidence in predictions of future/other observations on Y given X.

Example applications:
Pavements
Koehne, Jodi, Fred Mannering, and Mark Hallenbeck (1996). Analysis of Trucker and Motorist Opinions Toward Truck-lane Restrictions. Transportation Research Record #1560, pp. 73-82. National Academy of Sciences.
Traffic
Mannering, Fred, Jodi Koehne, and Soon-Gwan Kim (1995). Statistical Assessment of Public Opinion Toward Conversion of General-Purpose Lanes to High-Occupancy Vehicle Lanes. Transportation Research Record #1485, pp. 168-176. National Academy of Sciences.
Planning
Koppelman, Frank S., and Chieh-Hua Wen (1998). Nested Logit Models: Which Are You Using? Transportation Research Record #1645, pp. 1-9. National Academy of Sciences.
Yai, Tetsuo, and Tetsuo Shimizu (1998). Multinomial Probit with Structured Covariance for Choice Situations with Similar Alternatives. Transportation Research Record #1645, pp. 69-75. National Academy of Sciences.
McFadden, Daniel (1978). Modeling the Choice of Residential Location. Transportation Research Record #673, pp. 72-77. National Academy of Sciences.
Horowitz, Joel L. (1984). Testing Disaggregate Travel Demand Models by Comparing Predicted and Observed Market Shares.
Transportation Research Record #976, pp. 1-7. National Academy of Sciences.

Frequently asked questions:
How is a choice model equation interpreted?
How do continuous and discrete variables differ in the choice model?
How are coefficients interpreted?
How is the likelihood ratio test interpreted?
How are t-statistics interpreted?
How are phi and adjusted phi interpreted?
How are confidence intervals interpreted?
How are predicted choice probabilities interpreted?
How are elasticities computed and interpreted?
When is the independence of irrelevant alternatives (IIA) assumption violated?
Should alternative-specific constant terms be included in the model?
How many variables should be included in the model?
What methods can be used to specify the relation between choice and the X's?
What methods are available for fixing heteroscedastic errors?
What methods are used for fixing serially correlated errors?
What can be done to deal with multi-collinearity?
What is endogeneity and how can it be fixed?
How does one know if the errors are Gumbel distributed?
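On the last question: the logit form follows from assuming Gumbel-distributed utility errors, and this can be checked by simulation. The sketch below is illustrative only; the systematic utilities V1 and V2 are arbitrary assumed values, and the standard Gumbel draw uses the inverse-CDF method.

```python
import math
import random

random.seed(0)

def gumbel_draw():
    """Draw from the standard Gumbel (type I extreme value) distribution
    via the inverse CDF: -log(-log(u)) for uniform u."""
    u = random.random()
    while u == 0.0:          # guard against log(0)
        u = random.random()
    return -math.log(-math.log(u))

# Two alternatives with assumed systematic utilities V1 and V2.
V1, V2 = 1.0, 0.0
N = 200_000
chose_1 = sum(1 for _ in range(N)
              if V1 + gumbel_draw() > V2 + gumbel_draw())

simulated = chose_1 / N
closed_form = math.exp(V1) / (math.exp(V1) + math.exp(V2))  # binary logit, ~0.731
print(f"simulated {simulated:.3f}  vs  logit closed form {closed_form:.3f}")
```

With Gumbel errors the simulated choice frequency matches the logit formula; repeating the experiment with a different error distribution would not.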
References:
Ben-Akiva, Moshe, and Steven R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. The MIT Press, Cambridge, MA, 1985.
Greene, William H. Econometric Analysis. Macmillan Publishing Company, New York, NY, 1990.
Ortuzar, J. de D., and L. G. Willumsen. Modelling Transport. Second Edition. John Wiley and Sons, New York, NY, 1994.
Train, Kenneth. Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. The MIT Press, Cambridge, MA, 1993.

Postulate mathematical models from theory and past research
Discrete choice models (logit, nested logit, and probit) are used to develop models of behavioral choice or of event classification. It is accepted a priori that the analyst does not know the complexity of the underlying relationships, and that any model of reality will be wrong to some degree. The choice models estimated will reflect the a priori assumptions of the modeler as to what factors affect the decision process. Common applications of discrete choice models include choice of transportation mode, choice of travel destination, and vehicle purchase decisions. There are many other potential applications, including choice of residential location, choice of business location, and project contractor selection.
In order to postulate meaningful choice models, the modeler should review past literature regarding the choice context and identify factors with potential to affect the decision-making process. These factors should drive the data-collection process, usually a survey instrument given to experimental units, to collect the information relevant to the decision-making process. Much has been written about survey design and data collection, and those sources should be consulted for detailed discussions of this complex and critical aspect of choice modeling.
Transportation Planning Example: An analyst is interested in modeling the mode choice decision made by individuals in a region. The analyst reviews the literature and develops the following list of potential factors influencing the mode choice decision for most travelers in the region.
1. Trip maker characteristics (within the household context): vehicle availability, possession of a driver's license, household structure (stage of life-cycle), role in household, household income (value of time).
2. Characteristics of the journey or activity: journey purpose (work, grocery shopping, school, etc.), time of day, accessibility and proximity of activity destination.
3. Characteristics of the transport facility: qualitative factors (comfort and convenience, reliability and regularity, protection, security) and quantitative factors (in-vehicle travel times, waiting and walking times, out-of-pocket monetary costs, availability and cost of parking, proximity/accessibility of transport).
Estimate choice models
Qualitative choice analysis methods are used to describe and/or predict discrete choices of decision-makers, or to classify a discrete outcome according to a host of regressors. The need to model choice and/or classification arises in transportation, energy, marketing, telecommunications, and housing, to name but a few fields. There are, as always, a set of assumptions or requirements about the data that need to be satisfied. The response variable (choice or classification) must meet the following three criteria.
1. The set of choices or classifications must be finite.
2. The set of choices or classifications must be mutually exclusive; that is, a particular outcome can only be represented by one choice or classification.
3. The set of choices or classifications must be collectively exhaustive; that is, all choices or classifications must be represented by the choice set or classification.
Even when the 2nd and 3rd criteria are not met, the analyst can usually re-define the set of alternatives or classifications so that the criteria are satisfied.
Planning Example: An analyst wishing to model mode choice for commute decisions defines the choice set as AUTO, BUS, RAIL, WALK, and BIKE. The modeler observed a person in the database who drove her personal vehicle to the transit station and then took a bus, violating the second criterion. To remedy this and similar problems that might arise, the analyst introduces some new choices (or classifications) into the modeling process: AUTO-BUS, AUTO-RAIL, WALK-BUS, WALK-RAIL, BIKE-BUS, BIKE-RAIL. By introducing these new categories the analyst has made the discrete choice set comply with the stated modeling requirements.
Deriving Choice Models from Random Utility Theory
Choice models are developed from economic theories of random utility, whereas classification models (classifying crash type, for example) are developed by minimizing classification errors with respect to the X's and the classification levels of Y. Because most of the literature in this area is focused on choice models, and because choice models and classification models are mathematically equivalent, the discussion here is based on choice models. Several assumptions are made when deriving discrete choice models from random utility theory:
1. An individual is faced with a finite set of choices from which only one can be chosen.
2. Individuals belong to a homogeneous population, act rationally, possess perfect information, and always select the option that maximizes their net personal utility.
3. If C is defined as the universal choice set of discrete alternatives, and J the number of elements in C, then each member of the population has some subset of C as his or her choice set. Most decision-makers have some subset Cn that is considerably smaller than C. It should be recognized that defining the feasible choice set Cn for an individual is difficult; however, it is assumed that it can be determined.
4. Decision-makers are endowed with a subset of attributes xn ∈ X, all measured attributes relevant in the decision-making process.
Planning Example: In identifying the choice set of travel modes, the analyst identifies the universal choice set C to consist of the following:
1. driving alone
2. sharing a ride
3. taxi
4. motorcycle
5. bicycle
6. walking
7. transit bus
8. light rail transit
The analyst identifies a family whose choice set is fairly restricted because they do not own a vehicle, and so their choice set Cn is given by:
1. sharing a ride
2. taxi
3. bicycle
4. walking
5. transit bus
6. light rail transit
The modeler, who is an observer of the system, does not possess complete information about all elements considered important in the decision-making process by all individuals making a choice, so utility is broken down into two components, V and e:
Uin = Vin + ein
where Uin is the overall utility of choice i for individual n; Vin is the systematic or measurable utility, a function of xn and i for individual n and choice i; and ein, the random utility component, includes idiosyncrasies and taste variations combined with measurement or observation errors made by the modeler.
The error
term allows for a couple of important cases: 1) two persons with the same measured attributes, facing the same choice set, may make different choices; and 2) some individuals do not select the best alternative (from the modeler's point of view this appears to be irrational behavior).
The decision-maker n chooses the alternative from which he derives the greatest utility. In the binomial or two-alternative case, the decision-maker chooses alternative 1 if and only if:
U1n ≥ U2n
or when:
V1n + e1n ≥ V2n + e2n.
In probabilistic terms, the probability that alternative 1 is chosen is given by:
Pr(1) = Pr(U1 ≥ U2) = Pr(V1 + e1 ≥ V2 + e2) = Pr(e2 - e1 ≤ V1 - V2).
Note that this equation looks like a cumulative distribution function for a probability density. That is, the probability of choosing alternative 1 (in the binomial case) is equal to the probability that the difference in random utility is less than or equal to the difference in deterministic utility.
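If the error difference is assumed logistic, this probability is the logistic CDF evaluated at V1 - V2 (binary logit); if it is assumed normal, it is the normal CDF (binary probit). A minimal sketch, with purely illustrative utility values:

```python
import math

def binary_logit_prob(V1, V2):
    """Pr(choose 1) when e = e2 - e1 is logistic: the logistic CDF at V1 - V2."""
    return 1.0 / (1.0 + math.exp(-(V1 - V2)))

def binary_probit_prob(V1, V2):
    """Pr(choose 1) when e is standard normal: the normal CDF at V1 - V2."""
    d = V1 - V2
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

# Equal systematic utilities imply indifference: both models return 0.5.
print(binary_logit_prob(1.2, 1.2))               # 0.5
# A one-unit utility advantage for alternative 1:
print(round(binary_logit_prob(2.0, 1.0), 3))     # 0.731
print(round(binary_probit_prob(2.0, 1.0), 3))    # 0.841
```

Only the difference V1 - V2 matters, which is why choice models identify utility only up to an additive constant.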
Defining e = e2 - e1, the difference in unobserved utilities between alternatives 2 and 1 for travelers 1 through N (subscript not shown), the probability distribution or density of e, f(e), can be specified to form specific classes of models.
A couple of important observations about the choice probability given by F(V1 - V2) can be made.
1. The probability that the error e reverses the choice is small when there are large differences in systematic utility between alternatives one and two.
2. Large errors are likely when differences in utility are small; thus decision-makers are more likely to choose an alternative on the 'wrong' side of the indifference line (V1 - V2 = 0).
Alternative 1 is chosen when e ≤ V1 - V2, and alternative 2 is chosen when e > V1 - V2. Thus, for binomial models of discrete choice, the choice probability is the cumulative distribution function of the error difference evaluated at the difference in systematic utilities:
Pr(1) = F(V1 - V2).
[Figure: the choice probability as a CDF plotted against V1 - V2.]
This structure for the error term is a general result for binomial choice models. By making assumptions about the probability density of the residuals, the modeler can choose between several different binomial choice model formulations. Two types of binomial choice models are most common and found in practice: the logit and the probit models. The logit model assumes a logistic distribution of errors, and the probit model assumes normally distributed errors. These binomial models, however, do not apply directly when there are more than two alternatives, and the multinomial probit model is not easy to estimate (mathematically) for more than 4 to 5 choices.
Mathematical Estimation of Choice Models
Recall that choice models involve a response Y with various levels (a set of choices or classifications), and a set of X's that reflect important attributes of the choice decision or classification. Usually the choice or classification of Y is modeled as a linear function or combination of the X's. Maximum likelihood methods are employed to solve for the betas in choice models. Consider the likelihood of a sample of N independent observations with probabilities p1, p2, ..., pn. The likelihood of the sample is simply the product of the individual likelihoods. The product is a maximum when the most likely set of p's is used.
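The product-of-likelihoods idea can be sketched numerically. The sketch below uses the same counts as the worked planning example later in this section (7 of 10 travelers choosing auto) and is illustrative only; it grid-searches the log-likelihood rather than solving the first-order condition analytically.

```python
import math

# Ten travelers choose between auto (A) and transit (T); x of n chose auto.
x, n = 7, 10

def log_likelihood(p):
    """Log of L* = p^x (1-p)^(n-x)."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

# Grid-search the log-likelihood; the maximum occurs at the sample share x/n.
grid = [i / 1000.0 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(f"maximum likelihood estimate of p: {p_hat:.3f}")  # 0.700
```

The maximizer is the observed share x/n = 0.7, anticipating the analytic result derived below by differentiation.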
That is, the likelihood is L* = p1 p2 p3 ... pn. For the binary choice model:
L*(b1, ..., bK) = ∏n ∏i Prn(i)^yin
where Prn(i), the probability that individual n chooses alternative i, is a function of the betas; yin equals 1 if individual n chose alternative i and 0 otherwise; and i and j are alternatives 1 and 2 respectively. It is generally mathematically simpler to analyze the logarithm of L* rather than the likelihood itself. Using the facts that ln(z1 z2) = ln(z1) + ln(z2), ln(z^x) = x ln(z), Pr(j) = 1 - Pr(i), and yjn = 1 - yin, the equation becomes:
L = ln L* = Σn [ yin ln Prn(i) + (1 - yin) ln(1 - Prn(i)) ].
The maximum of L is found by differentiating the function with respect to each of the betas and setting the partial derivatives equal to zero; the values of b1, ..., bK that satisfy these first-order conditions provide the maximum of L. In many cases the log likelihood is globally concave, so that if a solution to the first-order conditions exists, it is unique. This does not always have to be the case, however. Under general conditions the maximum likelihood estimators can be shown to be consistent, asymptotically efficient, and asymptotically normal.
In more complex and realistic models, the likelihood function is evaluated as before, but instead of estimating one parameter there are many parameters associated with the X's that must be estimated, and there are as many equations as there are X's to solve. In practice the probabilities that maximize the likelihood are likely to differ across individuals (unlike the simplified example below, where all individuals have the same probability). Because the likelihood function is between 0 and 1, the log likelihood function is negative. The maximum of the log-likelihood function, therefore, is the smallest negative value of the log likelihood given the data and the specified probability functions.
Planning Example. Suppose 10 individuals making travel choices between auto (A) and transit (T) were observed. All travelers are assumed to possess identical attributes (a really poor assumption), and so the probabilities are not functions of betas but simply a function of p, the probability of choosing auto. The analyst also does not have any alternative-specific attributes, a very naive model that does not reflect reality. The likelihood function will be:
L* = p^x (1-p)^(n-x) = p^7 (1-p)^3
where p = probability that a traveler chooses A, 1-p = probability that a traveler chooses T, n = number of travelers = 10, and x = number of travelers choosing A.
Recall that the analyst is trying to estimate p, the probability that a traveler chooses A. If 7 travelers were observed taking A and 3 taking T, then it can be shown that the maximum likelihood estimate of p is 0.7; in other words, the value of L* is maximized when p = 0.7 and 1-p = 0.3. All other combinations of p and 1-p result in lower values of L*. To see this, the analyst plots L* for values of p from 0.0 to 1.0. [Figure: the likelihood function L* plotted against p, with its maximum at p = 0.7.]
Similarly (and in practice), one could use the log likelihood to derive the maximum likelihood estimates, where L = log(L*) = log[p^7 (1-p)^3] = 7 log p + 3 log(1-p). [Figure: the log-likelihood function plotted against p.]
Note that in this simple example p is the only parameter being estimated, so maximizing the likelihood function L* or log(L*) requires only one first-order condition: setting the derivative of log(L*) with respect to p to zero.
The Multinomial Logit Model
The multinomial logit (MNL) model is the most commonly applied model to explain and forecast discrete choices, due to its ease of estimation and its foundation in utility theory. The MNL model is a general extension of the binomial choice model
to more than two alternatives. The universal choice set is C, which contains J elements, and a subset of C for each individual n, Cn, defines that individual's restricted choice set. It should be noted that it is not a trivial task to define restricted choice sets for individuals. In most cases Jn for decision-maker n is less than or equal to J, the total number of alternatives in the universal choice set; however, it is often assumed that all decision-makers face the same set of universal alternatives. Without showing the derivation, which can be found in the references for this chapter, the MNL model is expressed as:
Pn(i) = exp(Vin) / Σj∈Cn exp(Vjn)
where:
1. Utility for traveler n and alternative i is Uin = Vin + ein.
2. Pn(i) is the probability that traveler n chooses alternative i.
3. The numerator is the exponentiated utility of alternative i for traveler n; the denominator is the sum of the exponentiated utilities over all alternatives in Cn for traveler n.
4. The disturbances ein are independently distributed.
5. The disturbances ein are identically distributed.
6. The disturbances are Gumbel distributed with location parameter η and scale parameter μ > 0.
The MNL model expresses the probability that a specific alternative is chosen as the exponentiated utility of the chosen alternative divided by the sum of the exponentiated utilities of all alternatives (chosen and not chosen). The predicted probabilities are bounded by zero and one. There are several assumptions embedded in the estimation of MNL models.
Linear-in-parameters restriction: The linear-in-parameters restriction is made for convenience, enabling simple and efficient estimation of parameters. When the functional form of the systematic component of the utility function is linear in parameters, the MNL model can be written as:
Pn(i) = exp(b'xin) / Σj∈Cn exp(b'xjn)
where xin and xjn are vectors describing the attributes of alternatives i and j as well as attributes of traveler n.
Independence from Irrelevant Alternatives Property (IIA): Succinctly stated, the IIA property states that for a specific individual the ratio of the choice probabilities of any two alternatives is entirely unaffected by the systematic utilities of any other alternatives. This property arises from the assumption, made in deriving the logit model, that the error terms en are independent across alternatives and individuals. In other words, it is assumed that the unobserved attributes (error terms) of alternatives are independent. In many cases this is an unrealistic assumption and creates some difficulties. For example, if driver n has an unobserved (error-term) preference for public transit, then the error terms of the public transit alternatives will not be independent.
Another way to express IIA is that the ratio of choice probabilities of any two alternatives for a specific individual is entirely unaffected by the systematic utilities of any other alternatives:
Pn(i) / Pn(j) = exp(Vin) / exp(Vjn).
Note that the ratio of probabilities of modes i and j for individual n is unaffected by 'irrelevant' alternatives in Cn.
One way to pose the IIA problem is the red bus/blue bus paradox. Assume that the initial choice probabilities for an individual are as follows:
P(auto) = P(A) = 70%
P(blue bus) = P(BB) = 20%
P(rail) = P(R) = 10%
By the IIA assumption: P(A)/P(BB) = 70/20 = 3.5, and P(R)/P(BB) = 10/20 = 0.5.
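Under the MNL formula, adding an alternative renormalizes all shares while preserving these pairwise ratios. A numeric sketch (the systematic utilities below are illustrative values chosen to reproduce the 70/20/10 shares; any utilities differing by the same logs would do):

```python
import math

def mnl_probabilities(V):
    """MNL choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j)."""
    expV = [math.exp(v) for v in V]
    total = sum(expV)
    return [e / total for e in expV]

# Illustrative systematic utilities for auto, blue bus, and rail.
V = [math.log(0.70), math.log(0.20), math.log(0.10)]
print([round(p, 2) for p in mnl_probabilities(V)])        # [0.7, 0.2, 0.1]

# Add a red bus identical to the blue bus: IIA renormalizes every share,
# preserving all pairwise ratios (auto/blue-bus stays 3.5).
shares = mnl_probabilities(V + [math.log(0.20)])
print([round(p, 3) for p in shares])                      # [0.583, 0.167, 0.083, 0.167]
```

The auto share falls from 70% to about 58% even though nothing about the auto or rail alternatives changed, which is exactly the paradox developed next.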
Assume that a red bus is introduced with all the same attributes as those of the blue bus (i.e., it is indistinguishable from the blue bus except for color, an unobserved attribute). In order to retain constant ratios of alternatives (IIA), with the probabilities of red bus and blue bus equal and the probabilities of all choices summing to one, the following is obtained: P(A) = 58.3%, P(BB) = P(RB) = 16.7%, and P(R) = 8.3%. If one attempts an alternate solution, in which the original 'bus' share is split between RB and BB and the correct ratios are then restored, one obtains the same answer.
This is an unrealistic forecast by the logit model, since the individual is forecast to use buses more than before, and auto and rail less, despite the fact that no alternative with genuinely new attributes has been introduced. In reality, one would not expect the probability of auto to decline, because for traveler n a 'new' alternative has not really been introduced. In estimating MNL models, the analyst must be cautious of cases similar to the red bus/blue bus problem, in which market share should instead decrease by a factor for each of the 'similar' alternatives.
The IIA restriction does not apply to the population as a whole. That is, it does not restrict the shares of the population choosing any two alternatives to be unaffected by the utilities of other alternatives. The key to understanding this distinction is that IIA holds for homogeneous market segments, but across market segments unobserved attributes vary, and thus the IIA property does not hold for a population of individuals. An MNL therefore is an appropriate model
if the systematic component of utility accounts for heterogeneity across individuals. In general, models with many socio-economic variables have a better chance of not violating IIA. When IIA does not hold, there are various methods that can be used to get around the problem, such as nested logit and probit models.
Elasticities of MNL
The analyst can use the coefficients estimated in logit models to determine both disaggregate and aggregate elasticities, as well as cross-elasticities.
Disaggregate Direct Elasticities
An example of disaggregate direct elasticity is given by the following. An analyst wants to know the effect of a unit change in the value of some variable xink on alternative mode utilities, or mode preferences, for traveler n. In the MNL, the elasticity of the choice probability of traveler n with respect to variable k of alternative i is:
E = [dPn(i)/dxink][xink/Pn(i)] = (1 - Pn(i)) bk xink.
For example, assume that individual 18 (an observation in the observed data) has an auto travel time of 51.0 minutes and a transit travel time of 85.0 minutes. For this individual, the probability of choosing auto is given by plugging the auto and transit travel times into the MNL estimated on the sample of data. This individual's direct elasticity of auto choice probability with respect to auto travel time is then calculated, giving approximately -0.39. Thus for an additional minute of travel time in the auto there would be a decrease of 0.39% in auto usage for this individual. Of course this is a statistical result, which suggests that over repeated choice occasions the decision-maker would use auto about 1 time less in 100 per 3 minutes of additional travel time.
Disaggregate Cross-Elasticities
If the analyst were instead interested in the effect that bus access time has on choosing another alternative j, say auto, where Pn(Auto) = 0.45, a cross-elasticity can be set up (in the MNL, the cross-elasticity of Pn(i) with respect to attribute k of another alternative j is E = -Pn(j) bk xjnk), which suggests that the deterministic utility of auto for traveler n increases by 22.5 given a unit increase of 1 minute of travel time to the bus stop.
Aggregate Direct Elasticities
This
type of elasticity is simply the aggregation of individual elasticities across some subgroup of individuals who chose alternative i. This is useful for predicting the change in the expected market share across the group who chose alternative i. The aggregate elasticity is given by the probability-weighted average of the individual elasticities: the elasticity of the probability that a subgroup will choose mode i with respect to a unit or incremental change in variable k. For example, this could be used to predict the change in market share for the group choosing transit if the transit fare increased by one unit.
The Process of Estimating Multinomial Logit Models
Specification of the MNL requires several distinct steps to be taken by the analyst.
1. Identify the choice set C of alternatives. This will differ depending upon the geographical location, population, socio-economic characteristics, attributes of the alternatives, and factors that influence the choice context.
2. Identify the feasible choice subsets Cn for individuals in the sample. Note that there is one 'universal' choice set C for the entire population, and choice sets Cn for individuals in the population. It is important that choice sets do not include modes that are not considered, and conversely, that all considered modes are represented. In practice it can be difficult to forecast with restricted choice sets, but the resulting model
will be improved if restricted choice sets are known for individuals.
3. Identify which variables influence the decision process, which characteristics of individuals are important in the choice process, and how to measure and collect them.
4. Design and administer a survey instrument (including devising a sampling scheme) to collect the necessary information (the topic of another course), or observe and record the choices being made by individuals.
5. Finally, estimate and refine MNL models to select the 'best' using all of the information gathered in the previous steps.
Interpretation of MNL Model Results
Estimation of MNL models leads to fairly standard output from estimation programs. In general, program output can be obtained showing coefficient estimates, model goodness of fit, elasticities, and various other aspects of model fitting.
Model Coefficients
There are several rules to consider when interpreting the coefficients in the MNL model.
1. Alternative-specific constants can only be included in MNL models for n-1 alternatives. Characteristics of decision-makers, such as socio-economic variables, must be entered as alternative-specific. Characteristics of the alternatives themselves, such as the costs of different choices, can be entered in MNL models as 'generic' or as alternative-specific.
2. Variable coefficients only have meaning in relation to each other; there is no 'absolute' interpretation of coefficients. In other words, the absolute magnitude of coefficients is not interpretable as it is in ordinary least squares regression models.
3. Alternative-specific constants, like the constant in regression, allow some flexibility in the estimation process and generally should be left in the model, even if they are not significant.
4. Like coefficients in regression, the probability statements made in relation to t-statistics are conditional. For example, most computer programs provide t-statistics that give the probability that the data were observed given that the true coefficient value is zero. This is different from the probability that the coefficient is zero, given the data.
Planning Example: A binary logit model was estimated on data
from Washington, D.C. (see Ben-Akiva and Lerman, 1985). The following table (adapted from Ben-Akiva et al.) shows the model results: coefficient estimates, asymptotic standard errors, and asymptotic t-statistics. These t-statistics represent the t-value (which corresponds to some probability) for the hypothesis that the true model parameter is equal to zero. Recall that the critical values of t are ±1.65 and ±1.96 for the 0.90 and 0.95 confidence levels respectively.

Variable Name                Coef. Estimate   Standard Error   t-statistic
Auto constant                 1.45             0.393            3.70
In-vehicle time (min)        -0.00897          0.0063          -1.42
Out-of-vehicle time (min)    -0.0308           0.0106          -2.90
Out-of-pocket cost*          -0.0115           0.00262         -4.39
Transit fare**               -0.00708          0.00378         -1.87
Auto ownership*               0.770            0.213            3.16
Downtown workplace*          -0.561            0.306           -1.84
(indicator variable)

* auto-specific variable; ** transit-specific variable

Inspection of the estimation results suggests that, all else being equal, the auto is the preferred alternative, since the alternative-specific constant for auto is positive. Note that only one alternative-specific constant is entered in the model. Also, all but one of the variables are statistically significant at the 10% level of significance.
The in-vehicle time coefficient
negative coefficient on in-vehicle time shows that the utility of a mode decreases with each additional minute of in-vehicle travel time. Since the variable is entered as 'generic', it reflects the effect of in-vehicle travel time for either transit or auto. If travelers are believed not to have the same response to travel time across modes, this variable could instead be entered as alternative specific. The out-of-vehicle time coefficient, also entered as 'generic', shows that out-of-vehicle time is roughly three times as influential on utility as in-vehicle time. The cost coefficients show that travelers are sensitive to cost: utility for transit decreases as transit fare increases, and utility for auto decreases as out-of-pocket costs increase. Notice that auto riders are approximately twice as sensitive to travel costs as transit riders. Owning a vehicle provides greater utility for taking the auto, as one would expect. Working downtown actually reduces the utility of the auto; presumably the downtown is easily accessed via transit, and the impedance to downtown via auto is great. Although part of the impedance may be cost and travel time, which have already been captured, there may be additional impedance due to the availability and cost of parking, safety, and other factors.
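As a concrete illustration of how these coefficients combine, the utilities and the binary logit choice probability can be computed directly from the table. The sketch below uses the estimated coefficients; the trip attributes (times in minutes, costs in the study's monetary units) and the function name are illustrative assumptions, not part of the original example.

```python
import math

# Coefficients from the Washington, D.C. binary logit model (Ben-Akiva & Lerman, 1985)
B_AUTO_CONST = 1.45
B_IVT = -0.00897       # in-vehicle time, generic (min)
B_OVT = -0.0308        # out-of-vehicle time, generic (min)
B_COST_AUTO = -0.0115  # out-of-pocket cost, auto specific
B_FARE = -0.00708      # transit fare, transit specific
B_AUTO_OWN = 0.770     # autos owned, auto specific
B_DOWNTOWN = -0.561    # downtown workplace indicator, auto specific

def p_auto(ivt_a, ovt_a, cost_a, ivt_t, ovt_t, fare_t, n_autos, downtown):
    """Probability of choosing auto over transit in the binary logit model."""
    v_auto = (B_AUTO_CONST + B_IVT * ivt_a + B_OVT * ovt_a + B_COST_AUTO * cost_a
              + B_AUTO_OWN * n_autos + B_DOWNTOWN * downtown)
    v_transit = B_IVT * ivt_t + B_OVT * ovt_t + B_FARE * fare_t
    # Binary logit: P(auto) = 1 / (1 + exp(V_transit - V_auto))
    return 1.0 / (1.0 + math.exp(v_transit - v_auto))
```

Because in-vehicle time enters with a negative coefficient, lengthening the auto trip lowers the computed probability of choosing auto, matching the interpretation above.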
Refine Models: Assess Goodness of Fit
There are several goodness-of-fit measures available for testing how well an MNL model fits the data on which it was estimated. The likelihood ratio test is a generic test that can be used to compare models with different levels of complexity. Let L(β̂) be the maximum log likelihood attained with the estimated coefficient vector β̂, on which no constraints have been imposed. Let L(β̂c) be the maximum log likelihood attained with constraints applied to a subset of the coefficients in β̂. Then, asymptotically (i.e., for large samples), −2[L(β̂c) − L(β̂)] has a chi-square distribution with degrees of freedom equal to the number of constrained coefficients. This statistic, called the likelihood ratio, can be used to test the null hypothesis that two different models perform approximately the same in explaining the data. If there is insufficient evidence to support the more complex model, then the simpler model is preferred; a large difference in log likelihood is evidence for preferring the more complex model.

In the context of discrete choice analysis, two standard tests are often provided. The first compares a model estimated with all variables suspected of influencing the choice process to a model with no coefficients whatsoever, i.e., a naïve model that predicts equal probability for all choices. The test statistic is:

    −2[L(0) − L(β̂)] ~ χ², df = total number of coefficients

The null hypothesis H0 is that all coefficients are equal to 0, or equivalently that all alternatives are equally likely to be chosen. L(0) is the log likelihood computed when all coefficients, including alternative specific constants, are constrained to be zero, and L(β̂) is the log likelihood computed with no constraints on the model. A second test compares a complex model with another naïve model, one containing alternative specific constants for n − 1 alternatives; this naïve model predicts choice probabilities equal to the observed market shares of the respective alternatives. The test statistic is:

    −2[L(C) − L(β̂)] ~ χ², df = number of coefficients other than the constants

The null hypothesis H0 is that all coefficients are zero except the alternative specific constants; L(C) is the log likelihood computed when all slope coefficients except the alternative specific constants are constrained to zero. In general, the analyst can conduct specification tests to compare 'full' versus reduced models using the chi-square test:

    −2[L(R) − L(F)] ~ χ², df = number of restrictions = KF − KR
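The likelihood ratio statistic itself is a one-line computation. A minimal sketch, using the log likelihood values reported in the summary statistics below (note that with these rounded inputs the statistic comes out near 1297, slightly different from the 1371.7 reported in the text):

```python
def likelihood_ratio(ll_restricted, ll_full):
    """LR statistic −2[L(restricted) − L(full)]; compare to a chi-square critical value."""
    return -2.0 * (ll_restricted - ll_full)

# Washington, D.C. example: L(0) = −1023, L(beta-hat) = −374.4, 7 parameters
stat = likelihood_ratio(-1023.0, -374.4)
CHI2_CRIT_7_95 = 14.07  # chi-square(7) critical value at 0.95, from the text
reject_null = stat > CHI2_CRIT_7_95
```

Since the statistic vastly exceeds the critical value, the null hypothesis that all coefficients are zero is rejected.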
A statistic analogous to R-squared in regression, the pseudo coefficient of determination ρ², is also reported for logit models. The pseudo ρ² is meant to convey similar information as R-squared does in regression: the larger ρ², the larger the proportion of the log likelihood 'explained' by the parameterized model. The corrected statistic ρ²c adjusts ρ² for the number of parameters estimated. The usual definitions (with K the number of estimated parameters) are:

    ρ² = 1 − L(β̂)/L(0)
    ρ²c = 1 − [L(β̂) − K]/L(0)

Planning Example Continued: Consider again the binary logit model estimated on the Washington, D.C. data. The summary
statistics are provided below (adapted from Ben-Akiva and Lerman, 1985).

Summary Statistics for the Washington, D.C. Binary Logit Model
    number of parameters = 7
    L(0) = −1023
    L(β̂) = −374.4
    −2[L(0) − L(β̂)] = 1371.7
    χ²(7, 0.95) = 14.07
    ρ² = 0.660
    ρ²c = 0.654

The summary statistics show the log likelihood value for the naïve model with zero parameters and the value for the model with the seven parameters discussed previously. Clearly, the model with seven parameters has a larger log likelihood than the naïve model, and in fact the likelihood ratio test suggests that this difference is statistically significant. The ρ² and ρ²c values suggest that about 65% of the log likelihood is 'explained' by the seven-variable model. This interpretation of ρ² should be used loosely, as it is not strictly correct. A more useful application of ρ²c is to compare it with that of a competing model estimated on the same data; this provides one piece of objective criterion for comparing alternative models.

Variable Selection
There are asymptotic t-statistics that are evaluated similarly to t-statistics in
regression, save for the restriction on sample sizes. That is to say, as the sample size grows, the sampling distribution of the estimated parameters approaches the t-distribution. Thus, variables entered into the logit model can be evaluated individually using t-statistics and jointly using the log likelihood ratio test (see goodness of fit). Of course, variables should be selected a priori based upon their theoretical or material role in the decision process being modeled.

Tests of Non-Linear Specifications
As in classical regression, there may be times when a linear specification is inappropriate. There are a couple of approaches to employ: a piece-wise linear approach, variable transformations, and/or polynomial expansions. Overall goodness-of-fit tests can be conducted to determine whether the additional variables or variable transformations offer an improvement.

Tests of Taste Variations
Choice theory is essentially a disaggregate modeling approach. Specifically, it states that individuals have utility functions that include attributes of alternatives and of individuals. However, the model presented so far has fixed parameter estimates; that is, all individuals are expected to respond (linearly) to the same coefficients. This may not be an altogether reasonable assumption, since individuals may have different coefficient values, or even different utility functions. There are two methods employed to get around this problem. The first is to treat the socio-economic variables that enter the model differently across groups. The second is to introduce what is called a random coefficients logit model, which is technically difficult and computationally burdensome; in addition, random coefficients are not that useful for forecasting. Market segmentation allows for different coefficient values across market segments. In this approach, G market segments are defined, and a vector of parameters is estimated for each of the G segments. Then the null hypothesis that β1 = β2 = β3 = … = βG is tested using the fact that

    −2[L(β̂pooled) − Σg L(β̂g)]

is chi-square distributed, with degrees of freedom equal to the number of restrictions imposed by pooling the segments.

Are Choice Model Assumptions Met?
Identically and Independently Distributed Errors
If the IIA assumption does not hold, there are alternative methods for estimating choice models. To determine whether alternative models are necessary, there are some useful tests for IIA violations. Since the ratio of choice probabilities between two modes is expected to remain unchanged relative to 'other' choices, other choices could feasibly be added to the choice set and the original choice probability ratios should remain unchanged. A test proposed by Hausman and McFadden (1984) compares a restricted choice set model (r), estimated with one of the choice alternatives removed, to the unrestricted model (u), estimated on the full set of alternatives. If IIA is not violated, the coefficient estimates should differ only by the random fluctuation caused by statistical sampling. The test statistic

    q = [β̂u − β̂r]′ [V̂r − V̂u]⁻¹ [β̂u − β̂r]

is asymptotically chi-square distributed with Kr degrees of freedom, where Kr is the number of coefficients in the restricted choice set model, β̂u and β̂r are the coefficient vectors estimated on the unrestricted and restricted choice sets, and V̂u and V̂r are the corresponding variance-covariance matrices. This test can be found in textbooks on discrete choice and in some software programs.

Uncorrelated Errors
Correlated errors occur when unobserved attributes of choices are shared across alternatives (a violation of the IIA assumption), or when panel data are used and choices are correlated over time. Violation of the IIA assumption has been dealt with in a previous section. Panel data
need to be handled with more sophisticated methods incorporating both cross-sectional and time-series structure.

Outlier Analysis
As in regression, the analyst should perform outlier analysis. In doing so, the analyst should compare the predicted choice probabilities with the chosen alternatives. An outlier can arbitrarily be defined as a case in which an alternative was chosen even though the model gave it only a 1 in 100 chance of being selected. When such cases are identified, the analyst looks for miscoding and measurement errors in the variables. If an observation is influential but not erroneous, the analyst must investigate further; one way is to estimate the model with the observation included, and then again without it.

Alternative Choice Model Specifications
When the IIA property of MNL is violated, the modeler should consider alternative specifications. Recall that IIA is violated when alternatives share unobserved attributes: when there are shared unobserved components associated with different choices or alternatives, the utilities of the elements of the corresponding choice set cannot be independent. There are two common strategies for dealing with violations of the IIA assumption in the MNL model: nested logit and multinomial probit models.

Nested Logit
One may think of a multi-dimensional choice context as one with inherent structure, or hierarchy. This notion helps the analyst to visualize the nested logit model, although the nested logit
is not inherently a hierarchical model. Consider the case of four travel alternatives: auto, bus with walk access, bus with auto access, and carpool. This might be thought of as a nested choice structure: the first decision is between public transit and auto, and then between the specific alternatives given that public or private has been selected. Mathematically, this nested structure allows subsets of alternatives to share unobserved components of utility, which is a strict violation of the IIA property in the MNL model. For example, if transit alternatives are nested together, then it is feasible that these alternatives share unobserved utility components such as comfort, ride quality, safety, and other attributes of transit that were omitted from the systematic utility functions. This 'work-around' for the IIA assumption in MNL is a feasible and relatively easy solution. In a nutshell, the analyst groups alternatives that share unobserved attributes at different levels of a nest, so as to allow error terms within a nest to be correlated. For more detailed discussions of nested logit models, consult the references listed in this chapter.

Multinomial Probit
Multinomial probit is an extension of the probit model to more than two alternatives. Unfortunately, such models are difficult to estimate for more than 4 or 5 alternatives, because the mathematical complexity of the likelihood function grows as the number of alternatives increases. As computers become faster and computational methods improve, multinomial probit models may become practical for reasonably sized choice sets.
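The nesting idea can be made concrete with a small calculation. The sketch below computes two-level nested logit probabilities from systematic utilities and per-nest scale (logsum) parameters; the nest names, utilities, and parameter values are illustrative assumptions, not estimates from any study.

```python
import math

def nested_logit_probs(nests, scale):
    """Two-level nested logit choice probabilities.

    nests: dict mapping nest name -> dict of alternative -> systematic utility V
    scale: dict mapping nest name -> nesting parameter in (0, 1]
           (a value of 1 collapses the nest back to plain MNL)
    """
    # Inclusive value (logsum) for each nest: I_m = ln(sum_j exp(V_j / s_m))
    logsum = {m: math.log(sum(math.exp(v / scale[m]) for v in alts.values()))
              for m, alts in nests.items()}
    # Marginal probability of each nest: exp(s_m * I_m) / sum over nests
    denom = sum(math.exp(scale[m] * logsum[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(scale[m] * logsum[m]) / denom
        within = sum(math.exp(u / scale[m]) for u in alts.values())
        for alt, v in alts.items():
            # Conditional probability of the alternative within its nest
            probs[alt] = p_nest * math.exp(v / scale[m]) / within
    return probs
```

A transit nest with a scale parameter below 1 lets bus-with-walk-access and bus-with-auto-access share unobserved utility, while setting every scale parameter to 1 reproduces the MNL probabilities.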
External Validation of the Model
Prediction Tests
As in regression, perhaps the most powerful test of any model comes from the use of external 'validation' data. External validation can test that the model does not over-fit the estimation data, and can also be used to assess the generalizability of model results across space and time. To externally validate a model, new data are collected and used to assess the predictive ability of the model. It is important to understand that MNL models are difficult to interpret for individual predictions or decisions, so validation should be consistent with the true interpretation of the model: Prn(i) is a long-run notion of probability, and therefore is best validated over panels of observations on individuals or through observations on 'apparently' homogeneous groups of individuals.

Conduct Statistical Inference, Document Model, and Implement
The modeler is now ready to conduct statistical inference, document the model, and implement it. Statistical inference is the process by which inferences are made about the population, or process being modeled, based on the model estimated on the sample data; it is the cornerstone of statistical theory and allows the modeler to make statements about the population. There are several estimated quantities used to make inferences about the population: the betas, which represent the
change in Y (the choice probability) given a unit change in the X's, and Ŷ, which reflects the expected response given a combination of X values. The interpretation of a confidence interval is very explicit and should be treated with caution. A (1 − α) confidence interval on β1 indicates that, under repeated samples taken at the same X levels, intervals constructed in this way will cover the true value of β1 (1 − α)·100 times out of 100. Recall that β1 is the change in the mean of the distribution of Y per unit change in x.

One-Sided versus Two-Sided Hypothesis Testing
The confidence interval given by β1 ± t(1 − α/2; n − 2) s[β1] is a two-sided confidence interval. This means that the analyst is concerned with how probable events are on either side of the true parameter β1. Often, the engineer wants to ask the question "does the interval estimate of β1 contain the value 0?" The hypotheses then are:

    H0: β1 = 0, and Ha: β1 ≠ 0.

If the interval for a given confidence level 1 − α (e.g., 1 − 0.05 = 0.95) does not contain 0, then one can conclude, at that confidence level, that β1 differs from 0. The analyst could alternatively test whether β1 is positive. In this case the test hypotheses are:

    H0: β1 ≤ 0, and Ha: β1 > 0.

In this case only one side of the distribution is considered, such that all of the rejection probability α is assigned to one tail of the probability distribution.

Model Documentation
Once a model has been estimated and selected as the best among competing models, it needs to be thoroughly documented so that others may learn from the modeler's efforts. It is important to recognize that a model that performs below expectations is still a model
worth reporting. This is because the accumulation of knowledge is based on objective reporting of findings, and presenting only success stories is not objective reporting. It is just as valuable to learn that certain variables do not appear to affect a certain response as the reverse. When reporting the results of models, enough information should be provided so that another researcher could replicate them. Not reporting things like sample sizes, manipulations to the data, estimated variances, etc., could render follow-on studies difficult. Perhaps the most important aspect of model documentation is the theory behind the model: all the variables in the model should be accompanied by a material explanation for being there. Why are the X's important in their influence on Y? What is the mechanism by which X influences Y? Would one suspect an underlying causal relation, or is the relationship merely associative? These are the types of questions that should be answered in the documentation accompanying a modeling effort. In addition, the model equations, t-statistics, goodness-of-fit statistics, and likelihood ratio test results should be reported. Thorough model documentation will allow for future enhancements to an existing model.

Model Implementation
Model implementation, of course, should have been considered early in the planning stages of any modeling investigation. There are a number of considerations to take into account during the implementation stages:
1) Are the variables needed to run the model easily accessible?
2) Is the model going to be used within the domain for which it was intended?
3) Has the model been validated?
4) Will the passage of time render model predictions invalid?
5) Will transferring the model to another geographic location jeopardize model accuracy?
These questions, and other carefully targeted questions about the particular phenomenon under study, will aid in an efficient and scientifically sound implementation plan.

How is a choice model equation interpreted?
An MNL model equation represents the relationship
between a dependent variable Y, which represents the probability of a particular choice being made, and one or more independent variables (X's) that reflect attributes of the choices and of the choice-maker. Unlike the linear regression model, the coefficients in choice models act multiplicatively on the response through the exponential function. The model parameters, or partial slope coefficients, represent the change in systematic utility given a unit change in X, all else held constant. If the model was estimated using experimental data, then the parameters may represent the change in Y caused by a unit change in a particular X. If the model was estimated using quasi-experimental or observational data, then the model parameters represent the change in Y associated with a unit change in a particular X, and do not necessarily represent causal effects. The choice model equation is meant to capture, as accurately as possible, the relationships in the true population, in as simple an equation as possible. The model is known a priori not to capture all the structure in the real data, and is known to be wrong to some degree; it represents a convenient way to explain relationships or to predict future events given known inputs, i.e., values of the independent variables (X's).

How do continuous and indicator variables differ in the choice model?
Continuous variables are usually interval or ratio scale variables, whereas indicator variables are usually nominal or ordinal scale variables. Indicator variables can be entered in the choice model directly, or can be interacted with a continuous variable, affecting the slope coefficient of the interacted variable. Indicator variables are somewhat analogous to testing the difference in means between two groups, as in ANOVA. Indicator variables in the choice model can take on only one of two values: 0 or 1.

How are beta coefficients interpreted?
The betas in a logit model are the utility (slope) coefficients. The coefficient on variable X1, β1, gives the change in the systematic utility of an alternative, and hence in the log of the odds of its being chosen, per unit increase in X1. Thus the interpretation is not as straightforward as the interpretation in linear regression.
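Because the coefficients act through the exponential, a coefficient can be read as an odds multiplier: exp(β1) is the factor by which the odds of the alternative change per unit increase in X1. A minimal sketch using the auto-ownership coefficient from the Washington, D.C. example:

```python
import math

# In a logit model, a one-unit increase in X1 multiplies the odds of the
# alternative by exp(beta1), holding everything else constant.
def odds_multiplier(beta, delta_x=1.0):
    return math.exp(beta * delta_x)

# Auto-ownership coefficient from the Washington, D.C. example (auto specific):
# each additional car multiplies the odds of choosing auto by about 2.16
ratio = odds_multiplier(0.770)
```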
How is the Likelihood Ratio Test interpreted?
The likelihood ratio test is similar to the F test in regression. Under the null hypothesis that all coefficients are zero (or some other null hypothesis representing a restricted model), that is, β1 = β2 = … = βK = 0, the test statistic −2[L(0) − L(β̂)] is χ² distributed with K degrees of freedom. Often a more useful test compares the estimated model against a model containing only alternative specific constants, rather than one with all coefficients equal to zero. Alternatively, the analyst can compare any full model with a restricted model using the likelihood ratio test. A large value of the likelihood ratio test statistic provides evidence against the restricted model; a statistic at the 5% critical value would be obtained in 5 samples in 100 if the restricted model were true.

How are t-statistics interpreted?
A t-test is similar to a likelihood ratio test, except that the test concerns a single parameter in the model. The standard t-test provided by most statistical software packages is used to assess whether an individual variable's parameter is equal to zero. In actuality, the test is conditional on the variable's parameter equaling zero, and provides the probability of the data having arisen under this constraint. In the theory of discrete choice models, t-tests are not exact results but asymptotic results: only as the sample size approaches infinity does the sampling distribution of the coefficient t-ratios approach its limiting form.

How are phi and adjusted phi interpreted?
Phi (the likelihood ratio index) is analogous to R² in linear regression. The adjusted phi compensates for models with different numbers of explanatory variables, adjusting for the fact that phi can only increase or stay the same as explanatory variables are added. In general, the larger the phi, the greater the explanatory
power of the model.

How are confidence intervals interpreted?
A confidence interval is interpreted as follows: if samples were repeatedly drawn at the same X-levels as in the original sample, then (1 − α)·100 times out of 100 the intervals so constructed would cover the true value. In simpler but less technically correct terms, the analyst is (1 − α)·100% confident that the true value falls in the confidence interval. Confidence intervals may be constructed around parameter values in discrete choice models using asymptotic t-distribution results; in this case, the analyst makes inferences about the values of the true model parameters, which are estimated by the model coefficients.

How are degrees of freedom interpreted?
Degrees of freedom are associated with sample size. Every time a statistical parameter is estimated on a sample of data, the ability to compute additional parameters decreases.
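An asymptotic (Wald-type) confidence interval for a logit coefficient follows directly from the estimate and its standard error. A sketch using the auto-ownership coefficient and standard error from the Washington, D.C. table; the 1.96 multiplier assumes a large-sample 95% interval:

```python
def wald_ci(beta_hat, std_err, z=1.96):
    """Asymptotic (1 - alpha) confidence interval for a single coefficient."""
    half_width = z * std_err
    return beta_hat - half_width, beta_hat + half_width

# Auto-ownership coefficient and standard error from the Washington, D.C. table.
# The interval excludes 0, consistent with the reported t-statistic of 3.16.
lo, hi = wald_ci(0.770, 0.213)
```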
Degrees of freedom are the number of independent data points used to estimate a particular parameter.

How are elasticities computed and interpreted?
An elasticity is the percentage change in some response associated with a change in an independent variable. For instance, an analyst may want to know the effect of a unit change in the value of some attribute, say travel time, on alternative mode utilities, or on the choice probabilities for traveler n.

When is the independence of irrelevant alternatives (IIA) assumption violated?
Succinctly stated, the IIA property states that, for a specific individual, the ratio of the choice probabilities of any two alternatives is entirely unaffected by the systematic utilities of any other alternatives. This property arises from the assumption, made in deriving the logit model, that the error terms εn are independent across alternatives and individuals. In other words, it is assumed that the unobserved attributes (error terms) of alternatives are independent. In many cases this is an unrealistic assumption and creates some difficulties. For example, if driver n has an unobserved (error term) preference for public transit, then the error terms of the public transit alternatives will not be independent.

Should interaction terms be included in the model?
Interactions represent synergistic effects of two or more variables, and interaction terms represent potentially real relationships embedded in the data. Most often they arise in quasi-experimental and observational data. An
interaction that is important should be included in the model, despite the fact that it might not contribute much to model explanatory power. In general, third and higher order interactions (even those that are real in the population) can be ignored without much detriment to the model.

How many variables should be included in the model?
The objective of most modeling efforts is to economize the model, since it is known a priori that the great complexity underlying the data cannot be modeled exactly. In other words, the analyst generally wishes to explain as much of the data's complexity with as few variables as practicable. Generally, seven variables plus or minus two covers most models, although smaller and larger models can be found. It is generally better to favor a simpler model over a more complex one, simply because interpretation and implementation are also simplified. On the other hand, if the phenomenon is sufficiently complex, then making the model too simple may sacrifice too much explanatory or predictive power.

What methods can be used to specify the relation between choice and the X's?
Unlike linear regression, which represents a linear relation between a dependent variable and one or more independent variables, it is difficult to develop a useful plot between explanatory variables and the response used in discrete choice models. An exception occurs when repeated observations are made on an individual, or when data are grouped (aggregate). For instance, a plot of the proportion choosing alternative A by group (or by individual across repeated observations) may reveal differences across experimental groups (or individuals).

What methods are available for fixing heteroscedastic errors?
Heteroscedasticity in discrete choice models is a violation of the IIA property. It occurs when there are shared unobserved components associated with different choice dimensions. The most common procedure for dealing with this
is by employing the nested logit model.

What methods are used for fixing serially correlated errors?
Serially correlated errors occur when observations are taken over time. The primary reason is that there are unobserved attributes that affect the decision process for an individual over time, such as built-in biases, experiences, etc. These kinds of data, often referred to as panel or time-series data, require more complicated models of choice behavior. Consult Greene for additional details on dealing with serially correlated errors in discrete choice models.

What can be done to deal with multi-collinearity?
Multi-collinearity arises when two variables co-vary, and is commonly found in non-experimental data. It is not explicitly a violation of choice model assumptions, but it does cause problems in the mathematics of solving for the model parameters. In essence, highly collinear variables (e.g., a correlation coefficient of 0.7 or higher) cause parameter estimates to be inefficient (high standard errors), and can cause the signs of the estimated coefficients to be counter-intuitive. There are few remedies to the multi-collinearity problem. First, highly correlated variables can be left in the model and assumed to reflect the natural state of those variables in reality; in this case, the analyst must rely on the collinearity being ever-present in future observed data. A second option is to remove the less important of the two collinear variables and keep only one in the model; this is usually the preferred option. A third option is to employ a biased estimation technique such as ridge
regression, opting for more biased but more precise estimates that are not influenced by the presence of multicollinearity.

What is endogeneity and how can it be fixed?
Endogeneity is a fancy term for having an independent variable that is directly influenced by the dependent variable Y. It is presumed a priori that all independent variables are exogenous, i.e., determined by influences outside of the modeling system. When a particular X is endogenous, the model errors are correlated with that variable, and problems such as biased estimates occur. Some remedies include instrumental variables approaches, proxy variables, and structural equation models.

How does one know if the errors are Gumbel distributed?
The Gumbel distribution is the assumed
error distribution of the MNL and nested logit models. The Gumbel distribution has two parameters: a scale parameter η and a location parameter μ. It is conveniently assumed in MNL and nested logit models that the scale parameter is equal to 1, since it is not directly estimable. Unlike linear regression, it is not easy to determine whether model errors are Gumbel distributed. In the case of grouped (aggregate) data or repeated individual observations, an analyst could examine the distribution of errors by computing the choice probabilities and comparing them to the observed proportions of choices. However, this is often impractical and, given the lack of alternative distributional forms offered in discrete choice models, is often not performed.

Count Data Models
Poisson and negative binomial models are used most frequently in
transportation applications for modeling count data (non-negative integer responses). Examples:
1. Crash occurrences at a road section or intersection often follow a Poisson or negative binomial distribution.
2. The number of failures during a specified time period can often be modeled as a Poisson process.

Characteristics of the data:
1) The response or dependent variable follows a Poisson or negative binomial process, characterized by a low probability of 'success' in any given time period, a large number of trials (time periods), and a fixed probability of success in any given period.
2) A negative binomial process results from a mixture of Poisson processes, such as sampling from multiple Poisson processes with varying means; the resulting distribution (negative binomial) is said to be an over-dispersed Poisson distribution.
3) Data can be censored or truncated, but this is not an essential characteristic of the process.

Inputs: a count variable Y, which represents the number of events observed during intervals of time or space, and a sample of observations on Y with a vector of explanatory variables, X.
Outputs: goodness-of-fit statistics for the observed count data, and estimated effects of the covariates X on the count during intervals of time or space.

There are several characteristics of the modeling of count data that warrant mention here. Note that modeling count
data is not covered in detail in this manuscript, so only some of the main highlights are provided; the references at the end of this chapter should be consulted for detailed guidance on estimating models from count data.
1) A random variable Y that follows a Poisson process is characterized by a Poisson parameter λ, where E[Y] = Var[Y] = λ. When E[Y] < Var[Y], the Poisson process is said to be over-dispersed.
2) Over-dispersion of the Poisson process can occur in a number of ways, for example when a Poisson process is observed over time intervals that are random instead of fixed.
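A quick screen for over-dispersion is the sample variance-to-mean ratio, which should be near 1 under the Poisson assumption E[Y] = Var[Y]. A minimal sketch; the crash counts below are hypothetical:

```python
def dispersion_ratio(counts):
    """Sample variance-to-mean ratio; values well above 1 suggest over-dispersion
    relative to the Poisson assumption E[Y] = Var[Y]."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)  # sample variance
    return var / mean

# Hypothetical crash counts at 8 road sections
ratio = dispersion_ratio([0, 2, 1, 7, 0, 9, 1, 4])
overdispersed = ratio > 1.0
```

When such a ratio is well above 1, a negative binomial specification is usually preferred to the Poisson.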
