R用MCMCglmm但是数据缺失 多重填补法有缺失怎么办??

bayesian - MCMCglmm multinomial model in R - Stack Overflow
Join Stack Overflow to learn, share knowledge, and build your career.
or sign in with
I'm trying to create a model using the MCMCglmm package in R.
The data are structured as follows, where dyad, focal, other are all random effects, predict1-2 are predictor variables, and response 1-5 are outcome variables that capture # of observed behaviors of different subtypes:
dyad focal other r
resp1 resp2 resp3 resp4 resp5
So a model with only one outcome (teaching) is as follows:
prior_overdisp_i &- list(R=list(V=diag(2),nu=0.08,fix=2),
G=list(G1=list(V=1,nu=0.08), G2=list(V=1,nu=0.08), G3=list(V=1,nu=0.08), G4=list(V=1,nu=0.08)))
m1 &- MCMCglmm(teaching ~ trait-1 + at.level(trait,1):r + at.level(trait,1):present,
random= ~idh(at.level(trait,1)):focal + idh(at.level(trait,1)):other +
idh(at.level(trait,1)):X + idh(at.level(trait,1)):village,
rcov=~idh(trait):units, family = "zipoisson", prior=prior_overdisp_i,
data = data, nitt = nitt.1, thin = 50, burnin = 15000, pr = TRUE, pl = TRUE, verbose = TRUE, DIC
Hadfield's course notes (Ch 5) give an example of a multinomial model that uses only a single outcome variable with 3 levels (sheep horns of 3 types). Similar treatment can be found here:
This is not quite right for what I'm doing, but contains helpful background info.
Another reference (Hadfield 2010) gives an example of a
multi-response MCMCglmm that follows the same format but uses cbind() to predict a vector of responses, rather than a single outcome. The same model with multiple responses would look like this:
m1 &- MCMCglmm(cbind(resp1, resp2, resp3, resp4, resp5) ~ trait-1 +
at.level(trait,1):r + at.level(trait,1):present,
random= ~idh(at.level(trait,1)):focal + idh(at.level(trait,1)):other +
idh(at.level(trait,1)):X + idh(at.level(trait,1)):village,
rcov=~idh(trait):units,
family = cbind("zipoisson","zipoisson","zipoisson","zipoisson","zipoisson"),
prior=prior_overdisp_i,
data = data, nitt = nitt.1, thin = 50, burnin = 15000, pr = TRUE, pl = TRUE, verbose = TRUE, DIC
I have two programming questions here:
How do I specify a prior for this model? I've looked at the materials mentioned in this post but just can't figure it out.
I've run a similar version with only two response variables, but I only get one slope - where I thought I should get a different slope for each resp variable. Where am I going wrong, or having I misunderstood the model?
Answer to my first question, based on the HLP post and some help from a colleage/stats consultant:
# values for prior
k &- 5 # originally: length(levels(dative$SemanticClass)), so k = # of outcomes for SemanticClass
aka categorical outcomes
I &- diag(k-1) #should make matrix of 0's with diagonal of 1's, dimensions k-1 rows and k-1 columns
J &- matrix(rep(1, (k-1)^2), c(k-1, k-1)) # should make k-1 x k-1 matrix of 1's
And for my model, using the multinomial5 family and 5 outcome variables, the prior is:
prior = list(
R = list(fix=1, V=0.5 * (I + J), n = 4),
G1 = list(V = diag(4), n = 4))
For my second question, I need to add an interaction term to the fixed effects in this model:
m &- MCMCglmm(cbind(Resp1, Resp2...) ~ -1 + trait*predictorvariable,
The result gives both main effects for the Response variables and posterior estimates for the Response/Predictor interaction (the effect of the predictor variable on each response variable).
Your Answer
Sign up or
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Post as a guest
By posting your answer, you agree to the
Not the answer you're looking for?
Browse other questions tagged
Stack Overflow works best with JavaScript enabled混合线性模型软件包的介绍;混合线性模型是一般线性模型的延伸和拓展,在作物、;R包:;nlme:是一个在S-Plus应用广泛的混线性模;lme4:是nlme的进一步发展,比nlme运行;MCMCglmm:用马尔可夫链蒙特卡罗(Mark;asreml:是ASReml软件的R版本,运算速;glmmADMB:是ADMB软件的R版本,很灵活;非R包:;ASReml商业
混合线性模型软件包的介绍
混合线性模型是一般线性模型的延伸和拓展,在作物、林木、动物、水产育种和科研中应用广泛。相对于一般线性模型,它能处理缺失值和不平衡数据,可以支持数据的方差不齐次和不独立,使得数据分析更准确和高效。分析混合线性模型的软件很多,这里将其分为R包和非R包。 R包: nlme:是一个在S-Plus应用广泛的混线性模型包,后来转换到R平台上,对于镶嵌结构(nested)的随机因子定义简单,但对于交叉的随机因子(crossed)定义困难。具有多个功能,比如lme应用于线性混合模型,nlme应用于非线性混合模型。可以定义复杂的方差结构,不支持广义线性混合模型(GLMM)。 lme4:是nlme的进一步发展,比nlme运行速度要快,支持GLMM,但很难处理交叉的随机因子。 MCMCglmm:用马尔可夫链蒙特卡罗(Markov Chain Monte Carlo,MCMC)的方法拟合模型,贝叶斯先验分布,可以定义一些复杂的方差结构(heterogeneous yes, AR1 no)。 asreml:是ASReml软件的R版本,运算速度快,支持复杂的模型(G矩阵和R矩阵),支持系谱信息和多性状分析,在动物、作物、林木、水产育种和科研中应用广范。 glmmADMB:是ADMB软件的R版本,很灵活,但是运行速度很慢。 非R包:
ASReml 商业软件:有单机版(Win、Linux、Mac)和R版(ASReml-R),也有窗口化的版本(GenStat),应用稀疏矩阵和Ai算法,速度很快,广泛应用于植物和动物育种,支持随机因子的矩阵定义(G)和残差的矩阵定义(R),Splines也被很好的整合,对于广义线性模型,应用的是PQL方法。
ADMB:自动模型微分(Automatic Differentiation Model Builder),主要应用在森林、水产和野生动物中,开始时是个商业软件,现在开源了,支持非线性混合模型。
SAS 商业软件: ? PROC MIXED:一般线性混合模型(LMM),应用广泛,但是速度比较慢。 ? PROC GLIMMIX:增加了广义线性模型(GLMM),它现在支持了Laplace approximation和adaptive Gaussian quadrature方法,但对于复杂的模型,用的还是PQL方法。 ? HPMIXED:是MIXED的改进版,在速度上有明显的提升,但是支持的模型较少。 ? PROC NLMIXED:支持非线性混合模型 功能介绍:
软件 nlme 函数 lme 方法 ML REML ML REML 置信区间 标准差(SD) 标准误(se) 随机(G矩阵) 残差(R矩阵) Nested block Nested Diagonal block Nested Cross Block Splines Nested Cross Block Nested Cross Block Nested Cross Block 相关联(corStruct) 异质性(varStruct) NO AR Cor Str Unstructed Matern Factor Analysis NO 其它 速度较慢 没有给出方差组分 不支持系谱 不支持多性状分析 lme4 lmer ASreml asreml AI REML 标准误(se) 速度快 支持系谱 支持多性状分析 SAS Mixed HPMIXED GLIMMIX MCMCglmm ML REML 标准误(se) Credible intervals 速度慢 支持系谱 支持多性状分析 速度慢 支持系谱 支持多性状分析
MCMCglmm MCMC
函数 Xtmixed Xtreg LM REML
Unstructed Nlme Estimate T Df P ? ? ? ? ? ? ? ? ? ? Lme4 Estimate Std err T Df P ? ? ? ? ? ? ? ? ? ? MCMCglmm ASReml Loglike Varcomp Std.err Z-ratio Constraint ? ? ? ? ? ? ? ? ? ? Summary Post.mean CI Eff.sample ? ? ? ? ? ? ? ? ? ? Coef效应值 Fixed(BLUE) Ranef(BLUP) Loglik Plot Predict Ped Intervals Update LRT
三亿文库包含各类专业文献、外语学习资料、中学教育、文学作品欣赏、高等教育、幼儿教育、小学教育、生活休闲娱乐、专业论文、31混合线性模型软件包的介绍等内容。 
 在层次结构数据中, 不仅有描述个体的变量, 而且有...较低层次的主效果, Hall(1994)称之为混合因子模型...上述多层线性模型通过统计软件 HLM 运算所得结果如 ...  三、类型 1、群组模型: 即以上所介绍的研究背景效应的数据处理方式。 2、发展...五、HLM 软件分析步骤: 具体操作步骤可以参看:张雷等.多层线性模型应用.北京:...  混合线性模型软件包的介绍_数学_自然科学_专业资料。混合线性模型是一般线性模型的延伸和拓展,在作物、林木、动物、水产育种和科研中应用广泛。相对于一般线性模型,它...  SPSS数据分析―混合线性模型_计算机软件及应用_IT/计算机_专业资料。SPSS数据分析―混合线性模型 之前介绍过的基于线性模型的方差分析,虽然扩展了方差分析的领域,但是...  -2- 第二章 固定效应的估计 第二章 固定效应的估计 2.1 混合线性模型的...下面以两向分类混合模型为例来具体介绍参数估计的过程 对两向分类混合模型 yij ...  特别地,人们可以使用 SAS,R 等统 计软件直接分析数据。然而,随着对线性混合模型研究的深入,人们发 现实际数据中正态性假设并不完全成立 ,特别是随机效应的正态...  GAMS软件各模块介绍_计算机软件及应用_IT/计算机_...DICOPT 是解决 MINLP(混合整数非线性规划)模型的...包含了本地 NLP 求解器 LSGRG,是全局分析包的一...  那么对于这样一类的多层线性的数据,我们该如何进行数据处理呢,小编将持续 为大家...HLM6.0 软件 方法/步骤 1. 下图呈现的是对模型随机部分的信息描述, 当前模型...I was working with a small experiment which includes families from two Eucalyptus species and thought it would be nice to code a first analysis using alternative approaches. The experiment is a randomized complete block design, with species as fixed effect and family and block as a random effects, while the response variable is growth strain (in \( mu epsilon\)).
When looking at the trees one can see that the residual variances will be very different. In addition, the trees were growing in plastic bags laid out in rows (the blocks) and columns. Given that trees were growing in bags siting on flat terrain, most likely the row effects are zero.
Below is the code for a first go in R (using both MCMCglmm and ASReml-R) and SAS. I had stopped using SAS for several years, mostly because I was running a mac for which there is no version. However, a few weeks ago I started accessing it via their
program via a web browser.
The R code using REML looks like:
options(stringsAsFactors = FALSE)
setwd('~/Dropbox/research/2013/growth-stress')
# Packages
require(ggplot2)
require(asreml)
require(MCMCglmm)
# Reading data, renaming, factors, etc
gs = read.csv('eucalyptus-growth-stress.csv')
summary(gs)
# Both argophloia and bosistoana
gsboth = subset(gs, !is.na(strain))
gsboth = within(gsboth, {
species = factor(species)
row = factor(row)
column = factor(column)
fam = factor(fam)
ma = asreml(strain ~ species, random = ~ fam + row,
rcov = ~ at(species):units,
data = gsboth)
summary(ma)$varcomp
component std.error
z.ratio constraint
#fam!fam.var
502.036 2.6480022
#row!row.var
16.357 0.7499666
#species_E.argopholia!variance
#species_E.bosistoana!variance
123456789101112131415161718192021222324252627282930313233343536
# Optionsoptions(stringsAsFactors = FALSE)setwd('~/Dropbox/research/2013/growth-stress')&# Packagesrequire(ggplot2)require(asreml)require(MCMCglmm)&&# Reading data, renaming, factors, etcgs = read.csv('eucalyptus-growth-stress.csv')summary(gs)&# Both argophloia and bosistoanagsboth = subset(gs, !is.na(strain))gsboth = within(gsboth, {species = factor(species)row = factor(row)column = factor(column)fam = factor(fam)})&&&ma = asreml(strain ~ species, random = ~ fam + row,rcov = ~ at(species):units,data = gsboth)&summary(ma)$varcomp&#&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& gamma&&component std.error&& z.ratio constraint#fam!fam.var&&&&&&&&&&&&&&&&&&&&&&502.036 2.6480022&& Positive#row!row.var&&&&&&&&&&&&&&&&&&&& && 16.357 0.7499666&& Positive#species_E.argopholia!variance
.2067580&& Positive#species_E.bosistoana!variance&&&&&&.7224681&& Positive
While using MCMC we get estimates in the ballpark by using:
bothpr = list(R = list(V = diag(c(5)), nu = 3),
G = list(G1 = list(V = 20000, nu = 0.002),
G2 = list(V = 20000, nu = 0.002),
G3 = list(V = 20000, nu = 0.002)))
m2 = MCMCglmm(strain ~ species, random = ~ fam + row + column,
rcov = ~ idh(species):units,
data = gsboth,
prior = bothpr,
pr = TRUE,
family = 'gaussian',
burnin = 10000,
nitt = 40000,
thin = 20,
saveX = TRUE,
saveZ = TRUE,
verbose = FALSE)
summary(m2)
# Iterations =
# Thinning interval
# Sample size
# G-structure:
# post.mean l-95% CI u-95% CI eff.samp
# post.mean l-95% CI u-95% CI eff.samp
# R-structure:
~idh(species):units
# post.mean l-95% CI u-95% CI eff.samp
# E.argopholia.units
# E.bosistoana.units
# Location effects: strain ~ species
# post.mean l-95% CI u-95% CI eff.samp
# (Intercept)
# speciesE.bosistoana
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849
# Priorsbothpr = list(R = list(V = diag(c(50000, 50000)), nu = 3),G = list(G1 = list(V = 20000, nu = 0.002),G2 = list(V = 20000, nu = 0.002),G3 = list(V = 20000, nu = 0.002)))&# MCMCm2 = MCMCglmm(strain ~ species, random = ~ fam + row + column,rcov = ~ idh(species):units,data = gsboth,prior = bothpr,pr = TRUE,family = 'gaussian',burnin = 10000,nitt = 40000,thin = 20,saveX = TRUE,saveZ = TRUE,verbose = FALSE)&summary(m2)&# Iterations = # Thinning interval&&= 20# Sample size&&= 1500## DIC: ## G-structure:&&~fam## post.mean l-95% CI u-95% CI eff.samp# fam&&&& 30315&&&&12211&&&&55136&&&& 1500## ~row## post.mean l-95% CI u-95% CI eff.samp# row&&&&&&1449&&&&5.928&&&& 6274&&&&589.5## R-structure:&&~idh(species):units## post.mean l-95% CI u-95% CI eff.samp# E.argopholia.units&&&&112017&&&&71152&& 168080&&&& 1500# E.bosistoana.units&&&& 65006&&&&52676&&&&80049&&&& 1500## Location effects: strain ~ species## post.mean l-95% CI u-95% CI eff.samp&&pMCMC# (Intercept)&&&&&&&&&&&&502.21&& 319.45&& 690.68&&&&
***# speciesE.bosistoana&& -235.95&&-449.07&& -37.19&&&&
The SAS code is not that disimilar, except for the clear demarcation between data processing (data step, for reading files, data transformations, etc) and specific procs (procedures), in this case to summarize data, produce a boxplot and fit a mixed model.
* termstr=CRLF accounts for the windows-like line end
infile "/home/luis/Projects/euc-strain/growthstresses.csv"
dsd termstr=CRLF firstobs=2;
input row column species $ family $
if strain ^= .;
proc summary data =
proc boxplot data =
plot strain*
proc mixed data =
model strain =
repeated species / group=
Covariance Parameter Estimates
species E.argoph
species E.bosist
1234567891011121314151617181920212223242526272829303132
* termstr=CRLF accounts for the windows-like line endings of the data set;data gs;infile "/home/luis/Projects/euc-strain/growthstresses.csv"dsd termstr=CRLF firstobs=2;input row column species $ family $ strain;if strain ^= .;run;&proc summary data = gs print;class species;var strain;run;&proc boxplot data = gs;plot strain*species;run;&proc mixed data = gs;class row species family;model strain = species;random row family;repeated species / group=species;run;&/*Covariance Parameter EstimatesCov Parm
&&&&&&&&&&&&&& Estimaterow
&& &&&&&&&&&&&&&&&&&&&&&& 2336.80family
&& &&&&&&&&&&&&&&&&&&&&&& 27808species
species E.argoph
111844species
species E.bosist
SAS boxplot for the data set.
I like working with multiple languages and I realized that, in fact, I missed SAS a bit. It was like m at the beginning felt strange but we were quickly chatting away after a few minutes.
A few years ago we had this really cool idea: we had to establish a trial to understand wood quality in context. Sort of following the saying “we don’t know who discovered water, but we are sure that it wasn’t a fish” (attributed to Marshall McLuhan). By now you are thinking WTF is this guy talking about? But let’s put a trial that had the species we wanted to study (Pinus radiata, a gymnosperm) and an angiosperm (Eucalyptus nitens if you wish to know) to provide the contrast, as they are supposed to have vastly different types of wood. From space the trial looked like this:
The reason you can clearly see the pines but not the eucalypts is because the latter were dying like crazy over a summer drought (45% mortality in one month). And here we get to the analytical part: we will have a look only at the eucalypts where the response variable can’t get any clearer, trees were either totally dead or alive. The experiment followed a randomized complete block design, with 50 open-pollinated families in 48 blocks. The original idea was to harvest 12 blocks each year but—for obvious reasons—we canned this part of the experiment after the first year.
The following code shows the analysis in asreml-R, lme4 and MCMCglmm:
load('~/Dropbox/euc.Rdata')
library(asreml)
sasreml = asreml(surv ~ 1, random = ~ Fami + Block,
data = euc,
family = asreml.binomial(link = 'logit'))
summary(sasreml)$varcomp
gamma component
#Fami!Fami.var
0....975591
#Block!Block.var
0....653324
#R!variance
1..0000000
constraint
#Fami!Fami.var
#Block!Block.var
#R!variance
# Quick look at heritability
varFami = summary(sasreml)$varcomp[1, 2]
varRep = summary(sasreml)$varcomp[2, 2]
h2 = 4*varFami/(varFami + varRep + 3.29)
#[1] 0.5718137
library(lme4)
slme4 = lmer(surv ~ 1 + (1|Fami) + (1|Block),
data = euc,
family = binomial(link = 'logit'))
summary(slme4)
#Generalized linear mixed model fit by the Laplace approximation
#Formula: surv ~ 1 + (1 | Fami) + (1 | Block)
BIC logLik deviance
#Random effects:
Variance Std.Dev.
(Intercept) 0.665
(Intercept) 0.143
#Number of obs: 2090, groups: Fami, 51; Block, 48
#Fixed effects:
Estimate Std. Error z value Pr(&|z|)
#(Intercept)
# Quick look at heritability
varFami = VarCorr(slme4)$Fami[1]
varRep = VarCorr(slme4)$Block[1]
h2 = 4*varFami/(varFami + varRep + 3.29)
#[1] 0.6037697
# And let's play to be Bayesians!
library(MCMCglmm)
pr = list(R = list(V = 1, n = 0, fix = 1),
G = list(G1 = list(V = 1, n = 0.002),
G2 = list(V = 1, n = 0.002)))
sb &- MCMCglmm(surv ~ 1,
random = ~ Fami + Block,
family = 'categorical',
data = euc,
prior = pr,
verbose = FALSE,
pr = TRUE,
burnin = 10000,
nitt = 100000,
thin = 10)
plot(sb$VCV)
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273
load('~/Dropbox/euc.Rdata')&library(asreml)sasreml = asreml(surv ~ 1, random = ~ Fami + Block,data = euc,family = asreml.binomial(link = 'logit'))summary(sasreml)$varcomp&#&&&&&&&&&&&&&&&&&&&&&&gamma component&&std.error&&z.ratio#Fami!Fami.var&&&& 0....975591#Block!Block.var&& 0....653324#R!variance&&&&&&&&1..0000000&&&&&&&& NA&&&&&& NA&#&&&&&&&&&&&&&&&& constraint#Fami!Fami.var&&&&&&Positive#Block!Block.var&&&&Positive#R!variance&&&&&&&&&&&&Fixed&# Quick look at heritabilityvarFami = summary(sasreml)$varcomp[1, 2]varRep = summary(sasreml)$varcomp[2, 2]h2 = 4*varFami/(varFami + varRep + 3.29)h2#[1] 0.5718137&&library(lme4)slme4 = lmer(surv ~ 1 + (1|Fami) + (1|Block),data = euc,family = binomial(link = 'logit'))&summary(slme4)&#Generalized linear mixed model fit by the Laplace approximation#Formula: surv ~ 1 + (1 | Fami) + (1 | Block)#&& Data: euc#&&AIC&&BIC logLik deviance# &&-1360&&&& 2719#Random effects:# Groups&& Name&&&&&&&&Variance Std.Dev.# Fami&&&& (Intercept) 0.665# Block&&&&(Intercept) 0.143#Number of obs: 2090, groups: Fami, 51; Block, 48##Fixed effects:#&&&&&&&&&&&&Estimate Std. Error z value Pr(&|z|)#(Intercept)&& 0.2970&&&& 0.1315&& 2.259&& 0.0239 *&# Quick look at heritabilityvarFami = VarCorr(slme4)$Fami[1]varRep = VarCorr(slme4)$Block[1]h2 = 4*varFami/(varFami + varRep + 3.29)h2#[1] 0.6037697&# And let's play to be Bayesians!library(MCMCglmm)pr = list(R = list(V = 1, n = 0, fix = 1),G = list(G1 = list(V = 1, n = 0.002),G2 = list(V = 1, n = 0.002)))&sb &- MCMCglmm(surv ~ 1,random = ~ Fami + Block,family = 'categorical',data = euc,prior = pr,verbose = FALSE,pr = TRUE,burnin = 10000,nitt = 100000,thin = 10)&plot(sb$VCV)
You may be wondering Where does the 3.29 in the heritability formula comes from? Well, that’s the variance of the link function that, in the case of the logit link is pi*pi/3. In the case of MCMCglmm we can estimate the degree of genetic control quite easily, remembering that we have half-siblings (open-pollinated plants):
# Heritability
h2 = 4*sb$VCV[, 'Fami']/(sb$VCV[, 'Fami'] +
sb$VCV[, 'Block'] + 3.29 + 1)
posterior.mode(h2)
#0.6476185
HPDinterval(h2)
#var1 0..9698148
#attr(,"Probability")
1234567891011121314
# Heritabilityh2 = 4*sb$VCV[, 'Fami']/(sb$VCV[, 'Fami'] +sb$VCV[, 'Block'] + 3.29 + 1)posterior.mode(h2)#&&&& var1#0.6476185&HPDinterval(h2)#&&&&&&&& lower&&&& upper#var1 0..9698148#attr(,"Probability")#[1] 0.95&plot(h2)
By the way, it is good to remember that we need to back-transform the estimated effects to probabilities, with very simple code:
# Getting mode and credible interval for solutions
inv.logit(posterior.mode(sb$Sol))
inv.logit(HPDinterval(sb$Sol, 0.95))
# Getting mode and credible interval for solutionsinv.logit(posterior.mode(sb$Sol))inv.logit(HPDinterval(sb$Sol, 0.95))
Even if one of your trials is trashed there is a silver lining: it is possible to have a look at survival.
This week I’m facing my—and many other lecturers’—least favorite part of teaching: grading exams. In a supreme act of procrastination I will continue , and , showing the code for a bivariate analysis of a randomized complete block design.
Just to recap, the results from the REML multivariate analysis (that used ASReml-R) was the following:
library(asreml)
m4 = asreml(cbind(bden, veloc) ~ trait,
random = ~ us(trait):Block +
us(trait):Family, data = a,
rcov = ~ units:us(trait))
summary(m4)$varcomp
#trait:Block!trait.bden:bden
#trait:Block!trait.veloc:bden
#trait:Block!trait.veloc:veloc
#trait:Family!trait.bden:bden
#trait:Family!trait.veloc:bden
#trait:Family!trait.veloc:veloc 2. 2. 8.
#R!variance
#R!trait.bden:bden
#R!trait.veloc:bden
#R!trait.veloc:veloc
z.ratio constraint
#trait:Block!trait.bden:bden
#trait:Block!trait.veloc:bden
#trait:Block!trait.veloc:veloc
#trait:Family!trait.bden:bden
#trait:Family!trait.veloc:bden
#trait:Family!trait.veloc:veloc
#R!variance
#R!trait.bden:bden
14.7057812
#R!trait.veloc:bden
#R!trait.veloc:veloc
17.4171524
123456789101112131415161718192021222324252627282930
library(asreml)&m4 = asreml(cbind(bden, veloc) ~ trait,random = ~ us(trait):Block +&&us(trait):Family, data = a,rcov = ~ units:us(trait))&summary(m4)$varcomp&#&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&gamma&&&&component&&&&std.error#trait:Block!trait.bden:bden&&&&1. 1. 7.#trait:Block!trait.veloc:bden&& 1. 1. 2.#trait:Block!trait.veloc:veloc&&2. 2. 1.#trait:Family!trait.bden:bden&& 8. 8. 2.#trait:Family!trait.veloc:bden&&1. 1. 1.#trait:Family!trait.veloc:veloc 2. 2. 8.#R!variance&&&&&&&&&&&&&&&&&&&& 1. 1.&&&&&&&&&& NA#R!trait.bden:bden&&&&&&&&&&&&&&5. 5. 3.#R!trait.veloc:bden&&&&&&&&&&&& 6. 6. 1.#R!trait.veloc:veloc&&&&&&&&&&&&1. 1. 9.#&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&z.ratio constraint#trait:Block!trait.bden:bden&&&& 2.0738303&& Positive#trait:Block!trait.veloc:bden&&&&0.8624639&& Positive#trait:Block!trait.veloc:veloc&& 1.8135789&& Positive#trait:Family!trait.bden:bden&&&&2.8128203&& Positive#trait:Family!trait.veloc:bden&& 1.3996166&& Positive#trait:Family!trait.veloc:veloc&&2.7650886&& Positive#R!variance&&&&&&&&&&&&&&&&&&&&&&&&&&&& NA&&&&&&Fixed#R!trait.bden:bden&&&&&&&&&&&&&&14.7057812&& Positive#R!trait.veloc:bden&&&&&&&&&&&&&&4.3442117&& Positive#R!trait.veloc:veloc&&&&&&&&&&&&17.4171524&& Positive
The corresponding MCMCglmm code is not that different from ASReml-R, after which it is modeled anyway. Following the recommendations of the MCMCglmm Course Notes (included with the package), the priors have been expanded to diagonal matrices with degree of belief equal to the number of traits. The general intercept is dropped (-1) so the trait keyword represents trait means. We are fitting unstructured (us(trait)) covariance matrices for both Block and Family, as well as an unstructured covariance matrix for the residuals. Finally, both traits follow a gaussian distribution:
library(MCMCglmm)
bp = list(R = list(V = diag(c(0.007, 260)), n = 2),
G = list(G1 = list(V = diag(c(0.007, 260)), n = 2),
G2 = list(V = diag(c(0.007, 260)), n = 2)))
bmod = MCMCglmm(cbind(veloc, bden) ~ trait - 1,
random = ~ us(trait):Block + us(trait):Family,
rcov = ~ us(trait):units,
family = c('gaussian', 'gaussian'),
prior = bp,
verbose = FALSE,
pr = TRUE,
burnin = 10000,
nitt = 20000,
thin = 10)
12345678910111213141516
library(MCMCglmm)bp = list(R = list(V = diag(c(0.007, 260)), n = 2),G = list(G1 = list(V = diag(c(0.007, 260)), n = 2),G2 = list(V = diag(c(0.007, 260)), n = 2)))&bmod = MCMCglmm(cbind(veloc, bden) ~ trait - 1,random = ~ us(trait):Block + us(trait):Family,rcov = ~ us(trait):units,family = c('gaussian', 'gaussian'),data = a,prior = bp,verbose = FALSE,pr = TRUE,burnin = 10000,nitt = 20000,thin = 10)
Further manipulation of the posterior distributions requires having an idea of the names used to store the results. Following that, we can build an estimate of the genetic correlation between the traits (Family covariance between traits divided by the square root of the product of the Family variances). Incidentally, it wouldn’t be a bad idea to run a much longer chain for this model, so the plot of the posterior for the correlation looks better, but I’m short of time:
dimnames(bmod$VCV)
rg = bmod$VCV[,'veloc:bden.Family']/sqrt(bmod$VCV[,'veloc:veloc.Family'] *
bmod$VCV[,'bden:bden.Family'])
posterior.mode(rg)
#0.2221953
HPDinterval(rg, prob = 0.95)
#var1 -0..5764006
#attr(,&Probability&)
1234567891011121314151617
dimnames(bmod$VCV)&rg = bmod$VCV[,'veloc:bden.Family']/sqrt(bmod$VCV[,'veloc:veloc.Family'] *bmod$VCV[,'bden:bden.Family'])&plot(rg)&posterior.mode(rg)#&&&& var1#0.2221953&&HPDinterval(rg, prob = 0.95)#&&&&&&&& lower&&&& upper#var1 -0..5764006#attr(,&Probability&)#[1] 0.95
And that’s it! Time to congratulate
for developing this package.
Until today all the posts in this blog have used a frequentist view of the world. I have a confession to make: I have an
view of statistics and I do sometimes use Bayesian approaches in data analyses. This is not quite one of those “the truth will set you free” moments, but I’ll show that one could almost painlessly
I presented before using .
is a very neat package that—as its rather complicated em cee em cee gee el em em acronym implies—implements MCMC for generalized linear mixed models. We’ll skip that the frequentist fixed vs random effects distinction gets blurry in Bayesian models and still use the f… and r… terms. I’ll first repeat the code for a Randomized Complete Block design with Family effects (so we have two random factors) using both lme4 and ASReml-R and add the MCMCglmm counterpart:
library(lme4)
m1 = lmer(bden ~ 1 + (1|Block) + (1|Family),
data = trees)
summary(m1)
#Linear mixed model fit by REML
#Formula: bden ~ (1 | Block) + (1 | Family)
BIC logLik deviance REMLdev
#Random effects:
Variance Std.Dev.
(Intercept)
(Intercept) 162.743
# Residual
#Number of obs: 492, groups: Family, 50; Block, 11
#Fixed effects:
Estimate Std. Error t value
#(Intercept)
library(asreml)
m2 = asreml(bden ~ 1, random = ~ Block + Family,
data = trees)
summary(m2)$varcomp
gamma component std.error
#Block!Block.var
#Family!Family.var 0..853
#R!variance
1..923 14.683496
constraint
#Block!Block.var
#Family!Family.var
#R!variance
m2$coeff$fixed
#(Intercept)
123456789101112131415161718192021222324252627282930313233343536373839404142
library(lme4)m1 = lmer(bden ~ 1 + (1|Block) + (1|Family),data = trees)&summary(m1)&#Linear mixed model fit by REML#Formula: bden ~ (1 | Block) + (1 | Family)#&& Data: a#&&AIC&&BIC logLik deviance REMLdev# &&-2282&&&& 4569&&&&4564#Random effects:# Groups&& Name&&&&&&&&Variance Std.Dev.# Family&& (Intercept)&&82.803&& 9.0996# Block&&&&(Intercept) 162.743&&12.7571# Residual&&&&&&&&&&&& 545.980&&23.3662#Number of obs: 492, groups: Family, 50; Block, 11&#Fixed effects:#&&&&&&&&&&&&Estimate Std. Error t value#(Intercept)&&306.306&&&&&&4.197&& 72.97&&library(asreml)m2 = asreml(bden ~ 1, random = ~ Block + Family,data = trees)&summary(m2)$varcomp#&&&&&&&&&&&&&&&&&&&&&&gamma component std.error&& z.ratio#Block!Block.var&& 0..771&&2.073362#Family!Family.var 0..853&&2.809587#R!variance&&&&&&&&1..923 14.683496&#&&&&&&&&&&&&&&&& constraint#Block!Block.var&&&&Positive#Family!Family.var&&Positive#R!variance&&&&&&&& Positive&&m2$coeff$fixed#(Intercept)#&&&&306.306
We had already established that the results obtained from lme4 and ASReml-R were pretty much the same, at least for relatively simple models where we can use both packages (as their functionality diverges later for more complex models). This example is no exception and we quickly move to fitting the same model using MCMCglmm:
library(MCMCglmm)
priors = list(R = list(V = 260, n = 0.002),
G = list(G1 = list(V = 260, n = 0.002),
G2 = list(V = 260, n = 0.002)))
m4 = MCMCglmm(bden ~ 1,
random = ~ Block + Family,
family = 'gaussian',
prior = priors,
verbose = FALSE,
pr = TRUE,
burnin = 10000,
nitt = 20000,
thin = 10)
plot(mcmc.list(m4$VCV))
autocorr(m4$VCV)
posterior.mode(m4$VCV)
#126.671 542.42237
HPDinterval(m4$VCV)
1234567891011121314151617181920212223242526272829
library(MCMCglmm)priors = list(R = list(V = 260, n = 0.002),G = list(G1 = list(V = 260, n = 0.002),G2 = list(V = 260, n = 0.002)))&m4 = MCMCglmm(bden ~ 1,random = ~ Block + Family,family = 'gaussian',data = a,prior = priors,verbose = FALSE,pr = TRUE,burnin = 10000,nitt = 20000,thin = 10)&plot(mcmc.list(m4$VCV))&autocorr(m4$VCV)&posterior.mode(m4$VCV)#&&&&Block&&&&Family&&&& units#126.671 542.42237&HPDinterval(m4$VCV)#&&&&&&&&&& lower&&&&upper#Block&& 33.33#Family&&26.48#units&&479.24
The first difference is that we have to specify priors for the coefficients that we would like to estimate (by default fixed effects, the overall intercept for our example, start with a zero mean and very large variance: 106). The phenotypic variance for our response is around 780, which I split into equal parts for Block, Family and Residuals. For each random effect we have provided our prior for the variance (V) and a degree of belief on our prior (n).
In addition to the model equation, name of the data set and prior information we need to specify things like the number of iterations in the chain (nitt), how many we are discarding for the initial burnin period (burnin), and how many values we are keeping (thin, every ten). Besides the pretty plot of the posterior distributions (see previous figure) they can be summarized using the posterior mode and high probability densities.
One of the neat things we can do is to painlessly create functions of variance components and get their posterior mode and credible interval. For example, the heritability (or degree of additive genetic control) can be estimated in this trial with full-sib families using the following formula:
\( hat{h^2} = frac{2 sigma_F^2}{sigma_F^2 + sigma_B^2 + sigma_R^2} \)
h2 = 2 * m4$VCV[, 'Family']/(m4$VCV[, 'Family'] +
m4$VCV[, 'Block'] + m4$VCV[, 'units'])
posterior.mode(h2)
#0.1887017
HPDinterval(h2)
#var1 0..3414216
#attr(,"Probability")
123456789101112
h2 = 2 * m4$VCV[, 'Family']/(m4$VCV[, 'Family'] +m4$VCV[, 'Block'] + m4$VCV[, 'units'])plot(h2)&posterior.mode(h2)#0.1887017&HPDinterval(h2)#&&&&&&&&&&lower&&&& upper#var1 0..3414216#attr(,"Probability")#[1] 0.95
There are some differences on the final results between ASReml-R/lme4 and MCMC however, the gammas (ratios of variance component/error variance) for the posterior distributions are very similar, and the estimated heritabilities are almost identical (~0.19 vs ~0.21). Overall, MCMCglmm is a very interesting package that covers a lot of ground. It pays to read the reference manual and vignettes, which go into a lot of detail on how to specify different types of models. I will present the MCMCglmm bivariate analyses in another post.
P.S. There are several other ways that we could fit this model in R using a Bayesian approach: it is possible to call WinBugs or
(in Linux and OS X) from R, or we could have used . More on this in future posts.

我要回帖

更多关于 数据缺失值处理 的文章

 

随机推荐