How to use linear mixed effect models (LMMs) with cluster-based permutation test?
Scenario 1:
(I just made these examples up, i tried to keep the dimensions unique in size for identifiability)
Dependent variable (DV)
- pupil size (measured every second after onset, for 5 seconds)
Fixed effects
- background color (within subject, 2 levels)
- congruency (within subject, 2 levels)
Random effects
- participant (10 participants/levels)
Each participant completes 40 trials, 10 one of each kind in the 2x2 design.
The overall DV dataset is therefore 400x5 matrix.
---
Analysis (code below to be specific):
1) Run LMM for each data point using: pupil ~ bgcolor + congruency + bgcolor*congruency + (1|sub)
2) Collect t-values
3) Identify clusters
4) Generate null distribution by shuffling labels of bgcolor and congruency within each subject. Include permutations where no cluster was above threshold and assign them 0 cluster mass.
5) Calcualte desired threshold based on the null distribution
6) Evaluate actual clusters for each effect.
Scenario 2:
Dependent variable (DV)
- pupil size (measured every second after onset, for 5 seconds)
Fixed effects
- background color (within subject, 2 levels)
- drug/placebo (between subject, 2 levels)
Random effects
- participant (10 participants/levels)
This is the same scenario as above but one factor is now a between subject variable. Would the right randomization strategy be to shuffle bgcolor within subject and then shuffle labels for entire subjects? (e.g. if each sub has 100 sampels altogether, shuffle the entire blocks of 100)
Is there a disadvantage in shuffling ALL variables across all samples/subjects?
Back to Scenario 1
load data
load('../data/dataset1.mat', 'dv', 'iv');
[GS, Gkey, S, Skey, mle] = get_stats(iv, dv, {'bgcol', 'congruency'});
col4 = [map(2,:) ; map(12,:); map(43,:); map(47,:)];
visualize data
gme = GS(Gkey.bgcol==num2str(bg) & Gkey.congruency == num2str(cn),:,1);
gse = GS(Gkey.bgcol==num2str(bg) & Gkey.congruency == num2str(cn),:,2);
plot(1:numel(gme), gme, 'LineWidth', 2, 'Color', col4(idx,:));
shade_area_bet_curves(1:numel(gme), gme-gse, gme+gse, col4(idx,:), 0.2);
legend({'low cong - low bgcol','low cong - high bgcol', 'hi cong - low bgcol','hi cong - high bgcol'}, 'location', 'best');
ylabel('pupil size (a.u.)')
run LMM for each data point
eq = 'dv ~ bgcol*congruency + (1|id)';
e = [array2table(dv(:,i), 'VariableNames', {'dv'}) iv];
for bb = 2:length(lme.Coefficients.Estimate)
labels{bb-1} = lme.Coefficients.Name{bb};
L.E(bb-1, i) = lme.Coefficients.Estimate(bb);
L.SE(bb-1, i) = lme.Coefficients.SE(bb);
L.T(bb-1, i) = lme.Coefficients.tStat(bb);
L.P(bb-1,i) = lme.Coefficients.pValue(bb);
find clusters
sigT_true = repmat(0, 1, length(L.T(eff,:)));
sigT_true(abs(L.T(eff,:))>tcrit) = 1;
[C_sub{eff}] = clusterize(sigT_true, T_true);
disp(array2table(C_sub{eff}, 'VariableNames', {'cluster_size', 'cluster_mass', 'na', 'cl_start', 'cl_end'}));
end
cluster_size cluster_mass na cl_start cl_end
____________ ____________ __ ________ ______
4 117.83 0 1 5
cluster_size cluster_mass na cl_start cl_end
____________ ____________ __ ________ ______
3 22.926 0 1 4
cluster_size cluster_mass na cl_start cl_end
____________ ____________ __ ________ ______
2 -14.493 0 1 3
Ok, so we find one cluster for each effect. As expected there is a significant main effect of background color across all five time points, but there also seems to be a main effect of congruency (time points 1 - 4) and an interaction at time points 1 - 3.
permutation test
This is the part that I am not sure about. Intuitively, since both fixed effects are within subject I'd permute the labels within each participant. (The alternative is across all participants).
fxeff = {'bgcol', 'congruency'};
C_perm = cell(1,size(L.E,1));
% shuffle labels of all independent variables
tiv = table; %permutation-specific independent vars
for sub = 1:numel(ids) %loop over participants
chunk = iv(ismember(iv.id, ids(sub)),:);
for eff = 1:size(vars,2) % loop over vars
chunk.(vars{eff}) = chunk.(vars{eff})(randperm(length(chunk.(vars{eff}))));
e = [array2table(dv(:,i), 'VariableNames', {'dv'}) tiv];
lme = fitlme(e, eq); %TT=[];
for bb = 2:length(lme.Coefficients.Estimate)
TT(bb-1, i) = lme.Coefficients.tStat(bb);
if abs(TT(bb-1, i)) > tcrit
for var = 1:size(TT,1) % loop over regressors
%take cluster with highest cluster mass
cltemp = []; cltemp = clusterize(H(var,:), TT(var,:));
maxclust = repmat(0,1,5); %if no significnat cluster, record cluster with cluster mass 0
%otherwise take biggest cluster
maxclust = cltemp(find(abs(cltemp(:,2))==max(abs(cltemp(:,2)))),:);
C_perm{var} = [C_perm{var}; maxclust];
save('../data/dataset1_perm_data.mat');
load('../data/dataset1_perm_data.mat');
show null distributions and actual cluster mass (red)
%% Get distribution for each variable
% permutation distribution T-based cluster mass
hi_perc = (1-(alpha/2))*n;
low_val = distr(round(low_perc));
hi_val = distr(round(hi_perc));
%%% back to real data: assess each cluster
if C_true(c,2) > hi_val || C_true(c,2) < low_val
P{eff}.name = lme.Coefficients.Name{eff+1};
P{eff}.clusters = C_true;
P{eff}.perm_distribution = distr;
subplot(size(L.E,1),1, eff);
histogram(P{eff}.perm_distribution, 50); hold on;
xlabel('cluster mass distribution');
plot(repmat(P{eff}.clusters(1,2), 2, 1), [0 100], 'Color', [0.8 0.2 0.2], 'LineWidth', 2);
This doesn't seem right to me, but it could be only becuse the data are artificially generated.
plot again with significant interaction cluster(s)
gme = GS(Gkey.bgcol==num2str(bg) & Gkey.congruency == num2str(cn),:,1);
gse = GS(Gkey.bgcol==num2str(bg) & Gkey.congruency == num2str(cn),:,2);
plot(1:numel(gme), gme, 'LineWidth', 2, 'Color', col4(idx,:));
shade_area_bet_curves(1:numel(gme), gme-gse, gme+gse, col4(idx,:), 0.2);
legend({'low cong - low bgcol','low cong - high bgcol', 'hi cong - low bgcol','hi cong - high bgcol'}, 'location', 'best');
ylabel('pupil size (a.u.)')
sigCl = P{3}.clusters(P{3}.clusters(:,3)==1,:);
xo = sigCl(cl,4):0.1:sigCl(cl,5);
shade_area_bet_curves(xo, repmat(80, 1, length(xo)), repmat(120, 1, length(xo)), [0.7 0.7 0.7], 0.2);