STA6246 - DOE

ANOVA Examples

Author

Dr. Cohen

Etching Process Experiment

The goal of this experiment is to compare 4 Radio Frequency (RF) power to obtain the desired etching rate.

Data description

library(tidyverse)
library(gtsummary)
# enter ER
Etch_Rate=c(575,542,530,539,570,565,593,590,579,610,600,651,610,637,629,725,700,715,685,710)
# enter RF
RF=gl(n = 4, k = 5, labels = c("160","180","200","220"))
# put together data in a tibble
etch_exp=tibble(Etch_Rate,RF)

etch_exp %>%
  ggplot(aes(x=RF,y=Etch_Rate)) +
  geom_point()+
  theme_bw()+
  labs(x="Radio Frequency Power (W)",
       y="Etch Rate (unit)")

etch_exp %>%
  tbl_summary(by=RF)
Characteristic 160, N = 51 180, N = 51 200, N = 51 220, N = 51
Etch_Rate 542 (539, 570) 590 (579, 593) 629 (610, 637) 710 (700, 715)
1 Median (IQR)
  • The etching rate seems to increase with the RF power.

  • The variability seems to be similar among the groups (powers)

ANOVA in R

Fitting
library(sjPlot)

# ANOVA one factor
results = aov(Etch_Rate ~ RF,data = etch_exp)
summary(results)
            Df Sum Sq Mean Sq F value   Pr(>F)    
RF           3  66871   22290    66.8 2.88e-09 ***
Residuals   16   5339     334                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tab_model(results)
  Etch_Rate
Predictors p
RF <0.001
Residuals
Observations 20
R2 / R2 adjusted 0.926 / 0.912
# Individual CIs - need to add the CI for the reference level
confint(results)
                2.5 %    97.5 %
(Intercept) 533.88153 568.51847
RF180        11.70798  60.69202
RF200        49.70798  98.69202
RF220       131.30798 180.29202

The ANOVA was rejected (p-value \(< 0.001\)). Therefore, the data is compatible with the means difference in the population. Next, we will diagnose the model.

Model Adequacy
par(mfrow=c(2,2))
plot(results)

The residuals plots do not show any major departures from the ANOVA assumptions.

Multiple Comparison

Considering 5% level of significance.

# Multiple Comparison
pairwise.t.test(Etch_Rate,RF,p.adjust.method = "bonferroni")

    Pairwise comparisons using t tests with pooled SD 

data:  Etch_Rate and RF 

    160     180     200    
180 0.038   -       -      
200 5.1e-05 0.028   -      
220 2.2e-09 1.0e-07 1.6e-05

P value adjustment method: bonferroni 
#Tukey Test 
THSD=TukeyHSD(results)
plot(THSD,las=1)

# Fisher LSD Test
library(agricolae)
L=LSD.test(results,"RF",console=TRUE,group = F)

Study: results ~ "RF"

LSD t Test for Etch_Rate 

Mean Square Error:  333.7 

RF,  means and individual ( 95 %) CI

    Etch_Rate      std r      LCL      UCL Min Max
160     551.2 20.01749 5 533.8815 568.5185 530 575
180     587.4 16.74216 5 570.0815 604.7185 565 610
200     625.4 20.52559 5 608.0815 642.7185 600 651
220     707.0 15.24795 5 689.6815 724.3185 685 725

Alpha: 0.05 ; DF Error: 16
Critical Value of t: 2.119905 

Comparison between treatments means

          difference pvalue signif.        LCL        UCL
160 - 180      -36.2 0.0064      **  -60.69202  -11.70798
160 - 200      -74.2 0.0000     ***  -98.69202  -49.70798
160 - 220     -155.8 0.0000     *** -180.29202 -131.30798
180 - 200      -38.0 0.0046      **  -62.49202  -13.50798
180 - 220     -119.6 0.0000     *** -144.09202  -95.10798
200 - 220      -81.6 0.0000     *** -106.09202  -57.10798

Conclusion:

The Radio Frequency Power setting does affect the etching rate in the etching process. The higher the RF power the higher the etch rate.

Peak Discharge Data

An Engineer is interested in determining whether 4 different methods of estimating flood flow frequency produce equivalent estimates of peak discharge when applied to the same watershed. Six replications were run. Therefore. we have 6*4= 24 runs.

pd=c(0.34,0.12,1.23,0.7,1.75,0.12,0.91,2.94,2.14,2.36,2.86,4.55,6.31,8.37,9.75,6.09,9.82,7.24,17.15,11.82,10.95,17.20,14.35,16.82)

methods=gl(n = 4, k = 6, length = 24, labels = c("a","b","c","d"))

peak_discharge_exp = tibble(pd,methods)

Data description

peak_discharge_exp %>%
  ggplot(aes(x=methods,y=pd)) +
  geom_point()+
  theme_bw()+
  labs(x="Methods",
       y="Peak Discharge")

peak_discharge_exp %>%
  tbl_summary(by=methods,
              statistic = list(pd ~ "{mean} ({sd}); [{min}, {max}]") )
Characteristic a, N = 61 b, N = 61 c, N = 61 d, N = 61
pd 0.7 (0.7); [0.1, 1.8] 2.6 (1.2); [0.9, 4.6] 7.9 (1.6); [6.1, 9.8] 14.7 (2.8); [11.0, 17.2]
1 Mean (SD); [Range]
  • The Method “a” yields to the higher peak discharge values compared to other methods

  • The variability seems to be a bit differnet among the groups (methods)

ANOVA in R

Fitting
library(sjPlot)

# ANOVA one factor
results = aov(pd ~ methods,data = peak_discharge_exp)
summary(results)
            Df Sum Sq Mean Sq F value   Pr(>F)    
methods      3  708.3   236.1   76.07 4.11e-11 ***
Residuals   20   62.1     3.1                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tab_model(results)
  pd
Predictors p
methods <0.001
Residuals
Observations 24
R2 / R2 adjusted 0.919 / 0.907
# Individual CIs - need to add the CI for the reference level
confint(results)
                 2.5 %    97.5 %
(Intercept) -0.7903608  2.210361
methodsb    -0.2051640  4.038497
methodsc     5.0981694  9.341831
methodsd    11.8831694 16.126831

The ANOVA was rejected (p-value \(< 0.001\)). Therefore, the data is compatible with the means difference in the population. Next, we will diagnose the model.

Model Adequacy
par(mfrow=c(2,2))
plot(results)

The residuals plots do not show any major departures from the ANOVA assumptions, except the variance constant assumption.

Cox-Box Transformation for stabilizing variance
library(car)
PT=powerTransform(results)

peak_discharge_exp = peak_discharge_exp %>%
  mutate(pdT= 2*(sqrt(pd)-1)) 


# ANOVA one factor
resultsT = aov(pdT ~ methods,data = peak_discharge_exp)
summary(resultsT)
            Df Sum Sq Mean Sq F value  Pr(>F)    
methods      3 130.74   43.58   81.05 2.3e-11 ***
Residuals   20  10.75    0.54                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tab_model(resultsT,results)
  pdT pd
Predictors p p
methods <0.001 <0.001
Residuals
Observations 24 24
R2 / R2 adjusted 0.924 / 0.913 0.919 / 0.907
# residuals plots
par(mfrow=c(2,2))
plot(resultsT)

The p-value <0.001 so the ANOVA null hypothesis is rejected. Data supports evidence of peak discharge mean difference among the methods.

Multiple Comparison

Considering 5% level of significance.

# Multiple Comparison
pairwise.t.test(pd,methods,p.adjust.method = "bonferroni")

    Pairwise comparisons using t tests with pooled SD 

data:  pd and methods 

  a       b       c      
b 0.44489 -       -      
c 4.2e-06 0.00025 -      
d 6.9e-11 9.7e-10 1.0e-05

P value adjustment method: bonferroni 
#Tukey Test 
THSD=TukeyHSD(results)
plot(THSD,las=1)

# Fisher LSD Test
library(agricolae)
L=LSD.test(results,"methods",console=TRUE,group = F)

Study: results ~ "methods"

LSD t Test for pd 

Mean Square Error:  3.104054 

methods,  means and individual ( 95 %) CI

         pd      std r        LCL       UCL   Min   Max
a  0.710000 0.661090 6 -0.7903608  2.210361  0.12  1.75
b  2.626667 1.192202 6  1.1263058  4.127027  0.91  4.55
c  7.930000 1.647070 6  6.4296392  9.430361  6.09  9.82
d 14.715000 2.800891 6 13.2146392 16.215361 10.95 17.20

Alpha: 0.05 ; DF Error: 20
Critical Value of t: 2.085963 

Comparison between treatments means

      difference pvalue signif.        LCL        UCL
a - b  -1.916667 0.0741       .  -4.038497   0.205164
a - c  -7.220000 0.0000     ***  -9.341831  -5.098169
a - d -14.005000 0.0000     *** -16.126831 -11.883169
b - c  -5.303333 0.0000     ***  -7.425164  -3.181503
b - d -12.088333 0.0000     *** -14.210164  -9.966503
c - d  -6.785000 0.0000     ***  -8.906831  -4.663169

Conclusion:

The Methods do affect the peak discharge values. The method “d” provides the highest peak discharge. Also, methods “a” and “b” seems to result in similar mean peak difference (p=0.266; 95%CI -0.93 to 4.76 using Tukey’s test).

Fabric Strength and Looms - Random ANOVA

A textile company weaves a fabric on a large number of looms.

The engineer suspects that in addition to the usual variation in strength within the samples of the same loom, there may be significant variations in strength between looms.

The engineer selected a=4 looms at random and n=4 replicates.

Data description

strength=c(98,97,99,96,91,90,93,92,96,95,97,95,95,96,99,98)
looms=gl(n = 4, k = 4, labels = c("lm1","lm2","lm3","lm4"))

strength_loom = tibble(strength,looms)

strength_loom %>%
  ggplot(aes(x=looms,y=strength)) +
  geom_boxplot()+
  theme_bw()+
  labs(x="Looms",
       y="Strength")

ANOVA - To test the variance component hypothesis

# to calculate the mean squares
aRE=aov(strength ~ looms, data = strength_loom)
summary(aRE)
            Df Sum Sq Mean Sq F value   Pr(>F)    
looms        3  89.19  29.729   15.68 0.000188 ***
Residuals   12  22.75   1.896                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(aRE)
                2.5 %     97.5 %
(Intercept) 96.000004 98.9999957
loomslm2    -8.121314 -3.8786858
loomslm3    -3.871314  0.3713142
loomslm4    -2.621314  1.6213142
par(mfrow=c(2,2))
plot(aRE)

The results from ANOVA indicates significant difference in the variability due to in the looms. We want to estimate variance components:

Moments Method
# Estimate the variance components - method of moments
summary(aRE)
            Df Sum Sq Mean Sq F value   Pr(>F)    
looms        3  89.19  29.729   15.68 0.000188 ***
Residuals   12  22.75   1.896                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
s2= 1.9
s2tau= (29.729 - 1.896)/4
s2y = s2 + s2tau
ICC = s2tau/s2y
MLE method
library(lme4)
# Fit the Random Effect One-way ANOVA
anovaRE <- lmer(strength ~ 1 + (1 | looms))
summary(anovaRE)
Linear mixed model fit by REML ['lmerMod']
Formula: strength ~ 1 + (1 | looms)

REML criterion at convergence: 63.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.38018 -0.57260 -0.04342  0.82574  1.52491 

Random effects:
 Groups   Name        Variance Std.Dev.
 looms    (Intercept) 6.958    2.638   
 Residual             1.896    1.377   
Number of obs: 16, groups:  looms, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept)   95.438      1.363   70.01
# Confidence Intervals Variance Components and Overall Mean
confint(anovaRE,oldNames=FALSE)
                          2.5 %    97.5 %
sd_(Intercept)|looms  1.1293759  5.748927
sigma                 0.9672533  2.184005
(Intercept)          92.4393299 98.435668
# Estimates the random effect tau_i 
ranef(anovaRE)
$looms
    (Intercept)
lm1   1.9309741
lm2  -3.6864051
lm3   0.2925718
lm4   1.4628591

with conditional variances for "looms" 
# Check the residuals
par(mfrow = c(1, 2))
plot(anovaRE)

#ICC
library(irr)
t(matrix(strength, nrow =  4))
     [,1] [,2] [,3] [,4]
[1,]   98   97   99   96
[2,]   91   90   93   92
[3,]   96   95   97   95
[4,]   95   96   99   98
icc(t(matrix(strength, nrow =  4)))
 Single Score Intraclass Correlation

   Model: oneway 
   Type : consistency 

   Subjects = 4 
     Raters = 4 
     ICC(1) = 0.786

 F-Test, H0: r0 = 0 ; H1: r0 > 0 
    F(3,12) = 15.7 , p = 0.000188 

 95%-Confidence Interval for ICC Population Values:
  0.385 < ICC < 0.982

The conclusion: The strength variability is significantly different from loom to loom (p <0.001). The variance component are:

  • The variance due to the looms is 6.958

  • The variance of the error: 1.896

  • The ICC is 0.786 (95% CI [0.39 to 0.98])

We can conclude that the 78.6% of variability in the strength is due to the difference among the looms.

Cardiovascular and Chocolate Study

The goal is to investigate the effect of consuming chocolate on cardiovascular health (Plasma Antioxidants from Chocolate, Nature, 2003). 3 types of chocolates:

  • 100 g of dark chocolate : DC

  • 100 g of dark chocolate with 200mL of full fat milk: DCMK

  • 200 g of milk chocolate: MK

12 subjects. The total antioxidant capacity of their blood plasma was measured after 1 hour of consumption of one of the above level.

DC    =c(118.8 ,    122.6 , 115.6 , 113.6 , 119.5 , 115.9 , 115.8 , 115.1 , 116.9 , 115.4 , 115.6 , 107.9)

DCMK  =c(105.4 ,    101.1 , 102.7 , 97.1,   101.9 ,98.9 ,   100.0,99.8,102.6,100.9,104.5,93.5)

MC    =c(102.1 ,    105.8 , 99.6  , 102.7 ,98.8 ,   100.9 , 102.8 ,98.7 ,94.7 ,97.8 ,99.7 ,98.6)

subjects = gl(n=12,k=1,length = 36,labels = (1:12))

cardio_chocolate = tibble(DC,DCMK,MC) %>%
  pivot_longer(cols = everything(),
               names_to = "group",
               values_to = "antioxidantCap") %>%
  mutate(subjects = gl(n=12,k=3,length = 36,labels = (1:12))
)
cardio_chocolate %>%
  ggplot(aes(x=group,y=antioxidantCap))+
  geom_boxplot()+
  theme_bw()

res = aov(antioxidantCap~group+subjects,data=cardio_chocolate)
summary(res)
            Df Sum Sq Mean Sq F value   Pr(>F)    
group        2 1952.6   976.3 147.350 1.82e-13 ***
subjects    11  198.5    18.0   2.724   0.0218 *  
Residuals   22  145.8     6.6                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# variance components
library(lme4)
anovaRE <- lmer(antioxidantCap ~ group + (1 | subjects), data = cardio_chocolate)
summary(anovaRE)
Linear mixed model fit by REML ['lmerMod']
Formula: antioxidantCap ~ group + (1 | subjects)
   Data: cardio_chocolate

REML criterion at convergence: 174.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.90551 -0.46590 -0.04931  0.71717  1.51212 

Random effects:
 Groups   Name        Variance Std.Dev.
 subjects (Intercept) 3.808    1.951   
 Residual             6.626    2.574   
Number of obs: 36, groups:  subjects, 12

Fixed effects:
            Estimate Std. Error t value
(Intercept) 116.0583     0.9324  124.47
groupDCMK   -15.3583     1.0509  -14.62
groupMC     -15.8750     1.0509  -15.11

Correlation of Fixed Effects:
          (Intr) grDCMK
groupDCMK -0.563       
groupMC   -0.563  0.500
# Confidence Intervals Variance Components and Overall Mean
confint(anovaRE,oldNames=FALSE)
                              2.5 %     97.5 %
sd_(Intercept)|subjects   0.4221091   3.448604
sigma                     1.9030082   3.367859
(Intercept)             114.2453133 117.871353
groupDCMK               -17.4119080 -13.304759
groupMC                 -17.9285746 -13.821425