A complete solution guide for STAT 350 Assignment 1. This download includes a comprehensive statistical report covering Welch’s t-tests, paired t-tests, and ANOVA, along with a step-by-step RStudio walkthrough for setting up the environment and running the analysis scripts.
Problem 1: High School and Family Income
A school district has decided that enrollment at its high school is nearly unmanageable, so it plans to split the high school into two zones, with Zone A students attending the old high school and Zone B students attending a newly constructed building. Some parents, however, have raised concerns about how the two zones are drawn relative to income levels. The school board conducted a study to determine whether households in Zone A have a different mean income from households in Zone B. Random samples of 36 households were taken in each zone, and the annual income of each household was recorded. The data, in thousands of dollars, are in the HouseholdIncome.txt file.
a) Construct boxplots to illustrate household income by zone. Report the boxplot.
Rguroo: Select Plots → Create Plot → Boxplot
Select Dataset: HouseholdIncome
Select Income in Numerical Variables and Zone in Factor Variables
Enter an appropriate title in the Title box.
R: Type in the following code. Select HouseholdIncome.txt when prompted, and replace title with an appropriate title.
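income <- read.table(file.choose(), header = TRUE)  # pick HouseholdIncome.txt
attach(income)  # lets later commands refer to Income and Zone directly
boxplot(Income ~ Zone, main = 'title')  # use straight quotes; curly quotes are a syntax error in R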
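To back up the visual comparison in part b) with numbers, the summaries a boxplot is built from can be printed directly. The following is a minimal sketch on hypothetical toy values (not the HouseholdIncome data). fivenum() returns the five-number summary, and its middle three values (lower hinge, median, upper hinge) are the box edges and center line that boxplot() draws. sd() and IQR() give two measures of spread per zone.

```r
# Hypothetical toy values standing in for HouseholdIncome.txt (income in $1000s);
# the column names Income and Zone follow the assignment, the numbers are made up.
toy <- data.frame(
  Income = c(42, 55, 61, 48, 70, 39, 52, 95, 30, 64, 58, 110),
  Zone   = rep(c("A", "B"), each = 6)
)

groups <- split(toy$Income, toy$Zone)   # list with one numeric vector per zone

# Five-number summary per zone: min, lower hinge, median, upper hinge, max
five <- sapply(groups, fivenum)         # 5 x 2 matrix, one column per zone

# Spread per zone: standard deviation and interquartile range
spread <- rbind(sd = sapply(groups, sd), IQR = sapply(groups, IQR))

print(five)
print(spread)
```

Reading the medians and spreads off these tables gives concrete numbers to cite alongside the boxplot.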
b) Based on the boxplot, do you think the mean household incomes will differ by zone? Do you think the variation in income will differ by zone? Explain your answers.
c) State the null and alternative hypotheses the school board should test.
d) Perform the hypothesis test from c) in software. Report your output.
Rguroo: Analytics → Analysis → Mean Inference → One & Two Population
Select Dataset: HouseholdIncome
Click the second circle in the first box
Enter Income for Variable and Zone for By Factor
Enter A for the Population 1 Level and B for the Population 2 Level
Select the Population 1-2 tab
Select the Test of Hypothesis tab
Check the t-Statistic box
Choose the proper inequality and enter 0 next to Alternative Hyp
R: Type in the following code. Replace insert with two.sided, less, or greater to match your alternative hypothesis.
t.test(Income ~ Zone, alternative = 'insert')
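By default t.test() does not assume equal variances (var.equal = FALSE), so the command above runs Welch’s t-test. The sketch below computes the Welch statistic and its Satterthwaite degrees of freedom by hand and checks them against t.test(). The data are simulated with set.seed() purely for illustration and are not the HouseholdIncome values. With the formula interface, the difference is taken as the first factor level minus the second (A minus B).

```r
set.seed(350)  # simulated stand-in data; not the HouseholdIncome values
toy <- data.frame(
  Income = c(rnorm(36, mean = 60, sd = 15), rnorm(36, mean = 70, sd = 25)),
  Zone   = rep(c("A", "B"), each = 36)
)

a  <- toy$Income[toy$Zone == "A"]
b  <- toy$Income[toy$Zone == "B"]
va <- var(a) / length(a)   # squared standard error of the Zone A mean
vb <- var(b) / length(b)   # squared standard error of the Zone B mean

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(a) - mean(b)) / sqrt(va + vb)
# Satterthwaite approximation to the degrees of freedom
df_w   <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
# Two-sided p-value
p_two  <- 2 * pt(-abs(t_stat), df_w)

fit <- t.test(Income ~ Zone, data = toy, alternative = "two.sided")
c(manual_t = t_stat, manual_df = df_w, manual_p = p_two)
fit
```

The degrees of freedom are usually not a whole number. That is expected for Welch’s test and is a quick way to confirm the output is not from the pooled-variance test.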
e) Use the results from part d) to make a decision regarding your hypothesis statements from part c). Be sure to identify the correct p-value. Make a decision in terms of the problem using a 0.01 significance level.
f) Suppose you wish to determine whether the variability in incomes differs between the two zones. State the null and alternative hypotheses you should use to test this.
g) Perform the hypothesis test from f) in software. Report your output.
Rguroo: Analytics → Analysis → Variance Inference → Two Population
Dataset: HouseholdIncome
Click the second circle in the first box
Enter Income for Variable and Zone for By Factor
Select A for Pop 1 Level and B for Pop 2 Level
Select the Test of Hypothesis tab
Check the F-Statistic box
Select != next to Alternative Hyp and enter 1
R: Type in the following code.
var.test(Income~Zone)
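var.test() compares two variances with an F statistic, the ratio of the sample variances (first level over second). The sketch below reproduces the statistic and the two-sided p-value by hand on simulated stand-in data (not the HouseholdIncome values). The two-sided p-value doubles the smaller tail area of the F distribution. Keep in mind that this F-test relies heavily on both populations being normal.

```r
set.seed(350)  # simulated stand-in data; not the HouseholdIncome values
toy <- data.frame(
  Income = c(rnorm(36, mean = 60, sd = 15), rnorm(36, mean = 70, sd = 25)),
  Zone   = rep(c("A", "B"), each = 36)
)
a <- toy$Income[toy$Zone == "A"]
b <- toy$Income[toy$Zone == "B"]

F_stat <- var(a) / var(b)            # ratio of sample variances, A over B
df1    <- length(a) - 1              # numerator degrees of freedom
df2    <- length(b) - 1              # denominator degrees of freedom
lower  <- pf(F_stat, df1, df2)       # P(F <= observed ratio)
p_two  <- 2 * min(lower, 1 - lower)  # two-sided p-value for H0: ratio of variances = 1

fit <- var.test(Income ~ Zone, data = toy)
c(manual_F = F_stat, manual_p = p_two)
fit
```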
h) Use the results from part g) to make a decision regarding your hypothesis statements from part f). Be sure to identify the correct p-value. Make a decision in terms of the problem using a 0.01 significance level.
Problem 2: Mechanic Estimates
Insurance adjusters are concerned about the high estimates they are receiving for auto repairs from Smith Auto Repair, especially compared with the estimates they receive from Casey Automotive. To verify their suspicions, they send 20 cars recently involved in accidents to both garages for separate estimates of repair costs. The estimates (in dollars) from the two mechanics are given in the data set AutoRepair.txt.
a) The insurance adjusters will be conducting a matched pairs procedure. Explain why this is the appropriate procedure.
b) State the null and alternative hypothesis statements for a test to determine if the average repair price is greater for Smith Auto Repair than for Casey Automotive.
c) Conduct the hypothesis test from b) using Rguroo or R. Report all output provided by your program of choice.
Rguroo: Analytics → Analysis → Mean Inference → One & Two Population
Dataset: AutoRepair. Select Smith for Variable 1 and Casey for Variable 2
Select the Population 1-2 tab
Select the Test of Hypothesis tab
Choose the proper inequality and enter 0 next to Alternative Hyp.
Check the Normal Probability Plot, t-Statistic, and Paired Data boxes
R: Type in the following code. Select the AutoRepair.txt data set when prompted. Replace insert with two.sided, less, or greater to match your alternative hypothesis.
repair <- read.table(file.choose(), header = TRUE)  # pick AutoRepair.txt
t.test(repair$Smith, repair$Casey, paired = TRUE, alternative = 'insert')
qqnorm(repair$Smith-repair$Casey)
qqline(repair$Smith-repair$Casey)
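A paired t-test is a one-sample t-test on the per-car differences, which is also why the normal probability plot above is drawn for Smith minus Casey rather than for each garage separately. The sketch below shows this equivalence on simulated stand-in estimates (the values are made up, not AutoRepair.txt). It uses alternative = "greater", assuming differences are taken as Smith minus Casey, the order in which the two columns are passed to t.test().

```r
set.seed(350)
casey  <- round(runif(20, min = 500, max = 3000))         # hypothetical estimates, dollars
smith  <- casey + round(rnorm(20, mean = 100, sd = 150))   # hypothetical markup
repair <- data.frame(Smith = smith, Casey = casey)

d <- repair$Smith - repair$Casey                  # one difference per car
t_stat  <- mean(d) / (sd(d) / sqrt(length(d)))    # one-sample t statistic on the differences
p_upper <- pt(t_stat, df = length(d) - 1, lower.tail = FALSE)  # upper-tail p for Ha: mean difference > 0

paired    <- t.test(repair$Smith, repair$Casey, paired = TRUE, alternative = "greater")
onesample <- t.test(d, mu = 0, alternative = "greater")
c(manual_t = t_stat, manual_p = p_upper)
paired
```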
d) Are the conditions necessary to perform the hypothesis test from b) met? Justify your answer.
e) Report the p-value for the hypothesis test, and use it to make a decision to reject or fail to reject the null hypothesis. Use a significance level of 0.05.
f) Make a conclusion in terms of the problem.
g) It is always possible that you make an error when performing a hypothesis test. Based on your results, which error is possible here – a Type I error or a Type II error? What would be a consequence of this type of error in the context of the problem?
h) Report the 95% confidence interval for the true mean difference in repair price between the two garages (Note: you may need to perform a new analysis – use the instructions in part c) as a starting point). Is 0 contained in this interval? What does this tell you?
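The new analysis may be needed because a one-sided test also produces a one-sided interval. With alternative = "greater", t.test() reports a lower bound and an upper end of Inf. The sketch below computes the usual two-sided 95% interval, mean difference ± t* × SE with t* = qt(0.975, n − 1), and checks it against a two-sided t.test(). It uses the same kind of simulated stand-in data as above, not the AutoRepair values.

```r
set.seed(350)  # simulated stand-in data; not the AutoRepair values
casey <- round(runif(20, min = 500, max = 3000))
smith <- casey + round(rnorm(20, mean = 100, sd = 150))
d <- smith - casey

n     <- length(d)
se    <- sd(d) / sqrt(n)                    # standard error of the mean difference
tcrit <- qt(0.975, df = n - 1)              # two-sided 95%: 2.5% in each tail
ci_manual <- mean(d) + c(-1, 1) * tcrit * se

ci_r <- t.test(smith, casey, paired = TRUE, alternative = "two.sided",
               conf.level = 0.95)$conf.int
contains_zero <- ci_manual[1] <= 0 && 0 <= ci_manual[2]
ci_manual
contains_zero
```

If 0 lies outside the interval, a two-sided test at the 0.05 level would reject the hypothesis of no mean difference.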
Problem 3: Earnings by Degree Level
It is generally assumed that the higher the degree a person earns, the higher that person’s earnings will be. In the year 2000, a researcher collected random samples of 50 American adults in each of five categories of higher educational achievement: Some higher education (Some); Associate degree (Associate); Bachelor’s degree (Bachelor); Master’s degree (Master); and Doctorate (Doctorate). Use a significance level of 0.05 for all tests. The data are in the EarningsEd.csv file on Blackboard.
a) Identify the type of experimental design employed in one sentence. Can we define this study as balanced?
b) What is the response variable in this study?
c) What is the factor in this study, and what are its levels?
d) Produce a boxplot of Earnings grouped by Degree.
e) State the null and alternative hypotheses to determine if the mean earnings differ by highest degree earned.
f) Run the ANOVA in software. Report the ANOVA table.
Rguroo: Analytics → Analysis → ANOVA
Select the EarningsEd dataset
Enter Earnings for the Response
Enter Degree for the Factor
Click the box next to Diagnostics
Click the box next to Post-hoc Test
R: Enter the following code. Select the EarningsEd.csv file when prompted.
Earn <- read.csv(file.choose())
Mod <- aov(Earnings~Degree,data=Earn)
summary(Mod)
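The ANOVA table splits total variation into a between-group (Degree) part and a within-group (residual) part, and F is the ratio of their mean squares. The sketch below rebuilds the Degree row of the table by hand. It uses simulated stand-in data with the same layout (5 groups of 50) and is not the EarningsEd values. The column names Earnings and Degree follow the assignment.

```r
set.seed(350)  # simulated stand-in for EarningsEd.csv; not the study's values
Earn <- data.frame(
  Degree   = rep(c("Some", "Associate", "Bachelor", "Master", "Doctorate"), each = 50),
  Earnings = rnorm(250, mean = rep(c(40, 43, 47, 50, 55), each = 50), sd = 15)
)

grand <- mean(Earn$Earnings)
means <- tapply(Earn$Earnings, Earn$Degree, mean)     # group means, named by degree
sizes <- tapply(Earn$Earnings, Earn$Degree, length)   # group sizes (50 each here)
k <- length(means)                                    # number of groups
N <- nrow(Earn)                                       # total sample size

ss_between <- sum(sizes * (means - grand)^2)                             # Degree SS
ss_within  <- sum((Earn$Earnings - means[as.character(Earn$Degree)])^2)  # residual SS
F_stat <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_val  <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)

Mod <- aov(Earnings ~ Degree, data = Earn)
tab <- summary(Mod)[[1]]   # the ANOVA table as a data frame
c(manual_F = F_stat, manual_p = p_val)
tab
```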
g) Use the results from f) to make a decision regarding your hypothesis statements in e). Be sure to identify the p-value.
h) Do you need to perform a multiple comparison procedure? Why or why not?
i) Regardless of your answer to h), perform Tukey’s HSD. Provide the output and state which pairs of means show significant differences.
Rguroo: This should have been produced in part f).
R: Enter the following code.
TukeyHSD(Mod)
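TukeyHSD() returns one matrix per factor, with columns diff, lwr, upr, and p adj. A pair of means differs significantly at the 0.05 level when p adj < 0.05, or equivalently when its 95% interval excludes 0. The sketch below filters out those rows on simulated stand-in data (not the EarningsEd values). Degree is wrapped in factor() to make the grouping explicit.

```r
set.seed(350)  # simulated stand-in data; not the EarningsEd values
Earn <- data.frame(
  Degree   = factor(rep(c("Some", "Associate", "Bachelor", "Master", "Doctorate"), each = 50)),
  Earnings = rnorm(250, mean = rep(c(40, 43, 47, 50, 55), each = 50), sd = 15)
)
Mod <- aov(Earnings ~ Degree, data = Earn)

tk  <- TukeyHSD(Mod, conf.level = 0.95)
res <- tk$Degree                                    # one row per pair of degree levels
sig <- res[res[, "p adj"] < 0.05, , drop = FALSE]   # keep only significant pairs
print(round(sig, 4))
```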
j) Comment on the normality and constant variance assumptions using graphics. Include the graphics you reference with your answers.
Rguroo: These should have been produced in part f).
R: Enter the following code.
par(mfrow=c(2,2))
plot(Mod)
k) Perform Levene’s test to determine if the constant variance assumption holds. Report the p-value. Is the constant variance assumption adequate?
Rguroo: This should have been produced in part f).
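R: Enter the following code. The first line installs the car package only if it is missing; R may ask you to choose a CRAN mirror, and any mirror will do.
if (!requireNamespace('car', quietly = TRUE)) install.packages('car')
library(car)
leveneTest(Earnings ~ Degree, data = Earn)  # centers at group medians by default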
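By default, car::leveneTest() uses center = median (the Brown-Forsythe variant). It runs a one-way ANOVA on the absolute deviations of each observation from its group median, so a small p-value means the groups differ in spread. The sketch below reproduces that F statistic with base R only. It uses simulated stand-in data with deliberately unequal spreads, not the EarningsEd values. The comparison with car in the checks runs only if car is installed.

```r
set.seed(350)  # simulated stand-in data with unequal spreads; not the EarningsEd values
Earn <- data.frame(
  Degree   = factor(rep(c("Some", "Associate", "Bachelor", "Master", "Doctorate"), each = 50)),
  Earnings = rnorm(250, mean = rep(c(40, 43, 47, 50, 55), each = 50),
                   sd   = rep(c(10, 12, 15, 18, 22), each = 50))
)

med <- ave(Earn$Earnings, Earn$Degree, FUN = median)   # each observation's group median
Earn$AbsDev <- abs(Earn$Earnings - med)                # absolute deviation from that median

lev   <- anova(lm(AbsDev ~ Degree, data = Earn))       # one-way ANOVA on the deviations
F_lev <- lev[1, "F value"]
p_lev <- lev[1, "Pr(>F)"]
c(F = F_lev, p = p_lev)
```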
l) In one sentence, state whether you felt it was appropriate to run the ANOVA test, including your reasoning why or why not.
Problem 4: Thermal Pane Heat Loss
An experiment is conducted to investigate the heat loss for three different designs of commercial thermal panes. To obtain results applicable throughout most regions of the United States, a researcher decided to evaluate the panes at five different exterior temperatures: 0°F, 20°F, 40°F, 60°F, and 80°F. A sample of 15 panes of each design was obtained, and three panes of each design were randomly assigned to each of the five exterior temperature settings. The interior temperature was held at 70°F for all five exterior temperatures. The heat losses (in W) associated with the three pane designs are given in the data set Panes.csv. Use a significance level of 0.05 for all tests.
a) What are the main factors of this experiment?
b) What is the response variable?
c) Identify the levels of each factor.
d) Run the ANOVA test using software. Present the ANOVA table.
Rguroo: Analytics → Analysis → ANOVA
Select the Panes dataset
Click the Two-Way tab
Select HeatLoss as the Response
Select Design as Factor A and Temperature as Factor B
Click the Interaction box
Click the Post-hoc Test box
R: Enter the following code. Select the Panes.csv file when prompted.
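Pane <- read.csv(file.choose())
Pane$Design <- factor(Pane$Design)            # treat design as categorical
Pane$Temperature <- factor(Pane$Temperature)  # 5 temperature levels, not a numeric slope
Mod2 <- aov(HeatLoss ~ Design * Temperature, data = Pane)
summary(Mod2)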
e) Run the first hypothesis test to determine if all the treatment means are equal. Since this p-value is not directly on the output, state the hypotheses and show your work to obtain the test statistic and the p-value. Draw a conclusion in context.
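The overall test of equal treatment means treats the 3 × 5 = 15 design-temperature combinations as treatments. Its sum of squares is the sum of the Design, Temperature, and Design:Temperature rows, with 2 + 4 + 8 = 14 degrees of freedom. The F statistic divides that mean square by the residual mean square (45 − 15 = 30 df). The sketch below does this arithmetic on simulated stand-in data with the same layout. It is not the Panes.csv values, and the design labels A, B, C are hypothetical. The checks confirm the result matches the overall F that summary() reports for the equivalent lm() fit.

```r
set.seed(350)  # simulated stand-in for Panes.csv: 3 designs x 5 temperatures x 3 panes
Pane <- expand.grid(rep = 1:3,
                    Temperature = factor(c(0, 20, 40, 60, 80)),
                    Design = factor(c("A", "B", "C")))   # hypothetical design labels
temp_num <- as.numeric(as.character(Pane$Temperature))
Pane$HeatLoss <- rnorm(nrow(Pane), mean = 10 - 0.08 * temp_num, sd = 0.8)

Mod2 <- aov(HeatLoss ~ Design * Temperature, data = Pane)
tab  <- summary(Mod2)[[1]]    # rows: Design, Temperature, Design:Temperature, Residuals

trt    <- 1:3                               # the three effect rows
ss_trt <- sum(tab[trt, "Sum Sq"])           # treatment sum of squares
df_trt <- sum(tab[trt, "Df"])               # 2 + 4 + 8 = 14
mse    <- tab[4, "Mean Sq"]                 # residual mean square
F_trt  <- (ss_trt / df_trt) / mse
p_trt  <- pf(F_trt, df_trt, tab[4, "Df"], lower.tail = FALSE)
c(F = F_trt, df1 = df_trt, df2 = tab[4, "Df"], p = p_trt)
```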
f) Test to see if there is any interaction present. State the interaction hypotheses and provide the p-value of the F test statistic. Draw a conclusion in context.
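A table of cell means and an interaction plot can help interpret the interaction test. Roughly parallel lines suggest little interaction, while crossing or diverging lines suggest that the effect of temperature depends on the design. The sketch below uses base R’s tapply() and interaction.plot() on simulated stand-in data. It is not the Panes.csv values, and the design labels are hypothetical.

```r
set.seed(350)  # simulated stand-in data; not the Panes.csv values
Pane <- expand.grid(rep = 1:3,
                    Temperature = factor(c(0, 20, 40, 60, 80)),
                    Design = factor(c("A", "B", "C")))   # hypothetical design labels
temp_num <- as.numeric(as.character(Pane$Temperature))
Pane$HeatLoss <- rnorm(nrow(Pane), mean = 10 - 0.08 * temp_num, sd = 0.8)

# Cell means: one row per design, one column per exterior temperature
cell <- with(Pane, tapply(HeatLoss, list(Design, Temperature), mean))
print(round(cell, 2))

# One line per design across the five temperatures
with(Pane, interaction.plot(Temperature, Design, HeatLoss,
                            xlab = "Exterior temperature (F)",
                            ylab = "Mean heat loss (W)"))
```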
g) Test each main effect hypothesis (no matter what the result is in part f)). State each hypothesis and provide the p-values. Draw conclusions and interpret your results.
h) Produce the Tukey HSD output and note which pairs of means are significantly different. Remember, your results in parts f) and g) determine which pairs to analyze.
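The which argument of TukeyHSD() selects the model terms to compare. A common convention is to compare main-effect means (Design, Temperature) when the interaction is not significant, and the 15 cell means (Design:Temperature) when it is. The sketch below shows both on simulated stand-in data. It is not the Panes.csv values, and the design labels are hypothetical. It assumes Design and Temperature are factors, because TukeyHSD() drops non-factor terms.

```r
set.seed(350)  # simulated stand-in data; not the Panes.csv values
Pane <- expand.grid(rep = 1:3,
                    Temperature = factor(c(0, 20, 40, 60, 80)),
                    Design = factor(c("A", "B", "C")))   # hypothetical design labels
temp_num <- as.numeric(as.character(Pane$Temperature))
Pane$HeatLoss <- rnorm(nrow(Pane), mean = 10 - 0.08 * temp_num, sd = 0.8)
Mod2 <- aov(HeatLoss ~ Design * Temperature, data = Pane)

# No significant interaction: compare marginal (main-effect) means
tk_main <- TukeyHSD(Mod2, which = c("Design", "Temperature"))
# Significant interaction: compare all design-temperature cell means instead
tk_cells <- TukeyHSD(Mod2, which = "Design:Temperature")

# Significant design pairs at the 0.05 level
sig_design <- tk_main$Design[tk_main$Design[, "p adj"] < 0.05, , drop = FALSE]
print(round(sig_design, 4))
```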