# Midterm Take Home Test

#### Dr DH Jones

#### 2020-10-12

**Instructions**


**Individual work**

- This is an unproctored examination in the form of a real data analysis project.
- All work you submit must be your own: you may not consult, for any reason, with other students, staff, faculty, or live internet entities.
- You may use lecture notes, books, or internet libraries.
- You will be required to download and electronically sign the honor certificate, and then upload it to Canvas.
- Your exam will not be graded until you have met the honor certificate requirement.

**Uploading your work**

- There will be eight (8) Canvas assignment slots: one for the honor certificate, and seven for the test questions.
- **Therefore, in effect, you must prepare seven source files, one for each question, with each file containing the code to load and rename the data.**
- You will upload your answers to each question individually to Canvas.
- Your files must be in HTML format as generated from RStudio.
- Please do not email your answers to the professor.

**R Code**

- **Show all your R code for each question that calls for coding.**
- If the coding is missing, you will not receive credit for that portion of the test.

**Data**

- The dataset for analysis is *GaltonFamilies* in the `HistData` package; it is to be analyzed using a statistical linear model.
- In the 1880s, Francis Galton, inventor of the concept of *correlation*, assembled the dataset as part of his ground-breaking research and applications of *regression analysis*.

**R code for loading and renaming the data**

**For each question**, use the following R code to obtain and rename the data.

```r
# install.packages("HistData", repos = "http://cran.us.r-project.org", dependencies = TRUE)
# After the first compile, you may comment out this line.
library("HistData")
data(GaltonFamilies)
Galton2 <- data.frame(GaltonFamilies)
```

- The variables are:

`names(Galton2)`

```
## [1] "family"          "father"          "mother"          "midparentHeight"
## [5] "children"        "childNum"        "gender"          "childHeight"
```

# 1 Data pre-processing (8 points)

- Obtain the summary of the data *GaltonFamilies*.
- Are there any data that should be coded missing?
- Which variables are numeric, integer, or factor?
- What is the R command for obtaining the levels of a factor?
- Use this command to determine the levels of *gender*.
- Are the labels sufficiently informative?
- Remove the *family* and *childNum* columns.
- Produce the summary table of the modified dataframe.
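
As an illustration of the base R commands involved, here is a minimal sketch on a small made-up data frame. The data frame `toy` and its values are hypothetical; only the column names mirror *GaltonFamilies*, so the output says nothing about the real data.

```r
# Hypothetical toy rows; column names mirror GaltonFamilies, values are invented
toy <- data.frame(
  family          = factor(c("A", "A", "B", "B")),
  father          = c(70.0, 70.0, 68.5, 68.5),
  mother          = c(64.0, 64.0, 62.5, 62.5),
  midparentHeight = c(69.6, 69.6, 68.0, 68.0),
  children        = c(2L, 2L, 2L, 2L),
  childNum        = c(1L, 2L, 1L, 2L),
  gender          = factor(c("male", "female", "male", "female")),
  childHeight     = c(71.0, 65.5, 69.0, 63.0)
)

summary(toy)                 # per-column summary, including NA counts if any
col_classes <- sapply(toy, class)   # "numeric", "integer", or "factor" per column
col_classes

gender_levels <- levels(toy$gender) # levels of a factor, in alphabetical order
gender_levels

# Drop two columns by name, keeping everything else
toy2 <- toy[, !(names(toy) %in% c("family", "childNum"))]
summary(toy2)
```

The same calls should apply unchanged to `Galton2` once the data have been loaded.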

# 2 Correlation plots and Scatterplots (8 points)

- Obtain the correlation matrix of all the numeric and integer variables.
- Obtain the correlation plot of all the numeric and integer variables.
- Obtain the scatterplot matrix of all the variables in Galton2 with *gender* as the first variable and *childHeight* as the output variable.
- Which variables look like potential predictors of *childHeight*?
- Which pairs of predictors look redundant?
- Obtain the scatterplot of *childHeight* vs *midparentHeight* with the color of points according to *gender*.
- Add to this plot, title = "Original Galton Data", and subtitle = "Scatterplot".
- Add to this plot, *loess* regression lines for each gender group.
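
One way to restrict a correlation matrix to the numeric and integer columns is sketched below in base R. The simulated data frame `toy`, its coefficients, and the way `midparentHeight` is built from `father` and `mother` are assumptions made purely for illustration; they are not taken from the exam data.

```r
set.seed(1)
n <- 40

# Hypothetical simulated data; all numbers here are illustrative assumptions
toy <- data.frame(
  father = rnorm(n, mean = 69, sd = 2.5),
  mother = rnorm(n, mean = 64, sd = 2.3),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
toy$midparentHeight <- (toy$father + 1.08 * toy$mother) / 2
toy$childHeight <- 22 + 0.64 * toy$midparentHeight +
  5 * (toy$gender == "male") + rnorm(n, sd = 2)

# is.numeric() is TRUE for both numeric and integer columns, FALSE for factors
num_cols <- sapply(toy, is.numeric)
cor_mat  <- cor(toy[, num_cols])
round(cor_mat, 2)
```

In this toy setup `midparentHeight` is an exact function of `father` and `mother`, which is the kind of relationship that makes predictors look redundant in a correlation matrix.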

# 3 Interaction Model (8 points)

- Fit the interaction model g1: `childHeight ~ gender + midparentHeight + gender:midparentHeight`.
- Using title "Residual Plot", obtain the scatterplot of the *residuals* vs *fitted values*. Don't print it, save it in `p1`.
- Using title "Residual Plot", obtain the scatterplot of the *residuals* vs *midparentHeight*. Don't print it, save it in `p2`.
- Using title "Boxplot", obtain the boxplot of the residuals vs gender. Don't print it, save it in `p3`.
- Using title "QQ-plot", obtain the QQ-plot, with a red qq-line, of the residuals. Don't print it, save it in `p4`.
- Using a 2×2 grid, plot all four plots using gridExtra.
- What patterns do the above plots reveal, if any?
- Obtain the coefficients, standard error of estimate, t-value of estimate, and p-value of estimate for model g1.
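
The coefficient table requested above can be pulled from a fitted `lm` object with `summary()`. The sketch below uses simulated data; the data frame `toy` and the numbers used to generate it are hypothetical and chosen only so the code runs on its own.

```r
set.seed(2)
n <- 60

# Hypothetical simulated data with the same variable names as the model formula
toy <- data.frame(
  gender          = factor(rep(c("female", "male"), each = n / 2)),
  midparentHeight = rnorm(n, mean = 69, sd = 1.8)
)
toy$childHeight <- 20 + 0.65 * toy$midparentHeight +
  5 * (toy$gender == "male") + rnorm(n, sd = 2)

# Interaction model: separate intercept and slope for each gender
g1 <- lm(childHeight ~ gender + midparentHeight + gender:midparentHeight,
         data = toy)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(g1)$coefficients
coef_table

# Residuals and fitted values, the raw ingredients of the diagnostic plots
res <- residuals(g1)
fit <- fitted(g1)
```

The same `summary(model)$coefficients` call works for any `lm` fit, so it carries over to the other models in this test.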

# 4 Main Effects Model (3 points)

- Fit the main effects model g2: `childHeight ~ gender + midparentHeight`.
- Obtain the coefficients, standard error of estimate, t-value of estimate, and p-value of estimate for model g2.
- Interpret the value of the coefficient for *gender*.

# 5 Constant Model (3 points)

- Fit the constant model g0: `childHeight ~ 1`.
- Obtain the coefficients, standard error of estimate, t-value of estimate, and p-value of estimate for model g0.
- Interpret the value of the coefficient.

# 6 Comparing models *g.sm* versus *g.big*. (13 points)

- Fit the following model for *childHeight*, *g.big*: $childHeight = \beta_0 + \beta_1\,gender + \beta_2\,father + \beta_3\,mother + \beta_4\,children$. What are the estimated coefficients, standard errors of estimate, t-values of estimate, and p-values of estimate?
- Fit the following model for *childHeight*, *g.sm*: $childHeight = \beta_0 + \beta_1\,gender + \beta_2\,father$. What are the estimated coefficients, standard errors of estimate, t-values of estimate, and p-values of estimate?
- For the test of the model *g.sm* vs the model *g.big*, in terms of the beta coefficients, what are the null and alternative hypotheses for this statistical test?
- Compute the *Analysis of Variance Table* for this test based on the data.
- Using $\alpha = 0.001$, based on the p-value, what is the decision rule and conclusion of the hypothesis test of *g.sm* versus *g.big*?
- Compute the fit plot for *g.big* with the following specifications:
  - 45 degree line in red
  - title "Fit Plot"
  - y-axis label "Child Height"
  - x-axis label "Fitted Values"
- What is the Pearson correlation between the *g.big fitted-values* and *actual-values*?
- Is the correlation strong, moderate or weak?
- What does this indicate for the model?
- What is the $R^2$ for the *g.big* model?
- What does $R^2$ mean for the fitted model?
- Theoretically, what is the relation between the $R^2$ and the Pearson correlation (actual y-values vs fitted-values)?
- Show that this relation holds for the computed values of the `g.big` model.
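
For orientation, the mechanics of comparing two nested linear models and of checking $R^2$ against the fitted-vs-actual correlation can be sketched as follows. The data are simulated: the data frame `toy` and every number used to generate it are hypothetical assumptions, so the printed values bear no relation to the exam answers.

```r
set.seed(3)
n <- 80

# Hypothetical simulated data using the same variable names as the two models
toy <- data.frame(
  gender   = factor(sample(c("female", "male"), n, replace = TRUE)),
  father   = rnorm(n, mean = 69, sd = 2.5),
  mother   = rnorm(n, mean = 64, sd = 2.3),
  children = sample(1:10, n, replace = TRUE)
)
toy$childHeight <- 16 + 5.2 * (toy$gender == "male") +
  0.4 * toy$father + 0.3 * toy$mother + rnorm(n, sd = 2)

g.sm  <- lm(childHeight ~ gender + father, data = toy)
g.big <- lm(childHeight ~ gender + father + mother + children, data = toy)

# F-test of the smaller model against the larger one (nested models)
aov_tab <- anova(g.sm, g.big)
aov_tab

# For least squares with an intercept, R^2 equals the squared Pearson
# correlation between the fitted values and the observed response
r  <- cor(fitted(g.big), toy$childHeight)
r2 <- summary(g.big)$r.squared
all.equal(r^2, r2)
```

The second row of the table returned by `anova()` holds the F statistic and p-value for the extra terms in the larger model; its `Df` entry is the number of coefficients being tested.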
