Problem set

Author: Joseph Mhango

Published: 2024-09-13

  1. Edit the following code so that the plot accurately reflects its axis labels and the subset function returns an object containing only the intended contents (as described in the comment preceding the function). Then, using indexing and regular expression tools in base R, extend the code to calculate the proportions of manual and automatic transmission cars in the study.
library(ggplot2)  # the mpg dataset is distributed with the ggplot2 package
data(mpg)

# Scatter plot with base R
plot(mpg$disp1, mpg$hwy, col = as.factor(mpg$Year), 
     xlab = "Displacement", ylab = "Highway Miles per Gallon", pch = 19)
legend("topright", legend = unique(mpg$Year), col = unique(as.factor(mpg$Year)), pch = 19)

# Use base R to filter the data where cyl equals 8
subset(mpg, cy1 = '8')
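The proportion calculation in the first question relies on two base-R ideas: `grepl()` turns a regular expression into a logical vector, and the mean of a logical vector is the proportion of `TRUE` values. The sketch below shows both on a small hypothetical vector, `trans_codes`, written in the style of the `trans` column of `mpg` (for example `"auto(l4)"` and `"manual(m5)"`). The vector is invented for illustration and is not the study data.

```r
# Hypothetical transmission codes (illustrative only, not taken from mpg)
trans_codes <- c("auto(l4)", "manual(m5)", "auto(av)", "manual(m6)", "auto(s6)")

# grepl() returns TRUE where the pattern matches;
# "^" anchors the match at the start of each string
is_manual <- grepl("^manual", trans_codes)
is_auto   <- grepl("^auto", trans_codes)

# The mean of a logical vector is the proportion of TRUE values
prop_manual <- mean(is_manual)
prop_auto   <- mean(is_auto)

# Logical indexing keeps only the elements where the condition is TRUE
manual_only <- trans_codes[is_manual]

print(c(manual = prop_manual, automatic = prop_auto))
print(manual_only)
```

The same pattern carries over to any character column: build the logical vector once, then reuse it both for counting and for subsetting.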
  2. Write pseudocode steps for calculating the volume of a cylinder (hint: if you do not know it by heart, you may need to research the equation for the volume of a cylinder!). For a cylinder of height 3.2 cm and end radius 5.5 cm, report the volume in cm³ to 2 decimal places. Use at least 3 decimal places of accuracy for pi (hint: pi is a built-in constant in R!).

  3. In your own words, what value is required for the d argument of the pwr.t.test() function in the {pwr} package? Show the code involved, including any comments needed to answer this question. (Hint: you will probably need to install the package, load it, and use help() on the function name.)

  4. Using the code chunk below, please answer the following questions:

  • What is the role of the set.seed() function in the context of this code?
  • Why does calculating the mean of the copied original matrix return a numeric value, while the mean of the matrix whose missing values were filled from the list returns NA?
  • Please fix the code so that mean_of_original_copy and mean_matrix are equal.
# Step 1: Create a 3x3 matrix with random numbers, and introduce NAs
set.seed(123) 
matrix_with_na <- matrix(sample(c(NA, 1:9), 9, replace = TRUE), nrow = 3)
copy_of_original <- matrix_with_na  # copy taken before the three NAs below are introduced
matrix_with_na[1,2] <- NA  # Introduce 3 specific NAs
matrix_with_na[2,3] <- NA
matrix_with_na[3,1] <- NA


print("Original Matrix with NAs:")
print(matrix_with_na)

# Step 2: Create a list with the same structure but no NAs
list_with_values <- as.list(as.character(matrix(1:9, nrow = 3)))
print(list_with_values)

# Step 3: Use a for-loop to find the corresponding missing number from the list and fill the matrix
for (i in 1:3) {
  for (j in 1:3) {
    if (is.na(matrix_with_na[i,j])) {
      matrix_with_na[i,j] <- list_with_values[[(i-1)*3 + j]]  # Replace with corresponding value from list
    }
  }
}

print(matrix_with_na)

# Step 4: Try calculating the mean of the matrices
mean_matrix <- mean(matrix_with_na)
mean_of_original_copy <- mean(copy_of_original)
print(mean_of_original_copy)
print(mean_matrix)
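For the cylinder-volume question earlier in this set, the standard formula for a right circular cylinder is V = πr²h. A minimal R sketch of that calculation follows. The helper name `cylinder_volume()` is hypothetical, and the example uses a radius of 1 cm and a height of 2 cm rather than the dimensions given in the question.

```r
# Volume of a right circular cylinder: V = pi * r^2 * h
# cylinder_volume() is a hypothetical helper written for this sketch
cylinder_volume <- function(radius, height) {
  stopifnot(radius >= 0, height >= 0)  # reject negative dimensions
  pi * radius^2 * height               # pi is R's built-in constant
}

# Example with illustrative values (radius = 1 cm, height = 2 cm)
v <- cylinder_volume(radius = 1, height = 2)
print(round(v, 2))  # 6.28 (cm^3), rounded to 2 decimal places
```

R stores `pi` to double precision (3.141592653589793), which comfortably exceeds 3 decimal places. Rounding is applied only when reporting, so no precision is lost in the intermediate calculation.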
  5. Explain why this function is not working and provide a working equivalent. Interpret what the estimate means and rank the species.
# Load necessary package (install it once beforehand if it is missing)
# install.packages('emmeans')
library(emmeans)

run_anova_posthoc <- function() {
  # Perform ANOVA on the iris dataset
  anova_result <- aov(Sepal.Length ~ Species, data = iris)
  
  # Perform post-hoc test using emmeans
  posthoc_result <- emmeans(anova_result, pairwise ~ Species)
  
  comparisons <- as.data.frame(posthoc_result$contrasts)
  
  print(comparisons)
  comparisons_df <- data.frame(Species = comparisons$species,
                               estimate = comparisons$estimate,
                               p.value = comparisons$p.value)
  
  return(comparisons_df)
}


run_anova_posthoc()
  6. Using the iris dataset in R, conduct a one-way ANOVA of sepal length by species ‘by hand’ and report the F statistic. Feel free to use other packages for data wrangling/grouping only, but comment each function you use to explain what it does and which package it comes from. Show your R code.

  7. Write your own question similar in spirit, difficulty and scope to questions 3-6 above. Provide a full question, including any code prompts, and supply a full answer, including any commented code required.

  8. Write your own question similar in spirit, difficulty and scope to questions 3-6 above, but involving the creation of nested lists and the indexing of elements from within the deepest level. The list must have more than 2 levels of nesting and no names. Provide a full question, including any code prompts, and supply a full answer, including any commented code required.
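The sketch below shows the kind of structure described above. It builds an unnamed list with lists nested four levels deep and retrieves an element from the deepest level with chained `[[ ]]` indexing. The object name `nested` and its contents are arbitrary choices for illustration.

```r
# An unnamed list in which lists are nested four levels deep
nested <- list(
  list(                      # level 2
    list(                    # level 3
      list(10, 20, 30),      # level 4 (deepest)
      list(40, 50)           # level 4
    ),
    list("a", "b")           # level 3
  ),
  list(TRUE, FALSE)          # level 2
)

# Each [[ ]] steps one level down and returns the element itself;
# a single [ ] would instead return a one-element list
deepest_value <- nested[[1]][[1]][[1]][[3]]
print(deepest_value)  # 30

# [[ ]] also accepts a vector of positions (recursive indexing)
same_value <- nested[[c(1, 1, 1, 3)]]
stopifnot(identical(deepest_value, same_value))

# The outer list carries no names
print(is.null(names(nested)))  # TRUE
```

Using `[` instead of `[[` at the last step, as in `nested[[1]][[1]][[1]][3]`, would return `list(30)` rather than the number `30`.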