Regression Projects



Guidelines: The guidelines are the same as for homework. You are welcome to work on assignments together and to use resources such as books and websites to help you figure out solutions or check your work, but everything you turn in, including any computer code, must be written by you. Your solutions and code must not be copied from any reference or from other people's work. In short, your solutions should represent your own understanding of the assignments.

Project 1: Down with seatbelts! (Due Nov 19, in class)

The year is 1985. The place is Britain. On Jan 31, 1983, a temporary law is enacted which requires all drivers and passengers of motor vehicles to wear seatbelts for 3 years. You are hired as an independent analyst to assess the efficacy of this law and to advise whether it should be made permanent as of 1986. Proponents of the law say it reduced the number of drivers and passengers killed (from 1,472 in 1982 to 1,228 in 1984). Detractors say this drop may not be significant, since the number of motorists killed has been trending downward in recent years anyway. They argue instead that the law makes drivers less careful, citing increases in the number of pedestrians and cyclists killed or seriously injured (from 1982 to 1984 these numbers went from 18,963 to 19,168 for pedestrians and, more notably, from 5,967 to 6,506 for cyclists). Detractors also claim that rear seat belts are actually less safe, as the number of rear seat passenger deaths increased from 297 in 1982 to 372 in 1984. Proponents reply that some of these increases are due to increased numbers of people on the road.

Using the Seatbelts and UKDriverDeaths datasets in R, investigate the validity of these claims and of other possible positive and negative effects of the seatbelt law, while accounting for other factors that may have changed the number of driver and passenger fatalities over the same period. Quantify, with margins of error, the change in the number of deaths/serious injuries (front seat, back seat and total) since the enactment of the seatbelt requirement. In addition, you should search for additional data and facts which can further test your hypotheses and conclusions. For this project, you should prepare a (typed) technical report in two parts:

  1. Methodology: a detailed explanation of your statistical methods (including code and printouts) and results of statistical tests. Include any additional data you are using.
  2. Interpretation: Citing the results from your methodology section, write a clear, detailed report for policymakers explaining which factors [including the seatbelt law and the others suggested above] contribute to motor vehicle injuries and deaths, and to what extent. Conclude with a suggested course of action regarding the seatbelt law.

This project will be graded on how detailed, well-written and convincing your report is. [Hint: policymakers like to see graphs and numbers.] Note: if you want to use any regression methods you have learned outside this class, please discuss them with me first.
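
As a possible starting point only, the sketch below shows one way to load the Seatbelts data and obtain an estimate with a margin of error for the law indicator. The model (linear trend, monthly seasonality, and the `law` column) and the names `sb` and `fit` are illustrative assumptions, not a required or recommended specification. Monthly counts are typically autocorrelated, so the interval from an ordinary least-squares fit may be too narrow.

```r
# Both datasets ship with base R (package 'datasets').
# Seatbelts: monthly multiple time series, Jan 1969 to Dec 1984.
# UKDriverDeaths: monthly series of car drivers killed or seriously injured.
data(Seatbelts)
data(UKDriverDeaths)

# Convert the multiple time series to a data frame and add time covariates.
sb <- as.data.frame(Seatbelts)
sb$time  <- as.numeric(time(Seatbelts))            # decimal year
sb$month <- factor(as.integer(cycle(Seatbelts)))   # 1 = Jan, ..., 12 = Dec

# Illustrative model only: trend + seasonality + 0/1 law indicator.
fit <- lm(DriversKilled ~ time + month + law, data = sb)

# Point estimate and 95% confidence interval for the law coefficient.
est <- coef(fit)["law"]
ci  <- confint(fit, "law", level = 0.95)
print(est)
print(ci)
```

The same pattern can be repeated with `front`, `rear`, or `drivers` as the response, and with `kms` or `PetrolPrice` as additional covariates, to examine the competing explanations described above.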

Undergraduate Student (4773) Final Projects

Using the MathAchieve and MathAchSchool datasets in R, investigate the factors influencing Math Achievement scores. What are the most important factors for individual scores? Quantify how important the school environment is (and which school-level factors matter). Does the importance of these factors differ between minority and nonminority students, or between male and female students?
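
A minimal sketch of how the two tables might be combined is given below, assuming the datasets are taken from the nlme package (where both are provided). The data frame names `school` and `math` and the two models are illustrative assumptions rather than a suggested analysis. Students in the same school are not independent, so the F-test from ordinary least squares should be read as approximate.

```r
library(nlme)   # provides the MathAchieve and MathAchSchool datasets

# MEANSES appears in both tables, so drop it from the school table
# before merging to avoid duplicated columns.
school <- MathAchSchool[, setdiff(names(MathAchSchool), "MEANSES")]

# Attach school-level covariates to each student record.
math <- merge(as.data.frame(MathAchieve), school, by = "School")

# Illustrative comparison: student-level covariates only versus
# student-level plus two school-level covariates.
fit_student <- lm(MathAch ~ SES + Minority + Sex, data = math)
fit_school  <- lm(MathAch ~ SES + Minority + Sex + Sector + MEANSES,
                  data = math)

print(anova(fit_student, fit_school))
```

Interaction terms such as `SES:Minority` or `Sector:Sex` are one way to ask whether a factor matters differently across groups.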

Submit a report containing your analysis and conclusions, which will be graded in the same manner as Project 1, during the scheduled final exam period.

Graduate Student (5773) Final Projects

These projects are for students enrolled in MATH 5773, and will consist of an in-class presentation (about 15-20 minutes) and a written report due at the scheduled final exam period. The presentations will occur on Fri Dec 7 (2:30-3:30) and Th Dec 13 (4:30-6:30) in PHSC 1105.

Possible topics (most of which are briefly mentioned in Chapter 9) include:

  • Mixed models
  • Nonparametric regression
  • Design theory
  • Time series analysis
  • Principal component analysis
  • Factor analysis
  • Robust regression
  • Model selection
  • p >> n
