Wednesday, December 2, 2015

Assignment 5
Regression Analysis
Part 1:
    In part one of this assignment we were asked to run a regression analysis on data relating kids' free lunches to crime rates. The data included the percent of kids who get free lunch in each area and the crime rate per 100,000 people. A news station claimed that as the number of kids getting free lunches increased, so did crime. With regression analysis in SPSS it is possible to either back up this news station or show that its information is faulty. After running the data through SPSS, the analysis came back as shown in figure 1.
figure 1
Regression analysis evaluates the influence of the independent variable on the dependent variable. The regression line is expressed by the equation y=a+bx, where a is the constant and b is the slope of the line, or the regression coefficient. For this analysis, y=21.819+1.685x. The positive slope in this equation supports the news station's claim: as more free lunches are given out to kids, the crime rate rises.
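To see what the fitted line implies, the equation can be evaluated directly. This is only a sketch: it assumes x is the percent of kids getting free lunch and y is the crime rate per 100,000 people, and the function name and sample inputs are made up for illustration rather than taken from the data.

```python
# Coefficients reported by SPSS for y = a + bx
INTERCEPT = 21.819  # a: the constant
SLOPE = 1.685       # b: the regression coefficient

def predicted_crime_rate(pct_free_lunch):
    """Evaluate y = a + b*x for a given percent of kids on free lunch (assumed x)."""
    return INTERCEPT + SLOPE * pct_free_lunch

# Hypothetical inputs, not values from the assignment data
for pct in (10, 20, 30):
    print(pct, round(predicted_crime_rate(pct), 3))
```

Each one-point increase in x raises the predicted y by the slope, 1.685, which is what the positive trend means in practice.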


figure 2
In figure 2 you can see the scatterplot of the data on free lunches and crime rates. It shows a positive trend, which further backs the news station's argument that crime rates rise with more free lunches. There is really only one large outlier that doesn't fit the trend in the scatterplot, and it can be easily ignored. With the equation from the analysis and now the scatterplot, it is very easy to back the news station.

Part 2:
    Part two asks us to analyze enrollment numbers for all of the UW system schools. The data provided had the number of students from each of the 72 counties going to each school. The data also provided the percent with a Bachelor's degree for each county, an income variable, and the distance of each county (from its center) to each school. The UW system wants to know why students choose the schools they are going to. It is basically impossible to know for certain why a student chooses a school because the possibilities are endless, but it is possible to narrow down trends within the state. With the given data about the schools I was able to create a spreadsheet with the data that I felt best showed why a student would choose a school. Along with the data offered, I also had to compute a normalization for county population and distance from a school. This equation is rather simple, as it is the population of each county divided by the distance from that school. My final spreadsheet is figure 3.
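The normalization described above is just one division per county. A minimal sketch, assuming made-up county names, populations, and distances (none of these values come from the spreadsheet in figure 3):

```python
def pop_distance_norm(population, distance):
    """County population divided by the distance from that county to the school."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return population / distance

# Hypothetical rows: (county, population, distance to the school)
counties = [
    ("County A", 50000, 25.0),
    ("County B", 120000, 150.0),
]

normalized = {name: pop_distance_norm(pop, dist) for name, pop, dist in counties}
print(normalized)
```

A large county far from the school can end up with a smaller value than a small county close by, which is the point of the normalization.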
figure 3
    In figure 3 you can see that there is a lot of information for two schools. We were asked to use UWEC and were able to choose the other school to our liking. Since I am from Marathon County I decided to choose Stevens Point since it is close. After the calculations were computed for both UWEC and UWSP, it was time to run the spreadsheet through SPSS.
    SPSS would allow me to run regression analysis on the data for the two schools. In SPSS, under the analyze tool at the top, the dropdown allowed me to select 'regression' and then 'linear'. This allowed me to set my dependent and independent variables. My dependent variable is always the raw number of students attending each school: 'EAU' when running the analysis on UWEC, or 'STP' for UWSP. There were three independent variables that I needed to run: the county pop./distance normalization, percent of bachelor's degrees, and median household income. In total I ended up with six different sets of regression analysis, three for UWEC and three for UWSP. Much of the analysis is very different between the two schools, but some of the results are somewhat similar. The regression analysis on the county pop./distance normalization is relatively similar between the two schools, while the rest of the analysis is pretty different.
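The six runs can be pictured as the same one-variable regression repeated for each school and each independent variable. The sketch below is not what SPSS does internally and it does not compute significance; it is a plain least-squares fit with R-squared, and every number in it is made up for illustration. Only the 'EAU' and 'STP' labels come from the spreadsheet; the predictor names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical values for five counties (not the assignment data)
students = {
    "EAU": [410, 95, 260, 30, 120],
    "STP": [150, 310, 90, 45, 200],
}
predictors = {
    "pop_dist_norm": [2000.0, 800.0, 1500.0, 300.0, 950.0],
    "pct_bachelors": [28.0, 19.5, 24.0, 15.0, 21.0],
    "median_income": [52000.0, 47000.0, 50500.0, 43000.0, 48000.0],
}

# One regression per (school, independent variable) pair: six in total
results = {
    (school, var): fit_line(xs, ys)
    for school, ys in students.items()
    for var, xs in predictors.items()
}

for (school, var), (a, b, r2) in results.items():
    print(school, var, round(a, 3), round(b, 5), round(r2, 3))
```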
    I was then asked to map the residuals for any of the regression results that were statistically significant. Four of the results were significant: pop./distance for both Eau Claire and Stevens Point, and bachelor's degrees for both Eau Claire and Stevens Point.
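A residual is the observed value minus the value the regression line predicts, so a positive residual marks a county sending more students than the model expects. A small sketch, assuming a hypothetical intercept, slope, and data (none taken from the SPSS output), and using raw residuals rather than any standardized version:

```python
def residuals(xs, ys, a, b):
    """Observed y minus predicted y = a + b*x, one residual per county."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical fitted line and data, for illustration only
a, b = 10.0, 0.2
pop_dist_norm = [2000.0, 800.0, 300.0]
students = [450.0, 150.0, 70.0]

for x, r in zip(pop_dist_norm, residuals(pop_dist_norm, students, a, b)):
    label = "more than predicted" if r > 0 else "at or below predicted"
    print(x, round(r, 1), label)
```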
figure 4
    Figure 4 maps the residuals for students by county attending UWEC. You can see that most of the students come from the counties surrounding Eau Claire, along with Dane County, where Madison is.
   
figure 5
    The percent of students from each county attending UWSP, however, is quite different. Most of the students are from the northeastern portion of the state. Comparing figure 4 to figure 5, you could say, judging from the two maps, that the students attending UWSP come from a more diverse set of counties than those attending UWEC. My final two maps are of the bachelor's degrees for UWEC and UWSP.
figure 6
Figure 6 is of the bachelor's degrees from UWEC by county. This map is relatively similar to figure 4. Most of the bachelor's degrees are within the Eau Claire County area.
figure 7
Figure 7 is of the bachelor's degrees from UWSP. Although most of the students attending UWSP are from the northeastern portion of the state, you can see that the bachelor's degrees stay centralized. This could lead to better employment opportunities in the central Wisconsin area.
    The main goal of this assignment was to show which variables are strong predictors of why students choose a certain school. One of the strongest variables is the distance the school is from home. This allows students to be close enough to home yet far enough away to feel like they are on their own. This variable had a strong R-squared value. One of the weakest variables was household income. It had a low R-squared value, showing that income doesn't have a strong pull on why a student chooses a certain school. I believe that struggling students are able to get grants and financial aid, allowing them to choose a school of their liking, especially if they are paying in-state tuition. By running my table through SPSS, I was able to make maps and determine which residuals helped to show a trend of where students come from to go to these two schools. SPSS did the calculations for me, and by cross-referencing them against each other it was easy to pick out the trends.








