Sunday, November 15, 2015

Assignment 4
Correlation and Spatial Autocorrelation

Part 1 Correlation:
figure 1

figure 2
Null hypothesis: there is no linear relationship between distance and sound level.
Alternative hypothesis: there is a linear relationship between distance and sound level.

In part one we were asked to make a scatter plot in Excel to determine whether there was a negative or positive trend. We were also asked to run the data through SPSS to compute a Pearson's correlation for a more in-depth look at what the data was telling us. The scatter plot generated in Excel shows a negative trend: the trend line slopes downward. The Pearson's correlation run in SPSS is significant at the 0.01 level, and the correlations box reports a Pearson's r of -.896. This tells us that there is a strong negative correlation between distance and sound level. In other words, as distance increases, sound level tends to decrease. Because Pearson's r is so close to negative one, the points on the scatter plot lie close to the downward trend line, which agrees with the correlation. In this situation we reject the null hypothesis, because there is a statistically significant linear relationship between distance and sound level.
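To make the calculation behind the SPSS output concrete, here is a minimal sketch of Pearson's r in plain Python. The distance and sound readings are made-up illustration values, not the assignment data, and the function name `pearson_r` is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical readings (NOT the assignment data): sound falls as distance grows
distance = [10, 20, 30, 40, 50, 60]
sound_level = [95, 90, 82, 80, 71, 68]

r = pearson_r(distance, sound_level)
print(f"Pearson's r = {r:.3f}")  # negative, and close to -1
```

A value near -1 here means the same thing it does in the SPSS output: the points sit close to a downward-sloping line.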

2.
figure 3
    In number two of part one we were given data about the city of Detroit: population by race, bachelor's degrees, median household income, and the number of retail, manufacturing, and finance employees. We were again asked to create a correlation matrix with this data and to break it down to see whether we could detect any patterns.
    There is a lot to break down in this data. For the purpose of this assignment I am going to focus on strength, direction, and probability. I first want to look at the white population in Detroit. The first thing that stands out is the negative correlation between the white and black populations. The correlation is -.604. Although it isn't a very strong negative correlation, it is still clearly negative. This tells me that areas of Detroit with a higher white population tend to have a lower black population. Next I want to look at bachelor's degrees, median household income, and median home value for the white population. All three have positive correlations, with bachelor's degrees the highest at 0.698. A positive Pearson's r for all three means that where the white population is higher, bachelor's degrees, household income, and home value also tend to be higher. This describes an association, not a cause.
    I now want to talk about the black population in Detroit. For bachelor's degrees, household income, and median home value, Pearson's r is negative in every case. In fact, the black population's Pearson's r is negative across the entire matrix. A negative sign describes the direction of the relationship, not its strength; the strength comes from how far r is from zero. Basically, wherever there is a higher black population in the city of Detroit, the other variable tends to be lower.
    Of the last two races in this matrix, Hispanic and Asian, one has a negative correlation and one has a positive correlation with everything in the matrix. First I will talk about the Hispanic population, which is the one with the negative correlations. Although its correlation is negative with everything across the matrix, the values are barely under zero compared to those for the black population. So the Hispanic population has a negative relationship with every variable, but a much weaker one than the black population has.
    The final group I want to talk about is the Asian population. This is the only group besides the white population with positive correlations. The highest Pearson's r associated with the Asian population is with bachelor's degrees, at 0.559. This is a moderate positive correlation: the sign gives the direction, and the distance from zero gives the strength.
    Looking at the matrix for the city of Detroit across the four populations, only two have positive correlations. The white and Asian populations both have positive correlations, while the black and Hispanic populations have negative correlations, with the black population having the most negative correlations of the four.
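Since direction and strength are easy to mix up when reading a correlation matrix, here is a small sketch that reports them separately for the r values quoted in this post. The cutoffs of 0.3 and 0.7 are an assumption (one common rule of thumb; textbooks differ), and `describe_r` is a name I made up.

```python
def describe_r(r, weak_below=0.3, strong_from=0.7):
    """Return (direction, strength) for a Pearson's r.

    The sign of r gives the direction; the absolute value gives the strength.
    The default cutoffs are an assumed rule of thumb, not a fixed standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    magnitude = abs(r)
    if magnitude < weak_below:
        strength = "weak"
    elif magnitude < strong_from:
        strength = "moderate"
    else:
        strength = "strong"
    return direction, strength


# r values quoted in this post
quoted = {
    "distance vs. sound level": -0.896,
    "white vs. black population": -0.604,
    "white population vs. bachelor's degrees": 0.698,
    "Asian population vs. bachelor's degrees": 0.559,
}

for pair, r in quoted.items():
    direction, strength = describe_r(r)
    print(f"{pair}: r = {r:+.3f} -> {strength} {direction}")
```

Under these assumed cutoffs, -0.896 counts as a strong negative correlation even though it is the lowest number in the list, which is the point: a negative r is not a weak r.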

Part 2 Spatial Autocorrelation:

    Part two of this assignment deals with a spatial autocorrelation analysis for the Texas Election Commission (TEC). The TEC has provided data on the 1980 and 2012 presidential elections for the state of Texas. The data only includes the Democratic votes for both elections, as well as the voter turnouts. The TEC wants the data analyzed to determine whether there are any spatial patterns in the vote and in voter turnout across the state. One piece of data the TEC left out, however, was the Hispanic population from the 2010 U.S. Census. This information can be downloaded and joined to the data already provided, allowing for better analysis. In the end the TEC wants a report to show the governor whether there have been any patterns over the 32 years between the elections.
    To get started with the spatial autocorrelation, I first had to download the Texas shapefile and the Hispanic population data from the U.S. Census website. Once I had downloaded both files, I brought them into ArcMap. This allowed me to join the Hispanic population data with the voting data the TEC supplied for the 1980 and 2012 elections. After completing the join in ArcMap, I could use the GeoDa software to process the information.
    The first thing I had to do in GeoDa was open my exported shapefile of the Texas map with the table joins. Once it was open I could create a weight for the shapefile. The weight defines which counties count as neighbors of each other, which is what lets me test for spatial autocorrelation in the 1980 and 2012 election data. To create a weight, I had to go under tools and create weight. My input file was the project I was working on (Texas). The contiguity weight I used for this assignment was Rook contiguity. With Rook contiguity, two counties are neighbors only if they share a stretch of border; counties that touch only at a corner are not counted. Say that we have all square counties and you want to look at a county in the middle of Texas. Rook contiguity only takes into account the neighboring counties to the north, south, east, and west. It does not take into account counties to the northwest, northeast, southwest, or southeast.
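The square-county example above can be sketched in a few lines of Python. This is only an illustration on an imaginary grid, not how GeoDa reads real county boundaries, and `rook_neighbors` is a name I made up.

```python
def rook_neighbors(rows, cols):
    """Map each (row, col) cell of a grid to the cells that share an edge with it."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            # Only north, south, west, east; diagonal (corner) cells are skipped
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[(r, c)] = [
                (rr, cc)
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbors


grid = rook_neighbors(3, 3)
print(grid[(1, 1)])  # the middle "county" has four neighbors
print(grid[(0, 0)])  # a corner "county" has two
```

Under Queen contiguity the middle cell would have eight neighbors, because the four corner-touching cells would also count.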
    Once I had created my weight in GeoDa it was time to use it to compute Moran's I. Moran's I is a way of measuring the degree of spatial autocorrelation in data. The first variable I ran Moran's I on was the percent Democratic vote in 1980. (pictured below)
figure 4
You can see that there is a positive trend line as well as a Moran's I value of 0.575. Since this Moran's I is for the percent Democratic vote by county, a positive value of 0.575 means that counties with a high Democratic share tend to be next to other counties with a high share, and counties with a low share tend to be next to other low counties.
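To show what Moran's I is measuring, here is a minimal sketch with row-standardized weights (each county's neighbors are averaged). The six "counties" in a row and their values are invented for illustration; they are not the TEC data, and `morans_i` is my own function name.

```python
def morans_i(values, neighbors):
    """Global Moran's I using row-standardized weights.

    values:    dict of unit -> observed value
    neighbors: dict of unit -> list of neighboring units
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {unit: v - mean for unit, v in values.items()}  # deviations from the mean

    cross = 0.0        # sum of z_i times the average z of i's neighbors
    with_neighbors = 0  # equals the sum of all weights after row-standardizing
    for unit, nbrs in neighbors.items():
        if not nbrs:
            continue  # a unit with no neighbors contributes nothing
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        cross += z[unit] * lag
        with_neighbors += 1

    sum_sq = sum(d * d for d in z.values())
    return (n / with_neighbors) * cross / sum_sq


# Hypothetical example: six counties in a row, each bordering the next one
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

clustered = dict(enumerate([10, 12, 11, 40, 42, 41]))    # similar values side by side
alternating = dict(enumerate([10, 40, 10, 40, 10, 40]))  # high next to low

print(morans_i(clustered, chain))    # positive: like values are neighbors
print(morans_i(alternating, chain))  # negative: unlike values are neighbors
```

The positive result for the clustered row is the same kind of pattern as the positive Moran's I values in this post, just on a toy scale.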
    The next Moran's I that I ran dealt with the percent Democratic vote by county in 2012. The Moran's I value this time was 0.695, higher than the 1980 value. This means that the spatial clustering of the Democratic vote in Texas has grown stronger since 1980. You can also see how tightly clustered the points are and how close they are to the center. The correlation is positive: when a county has a high Democratic vote, neighboring counties also tend to have a high Democratic vote. (pictured below)
figure 5
    The next Moran's I that I ran dealt with the Democratic voter turnout in 1980. Although the value is lower than the two above, it is still positive at 0.468, with a positive trend line. Since this Moran's I is for voter turnout in 1980, a value of 0.468 means that counties with high turnout tend to be next to other counties with high turnout, and likewise for low turnout. (pictured below)
figure 6
    The final Moran's I that I ran dealt with Democratic voter turnout for 2012. Comparing the Moran's I values from 1980 and 2012 shows whether the spatial clustering of turnout strengthened or weakened over the 32 years between the elections. With the 2012 Moran's I value coming in at 0.335, down from 0.468, the clustering of voter turnout in the state of Texas has weakened. It is still a positive Moran's I, but it is less than the 1980 value. Note that this is a drop in how clustered turnout is, not a drop in turnout itself. (pictured below)
figure 7
    The final portion of part two for this assignment dealt with creating and analyzing LISA cluster maps. LISA (Local Indicators of Spatial Association) maps show where there is clustering or grouping of data on a map. They calculate local Moran statistics to show local spatial autocorrelation. The colored counties on the map mark the core of each cluster. Counties are grouped by comparing their own value and the values of their neighbors (either high or low) against what complete randomness would produce. The categories are high-high (red), high-low (light red), low-low (blue), and low-high (light blue). All four colors appear in the maps I created, which show the Democratic voter turnout for both years as well as the percent Democratic vote for the counties in Texas. Not every county gets a color: counties where the local statistic is not statistically significant are left uncolored.
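The four categories come from comparing each county's value with the average of its neighbors. Here is a minimal sketch of that classification on seven made-up counties in a row. It leaves out the permutation test GeoDa uses to decide which counties are significant enough to color, so it labels every county. The name `lisa_quadrants` and the data are hypothetical.

```python
def lisa_quadrants(values, neighbors):
    """Label each unit with its Moran scatter plot quadrant and local Moran value.

    values:    dict of unit -> observed value
    neighbors: dict of unit -> list of neighboring units
    Significance testing is NOT done here.
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {unit: v - mean for unit, v in values.items()}
    variance = sum(d * d for d in z.values()) / n

    result = {}
    for unit, nbrs in neighbors.items():
        if not nbrs:
            continue  # no neighbors, no local statistic
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # average deviation of neighbors
        local_i = z[unit] * lag / variance
        if z[unit] >= 0 and lag >= 0:
            label = "high-high"
        elif z[unit] < 0 and lag < 0:
            label = "low-low"
        elif z[unit] >= 0:
            label = "high-low"   # high county among low neighbors
        else:
            label = "low-high"   # low county among high neighbors
        result[unit] = (label, local_i)
    return result


# Hypothetical example: seven counties in a row, each bordering the next one
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
percent_dem = dict(enumerate([40, 42, 10, 41, 12, 11, 13]))

for unit, (label, local_i) in lisa_quadrants(percent_dem, chain).items():
    print(unit, label, round(local_i, 2))
```

High-high and low-low counties get a positive local value (they resemble their neighbors), while high-low and low-high counties get a negative one (they are outliers among their neighbors).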
    The first LISA map I created shows the percent Democratic vote in 1980. The dark blue areas on the map are low-low: counties with a low percent Democratic vote whose neighbors are also low. The light blue counties are low-high: a county with a low percent Democratic vote surrounded by neighbors with a high percent. As you may have guessed, the red counties are high-high: a high percent Democratic vote surrounded by neighbors that are also high. The light red counties are high-low: a high percent Democratic vote surrounded by low neighbors. This map becomes more useful once the 2012 percent Democratic vote is brought in, because in the end the TEC really wants to see whether the percent Democratic vote changed from 1980 to 2012.
figure 8
    Like I mentioned above, after I had run the 1980 percent Democratic vote LISA map, it was time to create the 2012 map to compare against it. I wanted to see whether the clusters had changed over the 32 years: whether the percent Democratic vote had shifted around the state or had mostly stayed in the southern and northeastern portions. As you can see from the 2012 map (below) compared to the 1980 map (above), the pattern of the percent Democratic vote has changed. Although the southern counties have stayed relatively the same, the Democrats lost their hold in the northeastern portion of the state, with the cluster shifting to the western side of Texas. You can also see that there are more blue counties in 2012 than in 1980. This suggests that the Republican party is gaining traction in the state of Texas, which is bad news for the Democratic party. With this information properly presented to the governor of Texas, I believe the Democratic party could find a way to regain its foothold in the northeastern portion of the state. This could allow the Democratic party to regain its dominance in the state of Texas.
figure 9
    The final two LISA cluster maps I created dealt with the voter turnout for 1980 and 2012. As with the two LISA maps above, the best way to compare them is to put them next to each other and look for differences in Democratic voter turnout by county across the state of Texas.
figure 10
figure 11
    The first map pictured above (figure 10) is the 1980 Democratic voter turnout map. You can see that the southern portion of the state had low voter turnout, while the north and central clusters had high turnout. One possible reason for the low turnout in the south, even though figure 8 shows that area has a strong Democratic presence, is that voters there believe they don't need to cast a vote because the party is so dominant around them. They may assume the party will win, since that area of Texas leans so strongly Democratic.
    The second map (figure 11) is the Democratic voter turnout for 2012. Comparing it to the 1980 map (figure 10), you can see that not a whole lot has changed over the 32-year period, except in the northern panhandle portion of the state. There was high voter turnout in the panhandle in 1980, but some of that cluster was lost in 2012. This could be for a variety of reasons; maybe those counties shifted toward the Republican side instead of staying Democratic. In the southern portion of the state, voter turnout is still low across the 32-year span.