Thursday, October 29, 2015

Assignment 3
Significance Testing

Part 1:

2. A Department of Agriculture and Live Stock Development organization in Kenya estimates that yields in a certain district should approach the following amounts in metric tons per hectare (averages based on data from the whole country): groundnuts, 0.50; cassava, 3.70; and beans, 0.30. A survey of 100 farmers had the following results:

                    Mean (μ)    Std. Dev. (σ)
     Ground Nuts    0.40        1.07
     Cassava        3.40        1.42
     Beans          0.33        0.14

Ground Nuts
     - Null Hypothesis: there is no difference between the surveyed yield of ground nuts and the estimated yield.
     - Alternative Hypothesis: there is a difference between the surveyed yield of ground nuts and the estimated yield.
     - A Z test was conducted to determine whether we would reject or fail to reject the null hypothesis.
     - The confidence level was 95% (a significance level of 0.05), and since it was a two-tailed test, the critical values are + or - 1.96.
     - The Z value we got was -0.935. Since this falls between -1.96 and +1.96, we fail to reject the null hypothesis: there is no significant difference.
Cassava
     - Null Hypothesis: there is no difference between the surveyed yield of cassava and the estimated yield.
     - Alternative Hypothesis: there is a difference between the surveyed yield of cassava and the estimated yield.
     - A Z test was conducted to determine whether we would reject or fail to reject the null hypothesis.
     - Once again the confidence level was 95%, leaving us with critical values of + or - 1.96.
     - The Z value we got was -2.11. Since this is less than -1.96, it falls outside the 95% range, so we reject the null hypothesis and conclude there is a difference in the yields.
Beans
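     - Null Hypothesis: there is no difference between the surveyed yield of beans and the estimated yield.
     - Alternative Hypothesis: there is a difference between the surveyed yield of beans and the estimated yield.
     - Once again a Z test was conducted at the 95% confidence level to determine whether we would reject or fail to reject the null hypothesis.
     - The Z value we came up with was 2.14, from (0.33 - 0.30) / (0.14 / sqrt(100)). This led us to reject the null hypothesis since it fell outside the + or - 1.96 range.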
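
As a rough check on the three tests above, the Z values can be recomputed in a few lines of Python. This is only a sketch and not how the assignment was worked; the function name z_statistic and the variable names are assumptions made for illustration, and the survey standard deviation is used in place of a known population value.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """One-sample z statistic: (sample mean - hypothesized mean) / standard error."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


CRITICAL_Z = 1.96   # two-tailed critical value at the 95% confidence level
N_FARMERS = 100     # number of farmers surveyed

# crop: (estimated yield, surveyed mean, surveyed standard deviation)
crops = {
    "Ground Nuts": (0.50, 0.40, 1.07),
    "Cassava": (3.70, 3.40, 1.42),
    "Beans": (0.30, 0.33, 0.14),
}

results = {}
for crop, (estimated, mean, sd) in crops.items():
    z = z_statistic(mean, estimated, sd, N_FARMERS)
    decision = "reject" if abs(z) > CRITICAL_Z else "fail to reject"
    results[crop] = (z, decision)
    print(f"{crop}: z = {z:.3f}, {decision} the null hypothesis")
```

Comparing the absolute value of z to 1.96 covers both tails at once, which is why the same check works for the negative and positive results.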

3. An exhaustive survey of all users of a wilderness park taken in 1960 revealed that the average number of persons per party was 2.8. In a random sample of 25 parties in 1985, the average was 3.7 persons with a standard deviation of 1.45 (one-tailed test, 95% confidence level).

     - Null Hypothesis: there is no difference between the average party size in 1985 and the 1960 average of 2.8.
     - Alternative Hypothesis: the average party size in 1985 is larger than the 1960 average.
     - A T-test was conducted because the sample was small (25 parties).
     - The confidence level was 95% as a one-tailed test. With 24 degrees of freedom, the critical T value is 1.711.
     - The T value we came up with was 3.10, from (3.7 - 2.8) / (1.45 / sqrt(25)). Since this is greater than 1.711, we reject the null hypothesis and conclude that the average party size in 1985 is larger than in 1960.
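
The same kind of check works for the T-test. The sketch below is an illustration only: the function name t_statistic is an assumption, and the critical value of 1.711 is the standard t-table entry for a one-tailed test at the 0.05 level with 24 degrees of freedom.

```python
import math


def t_statistic(sample_mean, population_mean, sample_sd, n):
    """One-sample t statistic: (sample mean - population mean) / standard error."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


N_PARTIES = 25
CRITICAL_T = 1.711  # one-tailed, 0.05 level, df = 24 (from a t table)

t_value = t_statistic(3.7, 2.8, 1.45, N_PARTIES)
degrees_of_freedom = N_PARTIES - 1

# One-tailed test: reject only if t is above the critical value.
reject_null = t_value > CRITICAL_T
print(f"t = {t_value:.3f}, df = {degrees_of_freedom}, reject null: {reject_null}")
```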

Part 2:
    In the second part of our assignment we were asked to look at data for the northern and southern halves of Wisconsin. The term "up north" means different things to different people; it is hard to say what a person from Florida would consider "up north" to be. Personally, when I think of this term, I think of big woods, wolves, and lots of snow and cold in northern Wisconsin. Many people from the state have their own ideas as well, but most would agree that there is a difference between northern and southern Wisconsin.
    When dividing the state into two halves, a common dividing line is Highway 29, which runs east to west across the state at roughly the halfway point. This is the dividing line that I used in this assignment. It left me with 27 northern counties and 45 southern counties.

    After dividing the state into two halves, we were asked to look at SCORP data collected from the Wisconsin DNR. This data provided a number of characteristics specific to the state of Wisconsin. Some of this data reflected the idea of "up north" while other variables pertained to the state as a whole. We were asked to choose three of these data sets and map them to show which areas of the state had higher values. The three I chose were the number of beaches, the number of picnic areas, and the number of cottages.
    The first map I made was the number of inland beaches. The minimum number of beaches in a county was 1, with the maximum coming in at 26. At first I figured that the counties with the most beaches would all be in the northern half of the state, considering that there are more lakes up north. After making my map I concluded that this was only half right. The northern half did have more lakes and beaches overall, but not the county with the most beaches.
    The next map I chose to make was the number of picnic areas per county. The minimum number of picnic areas was 1, with the maximum coming in at 301. When thinking about picnic areas, I instantly thought of the University of Wisconsin-Madison. I figured this area would have the most, considering all of the college-aged students living there. I also expected much of the northern half to be very low in picnic areas because of its early winters. If the data had been campgrounds, then in my opinion the northern half would have had many more. After creating the map, as I predicted, Dane County, home of the University of Wisconsin-Madison, was one of the highest counties for picnic areas.
    The final map I created with the term "up north" in mind was the number of cottages. Cottages go hand in hand with this term in my mind. Growing up, whenever my parents talked about going to the cottage, I instantly thought about lakes and going up north to grandma's. Before making my map, I figured that most of the cottages would be located in the northern half of the state. After making the map I was proven correct. Looking at the data, I found it very intriguing that some of the counties had upwards of 12,500 cottages. This seemed like a lot for a county, but it shows that people are still willing to travel to northern Wisconsin during the summer and fall to keep these cities and towns alive with tourism.
Part 4:
    The final part of this assignment dealt with computing Chi-Square in SPSS. SPSS was new to many of us, including myself. A Chi-Square test gives a single numeric value for each variable that compares the observed distribution of that variable with the expected distribution. It provides a statistical measure of whether the observed values are distributed across the northern and southern halves of the state differently from what would be expected, here at a 95 percent confidence level.
    After computing the Chi-Square for inland beaches, the value fell outside the 95 percent confidence range, so it is fair to say that the number of beaches is associated with whether a county is in the northern or southern half. Since there are more lakes in the northern half, it makes sense that there would be more beaches there as well.
    The Chi-Square value for picnic areas also fell outside the 95 percent confidence range, and looking at the map, it is easy to see that picnic areas are associated with the southern half of the state. As I predicted earlier, Dane County was one of the two counties with the most picnic areas.
    The last Chi-Square I conducted dealt with the number of cottages. This value also fell outside the 95 percent confidence range, which told me that the number of cottages is associated with the northern half of the state. More lakes in the northern half lead to more cottages on these lakes. The two counties with the most cottages were also in the northern half of the state.
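
For readers without SPSS, the Chi-Square calculation itself can be sketched in plain Python. Everything specific here is an assumption for illustration: the low/high split and the counts in the table are made-up placeholders that only match the 27 northern and 45 southern counties, not the actual SCORP results, and the helper name chi_square is hypothetical. The critical value 3.841 is the standard table entry for 1 degree of freedom at the 0.05 level.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if region and category were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# HYPOTHETICAL counts of counties, for illustration only.
# Rows: north (27 counties), south (45 counties). Columns: low, high.
observed_counts = [
    [9, 18],
    [30, 15],
]

CRITICAL_VALUE = 3.841  # df = 1, 0.05 significance level (from a chi-square table)

chi2, df = chi_square(observed_counts)
significant = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.3f}, df = {df}, significant: {significant}")
```

A statistic above the critical value is what "falling outside the 95 percent range" means in the paragraphs above.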
    

Wednesday, October 7, 2015

Assignment 2
Z-Scores, Mean Center, and Standard Distance
 
 
    In assignment two we are looking at disorderly conduct arrests in Eau Claire, Wisconsin, mainly focused on the busy bar scene in the Water Street area. I was given the addresses of all Disorderly Conduct violations around the city of Eau Claire in 2003 and 2009. Along with the violations and addresses, I was also given the number of arrests at each address. I was not given the reasons for these arrests, though most are related to fights and loud music, but I was still able to analyze them spatially. I am interested in seeing how these patterns have changed over time. I was also given the addresses of bars in 2009, and I want to see how many arrests took place at these addresses. The main question here is: are the complaints coming from citizens warranted?
    Part 2
    In part two of the assignment we are looking at the mean centers and the weighted mean centers. The first step was to load the disorderly conduct arrests from 2003 around Eau Claire. Using the Mean Center tool in ArcToolbox, I was able to quickly find the mean center for these arrests in 2003. After finding the mean center, I next wanted to find the weighted mean center for the 2003 arrests. This uses the same tool in ArcToolbox, but for the weight field I chose Count, which holds the number of arrests at each address. When building the map for 2003 I also used graduated symbols with natural breaks, which let me show the different numbers of arrests at each location.
    After finding the mean center and weighted mean center for 2003, I turned my focus to 2009. Since I had already worked through the process for 2003, I was able to quickly compute these for 2009. At this point I had two maps, one for 2003 and one for 2009, each showing the arrests from that year with the mean and weighted mean centers. For my third map I combined all of this data onto one map to show the differences between 2003 and 2009. Looking at the third map, you can see exactly how the mean and weighted mean centers have shifted slightly based on the addresses and number of arrests at these locations.
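
To show what the tool is doing behind the scenes, here is a small Python sketch of a mean center and a weighted mean center. This is not the ArcToolbox implementation; the function names and the coordinates and arrest counts are made-up assumptions used only to show the arithmetic.

```python
def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Average x and y where each point counts in proportion to its weight."""
    total_weight = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total_weight
    y = sum(w * py for (_, py), w in zip(points, weights)) / total_weight
    return (x, y)


# HYPOTHETICAL addresses (x, y) and arrest counts, for illustration only.
addresses = [(0, 0), (2, 0), (2, 2), (0, 2)]
arrest_counts = [1, 1, 1, 5]

center = mean_center(addresses)
weighted_center = weighted_mean_center(addresses, arrest_counts)
print("mean center:", center)
print("weighted mean center:", weighted_center)
```

The weighted center is pulled toward the address with five arrests, which is the same effect the Count field has on the maps.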
 

  
 Part B
    The next maps I wanted to create dealt with standard distances. I wanted to find the standard distance of arrests for 03 and 09 to one standard deviation. One standard deviation allows for roughly 68% of the arrests to fall within that area. Along with the standard distances I also wanted to include the weighted mean centers to show where they fell inside the standard distance circles. The Standard Distance tool is located in ArcToolbox. My input feature class was the arrests for each year. After running this tool I was able to see exactly where the arrests were concentrated for the given year. I wasn't surprised when I saw that these arrests fell within a few blocks of Water Street. After completing the standard distances for 03 and 09, I wanted to make a map showing how they compared with each other. Looking at the maps, it is easy to see that not much had changed from 03 to 09. The standard distance shifted slightly but not much.
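
The standard distance can also be sketched by hand. The Python below is a simplified illustration and not the ArcToolbox tool itself: it takes the square root of the average squared distance from the mean center, and the function names and the points are made-up assumptions.

```python
import math


def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Square root of the mean squared distance from the mean center."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


def share_within(points, radius):
    """Fraction of points that fall inside a circle around the mean center."""
    cx, cy = mean_center(points)
    inside = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
    return inside / len(points)


# HYPOTHETICAL arrest locations, for illustration only.
arrest_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]

sd = standard_distance(arrest_points)
share = share_within(arrest_points, sd)
print(f"standard distance = {sd:.4f}, share of points inside = {share:.2f}")
```

With only a handful of points the share inside the circle can be far from 68%; that figure is a rule of thumb for larger data sets.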
Part 3 Z-Scores
    The last part of this assignment dealt with calculating Z-scores for the Eau Claire block groups. In the block group attributes I concentrated on the Join_Count column, which is the number of arrests in each block group for 2009. Next I needed to find the mean and standard deviation for the block groups. I found both under Quantities in the Symbology tab. The mean was 5.4 and the standard deviation was 7.8. I wanted to find the Z-scores of just three block groups: 57, 46, and 41.
    First I will talk about block group 57. The observation, or number of arrests, in this block group was 1. To find the Z-score I had to take the observation minus the mean, then divide that by the standard deviation.
Z-score = (1 - 5.4) / 7.8    Z-score = -0.5641
Since my observation of arrests was only 1, this block group falls below the mean but within one standard deviation of it, so it would not be considered an outlier.
    Block group 46 had an observation of 40, or 40 arrests in that block group that year. This is a very high number, as the block group sits right by Water Street. Again I used the same mean and standard deviation.
Z-score = (40 - 5.4) / 7.8     Z-score = 4.436
With the Z-score being so high, this block group falls more than four standard deviations above the mean, which makes it a clear outlier.
    Block group 41 had an observation of 10, or 10 arrests in that block group in 2009. This is above the mean but not that high of a number, so it would not be considered an outlier. I used the same mean and standard deviation numbers to compute this Z-score.
Z-score = (10 - 5.4) / 7.8     Z-score = 0.5897
This Z-score falls within one standard deviation above the mean. The final map I wanted to create shows the different block groups and their standard deviations based on the arrests for 2009. It also shows where the bars are located, showing that where the concentration of bars is higher, the standard deviation is higher. As I would have guessed, the higher standard deviations fell on Water Street or close to it.
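
The three Z-scores can be double-checked with a short Python sketch. The variable names are assumptions for illustration; the mean, standard deviation, and arrest counts are the ones used in the calculations above.

```python
MEAN_ARRESTS = 5.4  # mean arrests per block group, 2009
STD_ARRESTS = 7.8   # standard deviation of arrests per block group, 2009

# block group: arrests, as used in the calculations above
arrests_by_block_group = {57: 1, 46: 40, 41: 10}

z_scores = {}
for block_group, arrests in arrests_by_block_group.items():
    # Z-score = (observation - mean) / standard deviation
    z = (arrests - MEAN_ARRESTS) / STD_ARRESTS
    z_scores[block_group] = z
    print(f"Block group {block_group}: z = {z:.4f}")
```

The parentheses around the subtraction matter: without them, the division would be done first and the result would be wrong.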
    After I created all the maps it was easy to see where the majority of the arrests took place, and whether the complaints from residents of the community were warranted. Just by looking at the arrests from 03 and 09 you can see that the concentration of arrests was on or near Water Street. I figured this was the case, as Water Street has a high concentration of bars and college students who lose their heads after a few drinks. When looking at my third map comparing the 03 and 09 arrests, it is hard to find a pattern in how the locations of these arrests changed. They are scattered between Water Street and the old downtown bars by the new Confluence Project. Although not as many drunk college kids go to the downtown bars, there are still plenty of arrests. I believe that it is more than safe to say that alcohol plays a role in a majority of these arrests from both 03 and 09.
     When comparing the standard distances in my fourth and fifth maps, you can see that the bulk of the arrests fall within the first standard deviation circle. These again are between Water Street and the downtown bars. Looking at the sixth map, which has both standard distance circles on it, you can see that the standard distance of arrests shifted ever so slightly. This small shift could be from just one house party between the two years.
    My seventh and final map looked at arrests for the block groups based on standard deviations. Comparing my seventh map to maps one and two backs up why the standard deviations for Water Street and downtown are so high. This is where the majority of the arrests took place.
     After finishing all of my maps and having them laid out, I do not see a reason for many of the residents of Eau Claire to complain about the ruckus the college students cause. Yes, fights and disorderly conduct are bad, but when you live in the third ward of Eau Claire, which is predominantly college-age kids, you have to know that this will occur. The people who have the right to complain, in my opinion, are the ones who live outside the third ward and downtown. There are not many solutions for the people who live in the areas of high arrests, because you can't really just move that easily. A good solution would be to come to an agreement with the neighboring college-aged kids on how late you would like them to party if they don't want the police called. Most of the time, however, if there is a fight the cops will be called no matter what. Seeing how the trend of arrests didn't vary much from 03 to 09, I can almost bet that these stats will be about the same today or five years from now.