Homework 5 HT2023
- Due 8 Nov 2023 by 17:00
- Points 1
- Submitting a file upload
- Available 27 Oct 2023 at 9:00 - 31 Jan 2024 at 17:00
HOMEWORK 5 - Statistics and probability
Due no later than Wednesday 8 Nov at 17:00
Because this seminar already takes place in the first week of the study period, the hand-in procedure for this week's homework differs from the usual one.
This homework assignment can be started by working in groups during seminar 5, and handed in after the seminar (at any time before seminar 6). The written hand-in should be individual, so that you demonstrate your own understanding of the assignment, i.e., no copying of text is allowed. You can however make use of mathematical or numerical results the group has found during the seminar.
1. Probability
a. You are given three identical boxes, one that contains two gold coins, one that contains two silver coins, and one that contains one of each. You choose one of the boxes at random, and then pick one coin from that box at random. This turns out to be a gold coin. It is not put back into the box. What is then the probability that the other coin in the box is a gold coin?
b. In Sweden, there are about 300 breast cancer cases per 100,000 individuals among women of 50 years of age. Assume that the probability that a mammography X-ray screening correctly diagnoses a woman who has breast cancer is 80%, and that the probability of a false positive result (i.e., a person without breast cancer being diagnosed as having breast cancer) is 9%. These numbers approximately agree with reality according to published studies. Suppose that a 50-year-old woman goes to mammography screening, and the result shows that she has breast cancer. What is the probability that she actually has breast cancer?
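Both parts of problem 1 can be approached with Bayes' theorem: list the possible hypotheses, give each a prior probability, and weight it by the probability of the observed outcome under that hypothesis. The following is a minimal sketch of that bookkeeping, not a solution to the exercise. The function name `posterior` and the numbers in the usage example are hypothetical and deliberately differ from those in the assignment.

```python
def posterior(priors, likelihoods):
    """Return P(H_i | E) for each hypothesis H_i using Bayes' theorem.

    priors[i]      = P(H_i), the probability of hypothesis i before the observation
    likelihoods[i] = P(E | H_i), the probability of the observation under hypothesis i
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Joint probabilities P(H_i and E)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Total probability of the observation, P(E)
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("the observation has zero probability under all hypotheses")
    return [j / evidence for j in joint]


# Hypothetical illustration (NOT the assignment's numbers):
# a condition with 1% prevalence, a test that detects it 90% of the time,
# and a 5% false positive rate.
example = posterior(priors=[0.01, 0.99], likelihoods=[0.90, 0.05])
print(f"P(condition | positive test) = {example[0]:.3f}")
```

The same function works for any finite set of mutually exclusive hypotheses, for example the three boxes in part a, as long as the priors sum to one.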
2. Confidence intervals and statistical significance
a. An experiment that was carried out 100 times resulted in the following measured values of a certain quantity:
9.217235 10.318129 10.129548 8.694245 10.137817 9.425283 10.623405 9.342076 10.596094 9.774860
8.574891 9.879593 9.484296 9.344638 11.595789 9.888536 8.534009 10.095894 9.727538 10.337148
11.513322 12.206648 9.835393 9.971442 11.740494 10.854234 11.598732 5.296376 9.997404 9.523829
6.931937 7.442440 10.203614 9.893644 11.493547 8.684547 10.701225 9.096643 7.403547 9.524160
8.109332 10.635259 11.017585 8.025185 10.639323 10.194740 8.547801 10.524075 9.794782 9.662731
7.679089 10.003210 9.096323 9.838539 12.397197 8.770399 9.376120 7.370233 9.570937 11.546209
12.983894 10.918312 7.947315 10.478530 11.097465 7.428212 8.378904 11.149247 10.528399 9.773401
8.077814 9.921014 6.700658 9.606119 10.412562 11.177284 9.889142 7.052527 10.233331 10.504087
13.257061 10.507389 10.405352 11.081850 10.093603 9.478749 10.430384 11.066278 10.938973 7.646291
13.077798 9.414972 10.945849 10.651533 8.235178 9.775766 12.374642 10.335392 9.919154 10.562624
How would you numerically present the resulting experimental mean value in an article (i.e., so that both the result and its accuracy are indicated, the latter preferably in terms of a confidence interval)? A numerical answer is sufficient; you do not need to create a diagram in this exercise.
Motivate your answer.
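One common way to attach an accuracy to a mean value is a confidence interval built from the standard error of the mean. The sketch below assumes that the sample is large enough for the normal approximation to hold; for small samples a Student's t quantile would be more appropriate. The function name `mean_confidence_interval` and the short sample in the usage example are hypothetical, and the choice of method still has to be motivated in your own answer.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) for a two-sided confidence interval of the mean.

    Assumption: the sample mean is approximately normally distributed,
    which is reasonable for large samples (central limit theorem).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided normal quantile, e.g. about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical toy sample (NOT the assignment's data)
toy_mean, toy_low, toy_high = mean_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"mean = {toy_mean:.2f}, 95% CI = [{toy_low:.2f}, {toy_high:.2f}]")
```

When reporting the result, round the mean and the interval to a number of digits that matches the size of the uncertainty.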
b. The same experiment was repeated by another research group, who obtained the following 100 values:
10.329801 9.663788 9.197355 6.201325 10.664198 11.154959 9.021815 11.153069 9.881691 6.578968
7.156940 6.975508 6.312702 10.660096 12.157871 10.067693 9.381362 10.262361 10.281788 9.814017
11.204465 8.792269 9.328487 8.838741 10.732438 10.834142 9.474563 7.384828 6.553554 4.691474
10.869922 8.264926 9.610700 10.103980 11.544642 8.592157 10.061099 7.096357 9.311878 6.376473
9.615128 14.317692 11.042854 9.041580 8.506643 8.420273 9.750081 10.300563 13.874328 7.863532
10.787244 9.466067 6.197738 10.246936 11.708249 8.405613 12.180735 9.300543 10.387406 10.621370
12.731710 11.075102 8.709873 7.608089 10.989466 8.872121 9.352832 11.005217 9.475105 7.786765
10.134454 10.035387 5.607347 9.450380 9.985086 11.113034 8.006207 14.915114 12.174291 12.768685
9.950252 12.066192 10.625821 10.467335 7.545005 12.081193 11.592738 9.081246 10.736805 11.393570
8.546344 8.316007 10.077501 9.142106 10.599870 9.757105 3.903690 9.126403 6.577111 11.794643
Are the results or the mean values consistent with each other, or is there a statistically significant difference?
Express your answer in terms of a p-value. The p-value is defined as the probability of obtaining results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. You can use the definition directly by fitting an appropriate probability distribution to the data.
Motivate your answer carefully.
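As one possible illustration of the p-value definition above, the following sketch compares two sample means under the null hypothesis that the underlying means are equal. It assumes that the difference of the two sample means is approximately normally distributed (a large-sample z-test with unequal variances); other tests, such as a t-test or a permutation test, are equally legitimate choices. The function name `two_sample_p_value` and the toy samples are hypothetical.

```python
import math
import statistics


def two_sample_p_value(a, b):
    """Return (z, p) for a two-sided test of equal means.

    Assumption: both samples are large enough that the difference of the
    sample means is approximately normal with the standard error below.
    """
    # Standard error of the difference between the two sample means
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    # Probability of a difference at least this extreme, in either direction,
    # if the null hypothesis (equal means) is true
    p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical toy samples (NOT the assignment's data)
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [101.0, 102.0, 103.0, 104.0, 105.0]
z_same, p_same = two_sample_p_value(sample_a, sample_a)
z_diff, p_diff = two_sample_p_value(sample_a, sample_b)
print(f"identical samples: p = {p_same:.3f}")
print(f"shifted samples:   p = {p_diff:.3g}")
```

A small p-value indicates that the observed difference would be unlikely if the null hypothesis were true; which threshold counts as significant is a convention that should be stated explicitly.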
If you need some extra reading on these subjects, those of you who took the Probability theory and statistics course SF1924 could take a look at the relevant chapter of the book by Gunnar Blom et al. Or you could consult the rich material on hypothesis testing, statistical significance and p-values available on the web. A readable semi-popular introduction is for example given here.
Note - these concepts will be discussed further in the final part of the course when we return to research methodology. Even in computer science, a large majority of the master's theses in different fields involve some form of numerical measurement or comparison. Understanding the statistical validity of one's results is then an essential part of the research project. We will also discuss the limitations of the common approach to statistical significance.
Handing in your solution
Please save your solution as a PDF file and hand it in via Canvas (for grading). Depending on your seminar leader's instructions, the solution may instead be handed in as a Jupyter notebook.
Peer grading
There will be no peer grading of HW5.
Feedback from your TA
Your seminar leader will grade your submission and report the result in Canvas.
Complete means you have passed the assignment.
Incomplete means you have to hand in a revised version.
Fail means that you will have to submit a new version and attend the make-up seminar.
The Fail grade will only be applied in exceptional circumstances such as plagiarized work.