SF2935 HT21 Modern Methods of Statistical Learning (50348)

SF2935 Modern methods of statistical learning

--------------------------------------------------------

Zoom-meeting for the course: https://kth-se.zoom.us/j/69322985200

--------------------------------------------------------

Instructor & examiner: Pierre Nyquist.

Office: 3539 (5th floor). Email: pierren@kth.se. Phone: 08-790 7311

Teaching assistant: Jens Agerberg.

Office: Lindstedtsv. 25. Email: jensag@kth.se. Phone: —

Information related to Covid-19:

Due to the ongoing pandemic, KTH has imposed regulations on how courses are taught during the fall semester 2021. As a result, this course will be given online, with no campus activities; this decision has been made at the department and school levels. Course activities will take place during the assigned time slots (see the schedule), and the exact form of these activities will be outlined at the start of the course; see below for some information about the different activities (weekly quizzes, final exam etc.).

The pandemic has taken a toll on all of us and we understand that this might still be a very trying time for many of you. You can always contact us with any concerns or issues that you may have and we will do our best to help and support you.

Diversity and inclusion statement:

In this class, you will be treated with respect, and we welcome students of all backgrounds, beliefs, ethnicities, genders, gender identities, gender expressions, national origins, religious affiliations, sexual orientations, ability and other visible and non-visible differences. All members of this class are expected to contribute to a respectful, welcoming and inclusive environment for every other member of the class. 

If you experience, or notice someone else experience, anything that is in conflict with this statement, please report this either directly to the teachers or anonymously (more on this at the start of the course). 

(credit to Daniel Litt of the University of Toronto)

Support for students with disabilities

Students with disabilities may have the right to certain compensatory support, for example during examination. This is centrally coordinated at KTH by Funka; please contact Funka at funka@kth.se for information about the support they can offer.

Course description: 

This course gives an introduction to standard methods for statistical learning and the mathematical principles underpinning these methods. The purpose is to provide students with a broad introduction to common methods for supervised and unsupervised learning and the mathematical tools used to design and analyse such methods. The course combines theory, emphasising the mathematical nature and the theory underlying statistical learning, with computational experiments to give an understanding of the building blocks of various machine learning methods now being used in all aspects of society. Note that this is a course in the Department of Mathematics, which is reflected in the focus being on mathematical principles more so than on implementations of different algorithms or specific examples of machine learning methods in the other sciences.

The following is a rough list of the general topics that will be discussed in the course: Introduction to statistical learning, half-spaces and perceptrons, linear and logistic regression, neural networks, Bayesian statistics & learning, linear methods for supervised classification, tree-based methods, support vector machines, principal component analysis, random forests, unsupervised learning, probability in high dimension, ethics in machine learning and AI.

Up-to-date information about what has been covered in lectures, updates on projects etc. can be found under "Current information".

Course literature: The course is based on a set of lecture notes that will be posted in the "Lecture notes" folder during the course. We will announce specific sources for different parts of the course as we go along, for students who are interested in going beyond the lecture notes. More advanced references will also be provided for interested students; these are by no means required reading. For an overview of the topic of the course, omitting the more technical details, the book An Introduction to Statistical Learning by G. James, D. Witten, T. Hastie, and R. Tibshirani is a good source (this was the course book in previous years). Relevant research papers will also be posted: some provide historical context, while others, particularly those regarding ethics, will be the main reading material for their topic.

Guest lecture:

In addition to regular lectures and problem sessions, there will be a guest lecture on data quality and ethics in machine learning. This will be given by Robert Nyquist, machine learning engineer and contributor and mentor in efforts like Women in AI.

Project:

The course includes one larger mandatory project (see "Examination" below) to be completed in groups of 2–5 students. The task will be to classify a list of songs, using techniques from the course, based on a dataset with song features provided by Spotify and classification reflecting the musical taste of the instructor; full details of the project will be given shortly after the start of the course. For groups that want to participate, there will be a leaderboard on Canvas showing which group has achieved the best prediction, updated once a day. After a given deadline, the members of the group with the highest score will be awarded something for their effort (what this is will depend on what restrictions are in place at that time). This part of the project is completely optional and will have nothing to do with whether you get an F or a P.

 

Exercise sessions

Held on Wednesdays. See the thread for information about which exercises will be covered in the next session and other updates.

Intended learning outcomes: 

For the methods presented in the course, the student shall possess both theoretical and practical understanding of how the methods work, which ones to choose for a given problem and how to implement rudimentary versions of them. Computer-aided projects form an essential learning activity.

To pass the course the student shall be able to

  • formulate and apply methods for supervised learning,
  • formulate and apply methods for unsupervised learning,
  • apply mathematical theory to analyse and explain properties of methods in statistical learning,
  • design and implement methods in statistical learning for different tasks.

Examination: 

The course has mandatory project work (hand-in assignments) and a final written exam. The project accounts for 3.0 ECTS (graded P/F) and the written exam for the remaining 4.5 ECTS.  More information about the project will be given shortly after the start of the course. 

Because of the pandemic, there is some uncertainty regarding the form of the final exam. As it stands today (early August), the final exam will be a written exam in the classical format, on October 27, 08:00-13:00. If this is changed there will be an announcement here on Canvas and during lectures. Grades are given in the range A-F or Fx, where Fx gives the right to a complementary examination to potentially reach the grade E. Registration using MyPages is required for the exam. Please refer to Studentexpeditionen Matematik / Student Office Mathematics for any questions regarding enrolment in the course and admission for the exam.

Allowed aids during the exam: the BETA handbook; one A4 "help sheet" (front and back) with any content that you want (there are no restrictions on what you write on the two sides, but you can only bring one such sheet); and a pocket calculator (standard type allowed for math exams at KTH).

The structure of the exam is five problems with 10 points allocated to each problem. The first problem will consist of multiple questions and no derivations are required (i.e. only answers required). This is the structure used for the exams from 2019 (including the practice exam) and therefore they serve as good examples for what the structure will be like this year as well.

There will be (almost) weekly quizzes in Canvas that can generate up to 5 bonus points for the final exam. Each quiz will focus on the topics of the corresponding weeks and participation is voluntary; more information will be provided at the start of the course.

The general scheme for translating quiz results to bonus points on the final exam is as follows: Suppose that we end up with $K$ quizzes and your results are $q_1, \dots, q_K$ correct answers out of a possible $c_1, \dots, c_K$. Suppose also that your worst score is on the first quiz, that is, $q_1 / c_1$ is the lowest fraction out of all the quizzes. Then the bonus points are computed by first taking the average of the remaining fractions,

$$f := \frac{1}{K-1}\left( \frac{q_2}{c_2} + \dots + \frac{q_K}{c_K} \right),$$

then multiplying this average fraction by 5 and rounding to the nearest integer.
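
For those who want to check their own tally, the scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official computation: the function name `bonus_points` is hypothetical, exact halves are assumed to round up (the text only says "nearest integer"), and if several quizzes tie for the worst fraction, only one of them is dropped.

```python
from fractions import Fraction
import math


def bonus_points(correct, possible, max_bonus=5):
    """Sketch of the quiz-to-bonus rule: drop the worst quiz, average the rest.

    correct[i] / possible[i] is the fraction of correct answers on quiz i.
    Assumption: exact halves round up; the syllabus does not specify this.
    """
    if len(correct) != len(possible):
        raise ValueError("correct and possible must have the same length")
    if len(correct) < 2:
        raise ValueError("at least two quizzes are needed (we divide by K - 1)")

    # Exact rational arithmetic avoids floating-point surprises near x.5.
    fractions = [Fraction(q, c) for q, c in zip(correct, possible)]

    # Drop a single occurrence of the lowest fraction (the worst quiz).
    fractions.remove(min(fractions))

    # f is the average of the remaining K - 1 fractions.
    f = sum(fractions) / len(fractions)

    # Scale to the maximum bonus and round to the nearest integer (half up).
    return math.floor(f * max_bonus + Fraction(1, 2))
```

For example, with four quizzes of 10 questions each and 2, 8, 9 and 7 correct answers, the worst result (2/10) is dropped, the remaining average is 0.8, and `bonus_points([2, 8, 9, 7], [10, 10, 10, 10])` gives 4.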

(Student) Office hours:

Office hours are really student office hours: they provide a chance for you, the students, to come and discuss the course material, or any other topics for that matter, in an informal setting outside of class. During that time, we are available to you to discuss what you want. As office hours are not standard at KTH, or Sweden in general, some participants might be wary of attending. This is only natural; see e.g. the following piece from NPR, or this document from a class at Cornell, for a general description of office hours (there are some US-specific things in both, but the overall message fits just as well here at KTH). The most important thing is, to paraphrase the NPR piece: Do not be afraid to ask for help. It is not a sign of weakness, it's a sign of strength.

Depending on the restrictions that are in place, at KTH and Stockholm in general, the plan is to have both in-person and online versions each week. In-person office hours could be in an office, in a larger lecture hall or even outside for some occasions, depending on your preference, attendance etc. The exact dates and times will be decided at the start of the course. In addition to the weekly time slots, there may be other times available by appointment on a week-by-week basis; if you cannot make it to the regular time slots, let us know and we will try to find something that works.

Course evaluations:

There will be opportunities for you to provide feedback both during and after the course. For assessments during the course, we will use Canvas-based surveys to gauge things like pace, topics, level of detail, theory vs. computation etc. We will also try to have class representatives that can take feedback from the class—we can discuss different ways to organise this—and bring it to us at some predetermined interval (e.g. every two weeks) to help form the remainder of the course. We are therefore asking everyone taking the class to consider whether this is something you would be interested in doing. We will of course try to minimise the time and effort this will take on your part and provide some kind of benefit, e.g. treat the representatives to lunch (if restrictions allow) whenever we meet to discuss. 

Near the end of the course there will also be a Canvas-based survey for a full course evaluation. We encourage you all to complete this evaluation to help form both this and other courses moving forward—we read the course evaluations and take your feedback seriously; all evaluation results are anonymous. 

Email policy:

Emails from students are most welcome: please allow us at least 48 hours to respond. I always try to respond well before that; however, during the pandemic I have often been unable to respond as quickly as I would like to. This means that last-minute emails about assignments or similar might go unanswered until after the relevant deadline has passed, so try to do things a bit in advance so that you have enough time to get your questions answered.

Disclaimer:

The course syllabus is a general and tentative plan for the course. Deviations from this plan may be necessary and will then be announced to the class by the instructor. It is the responsibility of the student to seek clarification of course requirements and procedures from the instructor.
