Welcome to Algebraic Statistics!
Welcome to the course in Algebraic Statistics!
This course gives an introduction to the emerging field of algebraic statistics, which focuses on using methods from algebra to develop tools for statistical inference. In statistics, we typically start with a set of parameters that we use to define a distribution. Oftentimes, the model-defining map that sends the parameter values to their distribution can be viewed as a rational map. From this perspective, the set of distributions we obtain from our space of possible parameter values (i.e., the statistical model) is the solution set to a collection of polynomial equations. An understanding of these polynomial equations can then be used to develop statistical inference methods for solving fundamental problems related to point estimation, hypothesis testing, model selection and representation learning. To extract these statistical methods we will dig into the nature of these polynomial equations, utilizing methods from algebra, geometry and combinatorics. Topics to be discussed include the geometry of discrete and Gaussian exponential families, the geometry of maximum likelihood inference, algebraic hypothesis tests for hierarchical models, parameter identifiability and the geometry of conditional independence models. Applications in categorical data analysis, causal inference and phylogenetics will be explored.
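To make the idea of a model cut out by polynomial equations concrete, here is a minimal sketch that is not taken from the course material. It assumes the simplest standard example, two independent binary random variables X and Y, and the function names are purely illustrative. The parameters a = P(X = 0) and b = P(Y = 0) are mapped to the joint distribution (p00, p01, p10, p11), and every point in the image satisfies the polynomial equation p00·p11 − p01·p10 = 0.

```python
from fractions import Fraction
from itertools import product


def independence_model(a, b):
    """Map parameters (a, b) to the joint distribution of two independent
    binary variables, returned as (p00, p01, p10, p11).

    Illustrative assumption: a = P(X = 0) and b = P(Y = 0).
    """
    return (a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b))


def determinant(p):
    """Evaluate the polynomial p00*p11 - p01*p10 at a distribution p."""
    p00, p01, p10, p11 = p
    return p00 * p11 - p01 * p10


# Evaluate the map on a grid of exact rational parameter values.
grid = [Fraction(k, 10) for k in range(1, 10)]
points = [independence_model(a, b) for a, b in product(grid, grid)]

# Every image point is a probability distribution on which the polynomial vanishes.
assert all(sum(p) == 1 for p in points)
assert all(determinant(p) == 0 for p in points)

# A distribution outside the model: X and Y perfectly correlated.
dependent = (Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2))
print(determinant(dependent))  # prints 1/4, so this distribution is not in the model
```

Exact rational arithmetic is used so that the equation can be checked with equality rather than a floating-point tolerance.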
Prerequisites
Knowledge obtained in a basic statistics course, linear algebra course, and discrete mathematics course is required. More advanced knowledge from courses in groups and rings would be helpful, but we will start the course with a soft introduction to the necessary tools from computational algebra.
Course literature
The main course literature is:
Sullivant, Seth. Algebraic statistics. Vol. 194. American Mathematical Society, 2023.
A helpful resource for the basics on computational algebra is:
Cox, David, et al. Ideals, varieties, and algorithms. Vol. 3. New York: Springer, 1997.
Course content
Lectures. The lectures are given in-person according to the course schedule. The first lecture takes place 17 January 2025 at 13.00 in V01. Below are the topics to be discussed at each lecture and some associated recommendations for reading from Sullivant's book. I'll also upload my handwritten notes here after each lecture.
Lecture | Date | Location | Contents | Reading | Liam's notes |
1 | 17 Jan. | V01 | statistical models, polynomials and affine space | Ch. 2 | |
2 | 24 Jan. | V01 | affine varieties and ideals | Ch. 3.1, 3.2 | (pdf) |
3 | 28 Jan. | E31 | vanishing ideals, the Nullstellensatz and the ideal-variety correspondence | Ch. 3.1, 3.2 | (pdf) |
4 | 7 Feb. | E35 | Gröbner bases | Ch. 3.3 | (pdf) |
5 | 14 Feb. | E35 | Buchberger's algorithm, Elimination Theorem, Closure Theorem | Ch. 3.4 | (pdf) |
6 | 18 Feb. | D32 | A solution to the implicitization problem for statistical models, exponential families | Ch. 3.4, 6.1 | (pdf) |
7 | 28 Feb. | E53 | exponential families | Ch. 6.1, 6.2 | (pdf) |
8 | 21 Mar. | Q15 | exponential families, likelihood inference | Ch. 6.2, 6.3, 7.1 | (pdf) |
9 | 25 Mar. | Zoom | likelihood inference, sufficient statistics | Ch. 7.3, 8.1, 8.2 | (pdf) |
10 | 4 Apr. | E31 | conditional inference and Markov bases | Ch. 9.1, 9.2 | (pdf) |
11 | 11 Apr. | D41 | graphical models | Ch. 13.1, 13.2 | (pdf) |
12 | 15 Apr. | E53 | parameter identifiability | Ch. 16.1, 16.2 | (pdf) |
13 | 6 May | E53 | latent variable models | Ch. 14 | (pdf) |
14 | 9 May | Q15 | phylogenetics | Ch. 15 | (pdf, slides) |
15 | 13 May | E35 | phylogenetics | Ch. 15 | (pdf) |
Practice Sessions. There will be a practice session held after every three lectures. The first practice session will take place on 31 January 2025 at 13.00 in V01. This is a time for you to work on some suggested problems and get help solving them. The problem sheets will either be posted here ahead of time or given out at the session before you start working. In the latter case, the problem sheets will be posted here after the session for referencing later.
Practice Session | Date | Time | Location | Problem Sheets |
1 | 31 Jan. | 13-15 | V01 | (pdf) |
2 | 21 Feb. | 13-15 | E31 | (pdf) |
3 | 28 Mar. | 13-15 | D41 | (pdf) |
4 | 29 Apr. | 13-15 | Q17 | (pdf) |
5 | 16 May | 13-15 | D41 | (pdf) |
Assignments
Homework assignments. By the end of the day after (most) lectures, you will be given a homework problem to solve. Each problem is due at the start of the next lecture after the one for which it was given out. You may submit your solution either by uploading it on Canvas under the associated assignment or by giving it to me (Liam) at the start of the lecture. No late submissions will be accepted. Your solution to each problem will be graded both for correctness and for the quality of its presentation. For example, if you are proving something, the proof should be clear, concise and well-written, with complete sentences explaining your arguments and any computations you include. To see good examples, have a look at some of the proofs in the course literature! If you submit a homework assignment at the start of lecture x, then you will receive your graded assignment with some written feedback at the start of lecture x + 1.
Exam. There will be an exam on 28 May 2025 from 14 to 17. The exam will consist mostly of "theory problems", where the main task is to show that you know important definitions and results, and a few "problem problems", where you apply the theory you've learned. The exam will not be designed to use the full 3 hours. However, feel free to relax and take as much of the time as you need.
Study Guide for the exam.
Grades
Every problem (either homework or exam problems) will be assigned one of three symbols: √+, √ or √−. The more √+'s you get, the more likely you are to get an A; the more √−'s you get (or unsubmitted solutions), the closer you are to an E or F. Mainly √'s will likely put you around a C. Other grades can be inferred naturally. The final assignment of your grade will be determined by a holistic consideration of the quality of your efforts and work in the course. These three symbols are meant to provide you with a sense of how you are progressing. Combining this information with the written feedback on the homework assignments and keeping up your efforts is a great way to achieve your desired grade!
Contact information
In this course, you will meet:
Liam Solus solus@kth.se (examiner, teacher)
Marina Garrote-López marinagl@kth.se (teacher)
Joseph Johnson josjohn@kth.se (teaching assistant)
Course representatives: Karl Lindberg karll6@kth.se and Tuva Hirschberg tuvah@kth.se