Welcome to the course
Data analysis of large data sets is increasing in importance as a new tool in many fields in science and technology. This course gives an introduction to the use of many efficient numerical algorithms used in the analysis of large amounts of data. In particular we consider algorithms for low-rank approximations, clustering and fast manipulation of structured matrices. We use mathematical and numerical tools to study problems and algorithms.
Course overview
The course is divided into three blocks, with 4-5 lectures and one homework per block. In each block there is also a series of video quizzes which are mandatory to do. Details about lectures, video quizzes, reading material and recommended exercises are found under the links below.
- Block 1: Low-rank data algorithms
- Block 2: Numerical algorithms for clustering
- Block 3: Structured matrices in data analysis
Before the course starts we recommend that you review the linear algebra summarized in Background material on linear algebra Download Background material on linear algebra and also do the quiz: Background linear algebra.
Teachers
- Olof Runborg, lectures, examiner
- Vilhelm Peterson Lithell, teaching assistant
Homework
There are three homework assignments, one for each course block. They should be done in groups of one or two persons. The exercises involve both implementations of numerical methods and some analysis. The exercise texts will be available as PDF files below.
Homeworks should be handed in before the deadlines specified below. Each homework submitted by the deadline will give you one bonus point on the exam. (The point is added to your exam score.) Incorrect, or poorly written, homeworks have do be redone, and will not yield a bonus point.
The exercises are preferably done in Matlab. They are mostly designed with this in mind and occasionally require the use of certain Matlab built-in functions. However, you can also use e.g. Julia, if you prefer. Here is a link with some assistance using Julia in the course. KTH students can install Matlab on their own computers via KTHs programvarunedladdning.
Homeworks should be submitted in Canvas under Assignment/Homework X. Both a PDF with solutions to the problems and your Matlab/Julia code should be uploaded. Before submitting, you need to sign up to one of the Homework groups, together with your group partner if your work in pairs. You find the groups under the menu People. They are called "SF2526-HWX YY" where X is the homework number and YY is the group number. Even if you work on your own you need to submit as a Canvas group. (There is one group set per exercise, but by default you will be signed up with the same group members for later exercises as for the first one, so unless you want to change groups, you will only need to sign up once.)
Note: Please submit as a group even if Canvas allows you to submit without joining a group. Otherwise it will not be possible for the teacher to provide feedback.
The homework texts and data files are found below:
Homework 1 Download Homework 1 (deadline Feb 5)
Files and links for the homework:
- Exercise 3: load_mat_hw1.m Download load_mat_hw1.m
- Exercise 5+6:
- Video editing in Matlab and Julia
- Video: testbild.mkv Download testbild.mkv and testbild_snapshots.zip Download testbild_snapshots.zip
- Video: roundabout.mkv Download roundabout.mkv and roundabout_snapshots.zip Download roundabout_snapshots.zip
- Video: market.mkv Download market.mkv and market_snapshots.zip Download market_snapshots.zip
- makevideo.m Download makevideo.m
- Exercise 7: Systematic CPU time comparisons
- Exercise 8: zalando_plot.m Download zalando_plot.m, zalando_items.mat Download zalando_items.mat, ID_col.m Download ID_col.m
Homework 2 Download Homework 2 (deadline 17/2)
Files and links for the homework:
- Exercise 2: Similarity graphs in Matlab
- Exercise 3: bengali_cleanup.mat Download bengali_cleanup.mat, bengali_map.png Download bengali_map.png
- Exercise 4: zalando_clustering.mat Download zalando_clustering.mat, zalando_plot.m Download zalando_plot.m
Homework 3 Download Homework 3 (deadline 3/3)
Files and links for the homework:
- Exercise 2: hw3_terrible_sound_with_hidden_message.ogg
- Exercise 3: bisection_tree.m Download bisection_tree.m, smatrix_skeleton.m Download smatrix_skeleton.m
- Exercise 4: hw3_zalando.mat Download hw3_zalando.mat, zalando_plot.m Download zalando_plot.m, hw3_zalando_direct.m Download hw3_zalando_direct.m, hw3_zalando_conv.m Download hw3_zalando_conv.m
Note:
- Bonus points expire after the re-exam and cannot be used next year.
- If you have not completed the homework one week after the exam, you will have to do the non-completed homeworks next year/round
OBS! Each group should solve the problems independently. Both group members should contribute equally to the work and understand all parts of the submitted solution. Although you can discuss issues with other groups, all code and reports must be written by yourself, from scratch (and e.g. not based on a stub given to you). In particular, it is strictly forbidden to copy code from or to share code files with another group.
Note also the general ethical approach at KTH:
- All members of a group are responsible for the group's work.
- In any assessment, every student shall honestly disclose any help received and sources used.
- Every student shall be able to present and answer questions about the entire assignment and solution.
Course literature
The literature consists of a number of lecture notes and extracts from books. For reading instructions, see the course overview links above.
- Background material on linear algebra Download Background material on linear algebra, by Elias Jarlebring
- Block 1
- Lecture notes on low-rank data algorithms Download Lecture notes on low-rank data algorithms, by Elias Jarlebring
- Lecture notes on fast algorithms for big data Download Lecture notes on fast algorithms for big data, by Per-Gunnar Martinsson
- Extract about SVD, Download Extract about SVD, from the book Matrix Computations, by Gene Golub and Charles van Loan
- Block 2
- Lecture notes on numerics for graphs and clustering Download Lecture notes on numerics for graphs and clustering, by Elias Jarlebring
- Tutorial on spectral clustering Download Tutorial on spectral clustering, by Ulrike von Luxburg
- Block 3
- Lecture notes on structured algorithms for structured matrices Download Lecture notes on structured algorithms for structured matrices, by Elias Jarlebring
- Extract about Toeplitz matrices Download Extract about Toeplitz matrices, from the book Matrix Computations, by Gene Golub and Charles van Loan
- Extract about fast algorithms for rank-structured matrices Download Extract about fast algorithms for rank-structured matrices, from the book Fast direct solvers for elliptic PDEs, by Per-Gunnar Martinsson
- Extract about the fast Fourier transform Download Extract about the fast Fourier transform, from the book Numerical methods in matrix computations, by Åke Björck
Other material
Applications of low-rank approximations
- Netflix challenge: What movies should we recommend? What are good movie categories? Links to an external site.
- Medicine: Does this tissue stem from a cancer or a benign liver? Links to an external site.
- Forensic science: Given a finger print, what is the age and gender of the person? Links to an external site.
- Forensic science: Are any parts of this image tampered with? Links to an external site.
- Archeology: Which of these finds in burial grounds come from the same time period?
Download Which of these finds in burial grounds come from the same time period?
Nielsen, K.H. & Jensen, C.K. (1997). Burial data and correspondence analysis. In: C.K. Jensen & K.H. Nielsen (eds), Burial & Society. The chronological and social analysis of archaeological burial data. Aarhus University Press. ISBN 87-7288-686-2. Pp. 29-61. (Here is a free online analysis tool for archeologists. Links to an external site.) - Natural language processing: Is this email spam?
Download Is this email spam?
Gansterer, W.N., Janecek, A.G.K., Neumayer, R. (2008). Spam Filtering Based on Latent Semantic Indexing. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. Pp 165-183. - SVD based image processing Links to an external site.
- SVD discussion on Quora Links to an external site.
- SVD/PCA discussion on Stackoverflow Links to an external site.
Examination
To finish the course you need to:
- Complete Homework 1,2,3 not later than one week after the exam (corresponds to ladok LABA).
- Complete all video quizzes (see course overview links above) not later than one week after the exam (the CANVAS-deadlines are recommendations).
- Pass the exam
Old exams
- 2019, March Download 2019, March
- 2020, March - solutions Download 2020, March - solutions
- 2020, June - solutions Download 2020, June - solutions
- 2021, March - solutions Download 2021, March - solutions
- 2022, March Download 2022, March
- 2022, March - solutions Download 2022, March - solutions
- 2022, June Download 2022, June
- 2023, March Download 2023, March
- 2023, June Download 2023, June
- 2024, March Download 2024, March
- 2024, March - solutions Download 2024, March - solutions