DD2352 VT20 (60319)
Lab 1 - Spelling (programming)
Hoppa över till innehåll
Översikt
  • Logga in
  • Översikt
  • Kalender
  • Inkorg
  • Historik
  • Hjälp
Stäng
  • Min översikt
  • DD2352 VT20 (60319)
  • Uppgifter
  • Lab 1 - Spelling (programming)
  • Startsida
  • Uppgifter
  • Sidor
  • Filer
  • Kursöversikt
  • Quiz
  • Media Gallery
  • Course Evaluation

Lab 1 - Spelling (programming)

  • Inlämningsdatum 13 feb 2020 av 10:00
  • Poäng 1
  • Tillgänglig efter 16 jan 2020 kl 8:00

Note that being accepted by Kattis is not enough to pass the lab, your code has to be approved by a lab assistant too. You also need to register for the course on Kattis Links to an external site. before presenting the lab.

Lab 1: Spelling

Lab 1 is a mandatory part of the course.  If you present the lab by February 13, you get a bonus point on the exam, but you can also present it later without getting the bonus point.

Additionally, on the lab session on February 7, you have the opportunity to present your solutions to the theory problems below. Correct solutions to the theory problems give a bonus point on the exam but it is not mandatory to present them.

In the course folder Lab1 there is a Java program (consisting of two files) that solves the problem described below. The given Java program does solve the problem, but it takes hours to get the answer. You should optimize the program so it finds the answer within the time limit given by Kattis.

Below there are several theory problems that give suggestions on how to improve the program. Your optimized programs should have the same form of input and output as the given program and it must still be in Java. It is optional to solve the theory questions , but thinking about them will be helpful for solving the lab.

Accuracy and efficiency is tested by sending your solution to the Kattis Links to an external site. system. To complete the lab your solution must be approved by Kattis and a lab assistant. Start by logging into Kattis and enroll in the algokomp20 course Links to an external site.in the menu option Courses in the top menu.  The problem is called kth.adk.spelling Links to an external site. on Kattis (The problem text is in Swedish but says the same thing as the text above.)

Edit Distance

The Edit Distance between two words is the minimum number of operations required to transform one word to the other. There are three permitted operations:

  1. remove one of the letters in the word
  2. add any letter anywhere in the word
  3. replace any letter in the word by any other letter from the alphabet

For example, the word alroithm can be transformed to algorithm by changing the letter r to g (rule 3) and inserting the letter r after the letter o (rule 2). The chain

alroithm -> algoithm -> algorithm

shows that the edit distance between alroitm and algoritm is at most 2. Furthermore it is impossible to transform alroithm to algorithm with a single operation, and therefore the edit distance is exactly 2.

A common way to develop spelling suggestions for a misspelled word is simply to return the words in the dictionary that has the smallest edit distance to the misspelled word. The program in this lab is given a glossary and a number of misspelled words as input and is supposed to compute spelling suggestions in this way.

Input/Output Specification

Input data consists of two parts. The first part is the glossary, consisting of a number of words in the Swedish alphabet, one word per line. This part ends with a line that contains only one '#' character. The second part is a number of misspelled words to be corrected, one word per line. The misspelled words are not included in the glossary. Each word in the input consists only of lowercase letters in Swedish alphabet (a-z, å, ä, ö), no spaces, punctuation or numbers, and the input is encoded in UTF-8 Links to an external site..

The dictionary has no more than half a million words and the number of misspelled words in the input is no more than 20.

The program shall, for each misspelled word, print a line consisting of the misspelled word, followed by the minimum edit distance in parentheses followed by a list of all the words in the dictionary that has this minimum edit distance to the misspelled word. The list should be in alphabetical order and each word in the list should be preceded by a space.

Example

A few example input files (and the expected answers) are provided in the course folder Lab1/Examples. You can test run your program from a terminal at the EECS Linux machines by running (after having compiled the program)

$ java Main < testcase1.input
kål (1) skål
k (3) skål
b (4) skål
sklfrm (2) skålform
skala (2) skål skålar

Note that the provided examples are not as large as the real test data that will be used by Kattis - your programs will have to solve all these files in a fraction of a second in order to be fast enough.  You are encouraged to construct more comprehensive test data yourself.

Theory problems

Try to understand how the given program works and answer the following questions.

  1. Formulate the recursion (partDist in the program) as compact as possible with mathematical notation.
  2. Calculate partDist("labd", "blad", x, y) for all x and y between 0 and 4 and enter the results in a matrix M. What is M?
  3. What does the method partDist(w1, w2, x, y) compute?
  4. Show that the time complexity of Distance(w1, w2) is exponential in the word length.
  5. See how you can track the editing operations made in the shortest edit sequence from "labd" to "blad" by looking at the matrix M.
  6. Show with pseudocode how the recursion can be calculated by using dynamic programming, ie how a matrix M can be created.
  7. Analyze the time complexity for determining the editing distance between an n-letter word and an m-letter word with dynamic programming.
  8. Calculate the dynprog matrix for the distance between "labs" and "blad".
  9. What part of the matrices for "labd" - "blad" and "labs" - "blad" are different?
  10. Generally, what parts of the matrices for Y-X and Z-X are different when the words Y and Z have the same first p letters?
1581584400 02/13/2020 10:00am
Inkludera en beskrivning
Ytterligare kommentarer:
Maxresultat för gradering till > poäng
Inkludera en bedömningstitel

Matris

 
 
 
 
 
 
 
     
Det går inte att ändra en matris efter att du börjat använda den.  
Hitta en matris
Hitta matris
Inkludera en titel
Titel
Du har redan bedömt studenter med den här matrisen. Större ändringar kan påverka resultaten för deras uppgifter.
Titel
Kriterier Bedömningar Poäng
Redigera beskrivning av kriterium Ta bort kriterium rad
Det här kriteriet är länkat till ett lärandemål Beskrivning av kriterium
tröskel: 5 poäng
Redigera ranking Radera ranking
5 till >0 poäng
Full poäng
blank
Redigera ranking Radera ranking
0 till >0 poäng
Inga poäng
blank_2
Det här området kommer användas av utvärderaren för kommentarer relaterade till det här kriteriet.
poäng
  / 5 poäng
--
Ytterligare kommentarer
Redigera beskrivning av kriterium Ta bort kriterium rad
Det här kriteriet är länkat till ett lärandemål Beskrivning av kriterium
tröskel: 5 poäng
Redigera ranking Radera ranking
5 till >0 poäng
Full poäng
blank
Redigera ranking Radera ranking
0 till >0 poäng
Inga poäng
blank_2
Det här området kommer användas av utvärderaren för kommentarer relaterade till det här kriteriet.
poäng
  / 5 poäng
--
Ytterligare kommentarer
Poängsumma: 5 av 5