Inserting pages from KTH Social as a module

To insert a set of saved KTH Social pages, do the following:

1. Follow the directions at "Uploading files from a set of saved KTH Social web pages" to put all of the necessary files into a Canvas folder. As a side effect, this creates a transformed_urls.json file, which maps the old names to the new filenames in Canvas. I name the folder course_code-files (here the course code was IK1552, giving a folder named IK1552-files).

2. Now you are ready to insert the previous HTML content into Canvas pages. This is done with the insert_page2.py program, using the following command:

for i in *.html; do ./insert_page2.py 11 IK1552 Internetworking "$i"; done

The program creates a module named "Internetworking" in the Canvas course with id 11, then inserts the relevant contents of each HTML page into a Canvas page (with a prefix of the course code, in this case "IK1552-"). Using the transformed_urls.json file, it looks up the new Canvas filename for each link and inserts that file's file_id into the link. It also adds a class attribute and a /download?wrap=1 suffix to each link so that Canvas displays the Preview and Download icons for it.
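The transform_page code indexes transformed_urls first by course code and then by the original URL, so transformed_urls.json appears to have the shape {course_code: {old_url: new_canvas_filename}}. The following is a minimal sketch of loading it and looking up one link, assuming that shape; the helper names and the sample data are hypothetical.

```python
import json

def load_transformed_urls(path):
    """Load the mapping written when the files were uploaded.

    Assumed (hypothetical) shape, inferred from how the mapping is indexed:
    {"IK1552": {"old/url/of/file.pdf": "new-canvas-filename.pdf", ...}, ...}
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def new_filename_for(transformed_urls, course_code, old_url):
    # Return None instead of raising KeyError when the course or the URL is missing
    return transformed_urls.get(course_code, {}).get(old_url)

# Hypothetical sample data for illustration only
sample = {"IK1552": {"old/lecture1.pdf": "IK1552-lecture1.pdf"}}
print(new_filename_for(sample, "IK1552", "old/lecture1.pdf"))   # IK1552-lecture1.pdf
print(new_filename_for(sample, "IK1552", "old/unknown.html"))   # None
```

Using dict.get at both levels avoids the try/except KeyError pattern when a course or URL has no entry.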

The code to look up a file's file_id from its filename is essentially:

import requests

# baseUrl and header are defined elsewhere in the program: baseUrl is the Canvas
# API prefix ending in .../api/v1/courses/ and header carries the access token
list_of_all_files = None   # filled in on first use

def get_list_of_files_for_course(course_id):
    global list_of_all_files
    list_of_all_files = []
    url = baseUrl + '%s/files' % (course_id)

    r = requests.get(url, headers=header)
    list_of_all_files.extend(r.json())

    # the following is needed when the response has been paginated
    # i.e., when the response is split into pieces - each returning only some of the list of files
    # see "Handling Pagination" - Discussion created by tyler.clair@usu.edu on Apr 27, 2015, https://community.canvaslms.com/thread/1500
    # keep following the 'next' link until there is none; this also works when the
    # response has no Link header at all (r.links is then empty) or has no 'last' link
    while 'next' in r.links:
        r = requests.get(r.links['next']['url'], headers=header)
        list_of_all_files.extend(r.json())

def file_id_from_file_name(course_id, filename):
    global list_of_all_files
    if list_of_all_files is None:
        get_list_of_files_for_course(course_id)

    # return the id of the first file whose name matches, or None if there is none
    for f in list_of_all_files:
        if f["filename"] == filename:
            return f["id"]
    return None
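file_id_from_file_name scans the whole file list on every call, so a page with many links repeats that scan many times. A minimal sketch of an alternative, assuming the same list of file objects from the Canvas files API (each a dict with "filename" and "id" keys, as used above): build a dictionary once and then look names up directly. The function name build_filename_index and the sample data are hypothetical.

```python
def build_filename_index(files):
    """Map each Canvas filename to its file id.

    files: list of dicts as returned by the Canvas files API; only the
    "filename" and "id" keys are used. If two files share a filename
    (e.g. in different folders), the first one wins, matching the
    linear search in file_id_from_file_name.
    """
    index = {}
    for f in files:
        index.setdefault(f["filename"], f["id"])
    return index

# Hypothetical sample data for illustration only
files = [
    {"id": 101, "filename": "IK1552-lecture1.pdf"},
    {"id": 102, "filename": "IK1552-lecture2.pdf"},
    {"id": 103, "filename": "IK1552-lecture1.pdf"},
]
index = build_filename_index(files)
print(index.get("IK1552-lecture2.pdf"))  # 102
print(index.get("missing.pdf"))          # None
```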

The functions for finding all of the links and all of the URLs are:

def list_of_a_elements_with_URLS_in_page(tree, input_page):
    # only select <a> elements that have an href, so e.attrib['href'] cannot raise KeyError
    list_of_As = tree.xpath('//div[@class="mainContent"]//div[@class="paragraphs"]//a[@href]')
    print("list_of_<a>s_in_page:")
    for e in list_of_As:
        print(e)
        print(e.attrib['href'])
        # these classes make Canvas show the Preview and Download icons for the link;
        # adding "auto_open" as well would cause the file to be opened automatically
        e.attrib['class'] = "instructure_scribd_file instructure_file_link"
        print(e.attrib['class'])
    return list_of_As

# find the URLs in the page
def list_of_URLS_in_page(tree, input_page):
    list_of_URLs = tree.xpath('//div[@class="mainContent"]//div[@class="paragraphs"]//a/@href')
    print("list_of_URLS_in_page:")
    for e in list_of_URLs:
        print(e)
    return list_of_URLs
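The XPath expressions restrict the search to links inside the mainContent/paragraphs divs of a saved KTH Social page. The sketch below shows the same selection using Python's standard xml.etree.ElementTree in place of lxml. This swap is an assumption made so the example runs without third-party packages; ElementTree only parses well-formed XML, so real saved pages would still need lxml.html. The sample page is hypothetical. The [@href] predicate skips <a> elements without an href, which would otherwise make e.attrib['href'] raise KeyError.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical stand-in for a saved KTH Social page
PAGE = """<html><body>
<div class="nav"><a href="/outside">not content</a></div>
<div class="mainContent">
  <div class="paragraphs">
    <p><a href="files/lecture1.pdf">Lecture 1</a> <a name="anchor">no href</a></p>
  </div>
</div>
</body></html>"""

root = ET.fromstring(PAGE)
# ElementTree supports this XPath subset: descendant steps plus [@attr='value'] and [@attr] predicates
links = root.findall(".//div[@class='mainContent']//div[@class='paragraphs']//a[@href]")
for a in links:
    # same class string as the code above
    a.set("class", "instructure_scribd_file instructure_file_link")

print([a.get("href") for a in links])  # ['files/lecture1.pdf']
```

The link in the nav div and the <a> without an href are both left out.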

Note that the code that sets the 'class' attribute of an HTML element can also be used to set other attributes. In some links that I have added manually, I noticed an 'id=""' attribute, but I do not understand what it does or does not do. Similarly, I am still unsure of all the possible values that can be listed in the class attribute.
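Assigning to the class attribute replaces any classes the link already had. A hedged sketch of a helper that appends class tokens instead, skipping duplicates. The name add_class_tokens is hypothetical. It only uses the get() and set() element methods, which both lxml and ElementTree elements provide, so the same idea would apply to the lxml elements above.

```python
import xml.etree.ElementTree as ET

def add_class_tokens(element, *tokens):
    """Append class tokens to element's class attribute, keeping order and skipping duplicates."""
    current = element.get("class", "").split()
    for t in tokens:
        if t not in current:
            current.append(t)
    element.set("class", " ".join(current))
    return element

a = ET.fromstring('<a href="x.pdf" class="external">x</a>')
add_class_tokens(a, "instructure_scribd_file", "instructure_file_link")
add_class_tokens(a, "instructure_file_link")  # already present, not added twice
print(a.get("class"))  # external instructure_scribd_file instructure_file_link
```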

The functions for transforming the pages are:

#page_to_insert=transform_page(page_to_insert, course_id, course_code, module_name, filename, tree)
def transform_page(page_to_insert, course_id, course_code, module_name, filename, tree):
    global transformed_urls_for_this_course
    global current_course_id

    current_course_id = course_id  # to indirectly pass this information to the link replacement function

    # if there are no entries for this course, asking for these URLs will generate a KeyError
    try:
        transformed_urls_for_this_course = transformed_urls[course_code]
    except KeyError:
        if Verbose_Flag:
            print("no transformed URLs for course code={}".format(course_code))
        transformed_urls_for_this_course = {}
        transformed_urls[course_code] = transformed_urls_for_this_course

    transformed_page = page_to_insert  # an lxml.html element; rewrite_links() modifies it in place

    list_of_a_elements_with_URLS_in_page(tree, page_to_insert)
    list_of_URLS_in_page(tree, page_to_insert)  # prints the URLs found in the page

    # rewrite_links() calls link_repl_func once for every link in the page,
    # so a single call is enough (calling it once per URL only repeats the work)
    transformed_page.rewrite_links(link_repl_func, resolve_base_href=True, base_href=None)

    return transformed_page
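lxml's rewrite_links() walks every link-bearing attribute in the document and calls the replacement function once per link, which is why one call per page suffices. A rough standard-library sketch of the same idea, restricted to the href of <a> elements (an assumption: rewrite_links also handles src attributes, CSS url() values and more). The name rewrite_a_hrefs is hypothetical. As with rewrite_links, returning None from the replacement function removes the link attribute.

```python
import xml.etree.ElementTree as ET

def rewrite_a_hrefs(root, repl):
    """Replace every <a href> under root with repl(href); a None result removes the href."""
    for a in root.iter("a"):
        href = a.get("href")
        if href is None:
            continue  # nothing to rewrite
        new = repl(href)
        if new is None:
            del a.attrib["href"]
        elif new != href:
            a.set("href", new)

root = ET.fromstring('<div><a href="old.pdf">a</a><a href="keep.html">b</a><a>c</a></div>')
rewrite_a_hrefs(root, lambda h: "new.pdf" if h == "old.pdf" else h)
print([a.get("href") for a in root.iter("a")])  # ['new.pdf', 'keep.html', None]
```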

def link_repl_func(link):
    global transformed_urls_for_this_course
    global current_course_id

    filename = transformed_urls_for_this_course.get(link)
    if not filename:
        if Verbose_Flag:
            print("no transformed URLs for {}".format(link))
        return link

    file_id = file_id_from_file_name(current_course_id, filename)
    print("filename: {} has id: {}".format(filename, file_id))
    if file_id is None:
        return link  # the file was not found in Canvas, so leave the link unchanged
    # use the current course rather than a hard-coded course id, and put a "/" before the file id
    return "https://kth.instructure.com/courses/{}/files/{}/download?wrap=1".format(current_course_id, file_id)
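link_repl_func receives its context through the globals transformed_urls_for_this_course and current_course_id. A minimal sketch of an alternative that builds the replacement function as a closure, so no globals are needed. make_link_repl_func and its parameters are hypothetical names. The URL pattern courses/<course_id>/files/<file_id>/download?wrap=1 is the one link_repl_func uses, with the file id separated by a "/".

```python
def make_link_repl_func(host, course_id, url_to_filename, filename_to_id):
    """Return a replacement function for lxml's rewrite_links().

    url_to_filename: old URL -> new Canvas filename (one course's part of transformed_urls.json)
    filename_to_id:  Canvas filename -> Canvas file id
    Links that cannot be resolved are returned unchanged.
    """
    def repl(link):
        filename = url_to_filename.get(link)
        if not filename:
            return link
        file_id = filename_to_id.get(filename)
        if file_id is None:
            return link
        return "https://{}/courses/{}/files/{}/download?wrap=1".format(host, course_id, file_id)
    return repl

# Hypothetical sample data for illustration only
repl = make_link_repl_func("kth.instructure.com", 11,
                           {"old/lecture1.pdf": "IK1552-lecture1.pdf"},
                           {"IK1552-lecture1.pdf": 101})
print(repl("old/lecture1.pdf"))    # https://kth.instructure.com/courses/11/files/101/download?wrap=1
print(repl("http://example.com"))  # http://example.com
```

With lxml, the result could be passed as page.rewrite_links(repl, resolve_base_href=True, base_href=None).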