Instructions for setting up Azure cloud
Instructions for setting up Azure
For assignments HA1, HA2, and your final project, you will need a GPU-enabled machine. This guide will show you how to set up and connect to an Azure cloud machine with GPU, referred to as a virtual machine.
You will always be interacting with your virtual machine in Azure's servers via a terminal, or terminal emulator. Hence, it's important to know a few commands Links to an external site. to be able to perform simple tasks (like changing directories, copying files, moving files, etc).
1. Registering for Azure Students
- Navigate to https://azure.microsoft.com/en-us/free/students/ Links to an external site. and click the green Activate now button.
- Create a Microsoft account and follow the steps for activating your free credits. Use your institutional e-mail (@net.chalmers.se, @student.chalmers.se, @chalmers.se or @student.gu.se) for the registration.
- Once everything is approved you'll get an e-mail confirmation. Now you're ready to follow the next steps in this guide.
2. Creating an instance
- Note: As soon as the instance is created it will be started, already consuming your credits. Make sure you follow this step until the end to stop it after creation.
- Navigate to https://portal.azure.com/ Links to an external site. and login.
- Under Azure Services, click on Virtual machines, you'll be redirected to a new page.
- Click on Add > Virtual Machine.
- Don't change anything in this page except for the following:
- In Resource group, click on Create new. Name it
MyRG
and click on ok. - In Virtual machine name give your instance an appropriate name, e.g. "
dml-instance
". - In Region choose one of the recommended regions.
- Note Some regions might not have GPUs available, if that's the case try another region. Here's a list of all the regions that (at the time of writing this) have the GPU we'll use: East US; South Central US; Australia East; Southeast Asia; North Europe; UK South.
- In Image, click on See all images. Search for
ngc
in the search box, and click on the result with the nameNVIDIA GPU-optimized Image for AI & HPC - v20.06.3
. - In Size, click on Select all sizes. Then, search for
NC6
, and choose the result with the nameNC6
, then click the blue button on the bottom named Select. (There are other GPU-enabled machines to choose from, but we have only tested running the assignments for NC6 and NV6. You can find all GPU options by filtering on Family: GPU.) - In Username, select a username for logging in to your instance.
- Note: Write this down, we'll use it later for connecting to the instance. If you forget it you won't be able to login to your instance.
- In Resource group, click on Create new. Name it
- Click on the blue button on the bottom of the page named Review + Create, and wait for the final validation to pass.
- Fill in the required fields (name, e-mail, phone number) and accept the terms associated with the Marketplace offerings.
- Click on the blue button on the bottom of the page named Create.
- A pop-up will appear requesting you to generate new key pairs. Click on Download private key and create resource. A
*.pem
file will be downloaded, keep it stored in a safe place, we'll use it to connect to the instance later (if you misplace this file you won't be able to connect to your instance anymore). - You will be taken to a webpage that says Deployment is in progress. Wait for it to finish, when it will then exhibit the message Your deployment is complete
- Click on the blue button Go to resource.
- Click on the button with a square on its left, named Stop. A message will show up regarding IP addresses, click on ok.
- Now click on Networking in your instance's Properties tab. Then, click on Add inbound port rule. Under Destination port ranges, type
8888
. Under Name, typeJUPYTER_PORT
. Finally, click on Add. - Congratulations, you have now created your instance! You also stopped it, so that it doesn't consume credits while you're not using it. Now keep following the next steps, in order to connect to it, and set up all the required software.
3. Connecting to the instance
-
Navigate to https://portal.azure.com/ Links to an external site. and login.
-
Under Azure Services, click on Virtual machines, you'll be redirected to a new page.
-
If you followed step 1 of this guide correctly, you should now see your newly-created instance in this page. Click on its name.
-
Click on Start and wait until the instance has started; the Start button will be greyed out when this happens, and the Status of the instance will be shown as Running.
- Note: As soon as the instance starts it will be consuming credits. This only halts when you stop it. Take this into account when following the next steps.
-
In the Properties part of the webpage you're currently in, you'll find your instance's Public IP address, under Networking. Write it down somewhere, we'll use it in the next steps.
- Note: Sometimes it takes a few seconds for the IP of a recently-started machine to show up in the webpage. If that's the case, refresh your webpage after starting the instance.
-
Now we're going to open the browser-based Shell that Azure provides, so we can login to the instance. Click on the shell icon on the blue bar at the top of your screen. It is a square with the symbols
>_
inside it (and when you hover over it it says Cloud Shell). -
You'll be prompted to create some storage for the shell, accept it.
-
The bottom part of the webpage will now become a terminal, where you can type commands for the next steps of this guide. The rest of this guide assumes you will use the Bash version of the terminal (by default, this is the one you start with), instead of the Powershell version.
-
Drag-and-drop the
*.pem
that you downloaded when creating this instance into the terminal. A message will appear on the bottom-right telling you that the upload was successful. -
In the terminal, type the following commands:
-
chmod 400 <filename>.pem
(substitute
<filename>
with the name of your*.pem
file). -
ssh -i <filename>.pem <user>@<public-ip>
(substitute
<user>
with the username you chose when creating the instance, and<public-ip>
with the Public IP address mentioned previously in this guide). -
Note: If after running the
ssh
command nothing happens for a while, make sure you typed the correct IP address for your instance. If not, useCtrl+c
to cancel the current command and type in the correct address. Also note that even if the instance is marked as "running" after starting up, it might take a little while before you can actually connect to it.
-
-
You will now see a message stating that you're connected to the instance, and the green text before the cursor in the terminal will say
<username>@<instance-name>
(your values, of course), instead of<username>@Azure
. -
Congratulations, you have now connected to your instance! Any commands you run here will be performed by your instance, and the output will be displayed in this terminal.
-
When you're done using your instance, you can type
exit
in the terminal to disconnect from it. Note that this does not stop the instance. To do so, and stop consuming credits, you have to manually click on the Stop button for your instance. Make sure the instance is stopped by waiting until the Stop button is greyed out, and the Start button becomes available again.-
Important note: Forgetting to stop your instance is a common mistake that will cost you many (if not all) credits! Be mindful of this and always click on the stop button after using the instance, and wait for the confirmation that the instance was truly stopped
-
Important note 2: There are a couple of gotchas regarding stopping the instance, i.e. you might thing you have stopped it but you actually haven't. This includes running
sudo poweroff
or using the Azure CLI to stop the instance, without deallocating it. No matter how you choose to stop the virtual machine, make sure that its state goes to Stopped (deallocated). -
Important note 3: Leaving the browser based shell in a semi-active state (like training networks) for a long time can cause the shell to terminate the VM connection. We don't have a good fix for this but using a proper shell on your computer, rather than the browser one, will be more stable.
-
4. Setting up all the required software in your instance
- Copy the setup script: cloud_setup.zip to your instance. This can be done in multiple ways, e.g., through the
scp
command. It's similar to thessh
command but will instead copy files between computers over thessh
protocol. Use it like this:scp -i <filename>.pem <path_to_cloud_setup.zip> <user>@<public-ip>:
-
Connect to your instance, like described previously. If the zip file is on the instance, you should now be able to run the commands:
unzip ~/cloud_setup.zip bash configure_cloud_machine.sh
When prompted whether to proceed, simply press enter. This will install all the required software for the course, and will create a conda environment for you. It may take a few minutes, specially the part where conda is installing pip dependencies.
-
-
Note: If you just created your instance, you might run into the error
Unable to acquire dpkg frontend lock
. If that's the case, just wait a few minutes and try the same command again, the instance is still updating some packages in the background. -
Note 2: Another possible error states that
dpkg was interrupted
. If you run into that, execute the commandsudo dpkg --configure -a
and re-run the previous command.
-
-
Once everything is installed, disconnect from the instance with the command
exit
and then connect to it again like described previously. Note: This is important for the setup, so that the terminal is reinitialized; don't skip this step. Note 2: This command does not stop your instance, it just disconnects you from it while leaving it running in the background.
-
You are now ready to start a Jupyter notebook in the instance and connect to it by following the instructions here.