Description Assignment 2: Perception
Objectives
The objective of the second assignment is to learn about sensor fusion and learning-based perception. Sensor fusion is, as the name implies, the fusion of data from two or more sensors. It is used to reduce uncertainty and improve robustness. Different sensors are good at different things, and with sensor fusion one can take advantage of the strengths of each sensor.
Learning-based perception is extensively used in autonomous driving scenarios. It is used to identify and track other vehicles, to segment the different driving lanes, to read traffic signals and signs, and to classify pedestrians to name a few.
Prerequisite
First make sure that you have completed the assignment setup.
Submission
Reading material
The tasks in this assignment are designed to be mostly stand-alone, and the topics are vast, so it is almost impossible to provide a concise reading guideline.
Sensor fusion
A brief, high-level introduction to sensor fusion is provided by Wikipedia.
When working with sensor fusion we often combine a model that describes what we know about the system with sensor data. The sensor data is usually assumed to be uncertain, and this uncertainty is typically described by some statistic such as the standard deviation (covariance in the multi-dimensional case). The system is often a dynamical system, i.e., a system that can be described by a differential equation. In a computer model the differential equation is often replaced by a difference equation of the form $x_{k+1} = f(x_k, u_k) + w_k$, where
- $x_k$ is the vector of state variables (sometimes a scalar) we are trying to estimate, such as the position of a robot or the temperature in the center of a boiler,
- $u_k$ is the input signal,
- $f$ describes the system dynamics, i.e., how the future state of the system depends on the previous state and the input signal, and
- $w_k$ is the so-called process noise, expressing our uncertainty in the model.
We will get back to how to compute $u_k$ given that we want to achieve certain states. To be able to use sensor data we need to relate the sensor data to the state of the system. This is modeled by the measurement equation $y_k = h(x_k) + v_k$, where
- $y_k$ is the measurement vector and
- $v_k$ is the measurement noise, which models the uncertainty in the measurements.
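To make the notation concrete, here is a small toy sketch in Python (not part of any assignment code) that simulates a system of this form and its noisy measurements; the dynamics, noise levels, and names are made-up assumptions chosen purely for illustration.

import numpy as np

# Toy example of x_{k+1} = f(x_k, u_k) + w_k and y_k = h(x_k) + v_k.
# f is a simple integrator (position driven by a velocity input)
# and h measures the position directly. All numbers are made up.

dt = 0.1          # timestep [s]
sigma_w = 0.05    # process noise std (model uncertainty)
sigma_v = 0.5     # measurement noise std (sensor uncertainty)

def f(x, u):
    # System dynamics: new state = old state + input * timestep
    return x + u * dt

def h(x):
    # Measurement model: the sensor observes the state directly
    return x

rng = np.random.default_rng(0)
x = 0.0           # true (unknown) state
for k in range(50):
    u = 1.0                                   # constant input signal
    x = f(x, u) + rng.normal(0.0, sigma_w)    # state propagation with process noise w_k
    y = h(x) + rng.normal(0.0, sigma_v)       # noisy measurement y_k
    print(f"k={k:2d}  true x={x:6.3f}  measured y={y:6.3f}")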
If you are new to the topic, the following movies provide an easily accessible introduction. Movies 1-3 are most relevant.
The Kalman filter is used in the "GPS aided INS" exercises, in a similar way as described in movie 3 above. The Kalman filter is covered in an easily accessible way in the following movies. Movies 1-5 are most relevant for this course.
If you find the topic interesting, you could also look into the course in Sensor Fusion offered at Linköping University. The course covers both theory and practice in sensor fusion. Most of the course material, as well as some movies introducing the different concepts in the course, are available from the course homepage.
During the first 2-day meeting you got some hands-on experience with the IMU. Remember that if you did not do the exercise, then this is part of the Make Up Routines.
Learning-based perception
We will leave the details of learning-based methods to the numerous courses in WASP that teach them. In this course we will instead focus on using them, with the main aim of getting some feeling for what is possible and roughly what goes into using them.
If you have no idea about deep neural networks, we suggest that you watch, for example,
https://www.youtube.com/watch?v=QDX-1M5Nj7s
In task 2.3 you will look at object detection with YOLO. To get a bit of background, take a look at the following page, which describes an earlier version. There is no need to study the architecture of YOLOv8 itself.
https://www.jeremyjordan.me/object-detection-one-stage/
Task 2.1
In the first task we will look at fusing information from two wheel encoders to estimate the position of a mobile robot. This results in something called odometry. In this specific case, f expresses the kinematic model of the robot, and the difference equation from before, $x_{k+1} = f(x_k, u_k)$, is given by
$x_{k+1} = x_k + v \cdot \Delta t \cdot \cos(\theta)$
$y_{k+1} = y_k + v \cdot \Delta t \cdot \sin(\theta)$
$\theta_{k+1} = \theta_k + \omega \cdot \Delta t$
or in vector form $x_{k+1} = f_1(x_k, u^1_k)$, where
- $x = (x, y, \theta)^T$ is the robot pose,
- $u^1_k = (v_k, \omega_k)^T$ is the input when considering the control input to be the linear and angular velocities respectively, and
- $\Delta t$ is the timestep.
In this assignment we will work with a differential drive robot. This robot has two powered wheels that can be controlled independently. The translation speed of each wheel is given by $r \cdot \frac{d\varphi}{dt}$.
The translation and rotation speeds of the robot as a function of the wheel speeds are given by
$v = \frac{r}{2}(\dot{\varphi}_R + \dot{\varphi}_L) = \frac{r}{2}(d\varphi_R/dt + d\varphi_L/dt)$
$\omega = \frac{r}{B}(\dot{\varphi}_R - \dot{\varphi}_L) = \frac{r}{B}(d\varphi_R/dt - d\varphi_L/dt)$
with $\varphi_R$ and $\varphi_L$ being the wheel rotation angles for the right (R) and left (L) wheels.
Another way to write the equation, often more natural when working with wheel encoders, is to consider the relative distance ($D_k$) and rotation ($\Delta\theta_k$):
$x_{k+1} = x_k + D_k \cdot \cos(\theta)$
$y_{k+1} = y_k + D_k \cdot \sin(\theta)$
$\theta_{k+1} = \theta_k + \Delta\theta_k$
or in vector form $x_{k+1} = f_2(x_k, u^2_k)$, where $u^2_k = (D_k, \Delta\theta_k)^T$.
Let $\Delta E_R$ and $\Delta E_L$ denote the difference in encoder readings over a period of time $\Delta T$. If $K$ is the scaling factor to transform an encoder difference to an angle, $\Delta\varphi = K \Delta E$, then
$v = \frac{r}{2}(K\Delta E_R + K\Delta E_L)/\Delta T \;\rightarrow\; D = v\Delta T = \frac{r}{2}(K\Delta E_R + K\Delta E_L)$
$\omega = \frac{r}{B}(K\Delta E_R - K\Delta E_L)/\Delta T \;\rightarrow\; \Delta\theta = \omega\Delta T = \frac{r}{B}(K\Delta E_R - K\Delta E_L)$
The kinematic parameters for the differential drive robot configuration are:
- Ticks per revolution: 48 * 64 = 3072
- Update interval/rate: 50 ms (20 Hz)
- Wheel radius: r
- Distance between the wheels, "base": B
You need to find values for r and B for the DD2419 robot platform.
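As a concrete illustration of the equations and parameters above, here is a minimal sketch of a single odometry update from encoder tick differences. The function name and the placeholder values for r and B are assumptions for this example; your actual task is to fill in the skeleton in odometry.py, which may structure things differently.

import math

TICKS_PER_REV = 48 * 64            # 3072 encoder ticks per wheel revolution
K = 2.0 * math.pi / TICKS_PER_REV  # encoder ticks -> wheel angle [rad]
r = 0.033                          # wheel radius [m] -- placeholder, look up the real value
B = 0.16                           # wheel base [m]   -- placeholder, look up the real value

def odometry_step(x, y, theta, delta_e_left, delta_e_right):
    """One odometry update given encoder tick differences over one 50 ms interval."""
    d_phi_l = K * delta_e_left               # left wheel rotation [rad]
    d_phi_r = K * delta_e_right              # right wheel rotation [rad]
    D = (r / 2.0) * (d_phi_r + d_phi_l)      # distance travelled during the interval
    d_theta = (r / B) * (d_phi_r - d_phi_l)  # change in heading during the interval
    x += D * math.cos(theta)
    y += D * math.sin(theta)
    theta += d_theta
    return x, y, theta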
Your task is to implement simple odometry from wheel encoders using a rosbag recorded with a TurtleBot, and to show your estimated trajectory and compare it to the expected trajectory. Please look at the report template to see what you are expected to be able to answer and comment on.
Run the odometry node (which you will edit when doing the task):
ros2 run assignment_2 odometry
Run the rosbag using the course supplied launch file:
ros2 launch assignment_2 odometry.launch.py bag_dir:=<DIR>/real_robot
If you use the external SSD that we provide it would be:
ros2 launch assignment_2 odometry.launch.py bag_dir:=/home/wasp/Downloads/real_robot
As you will notice, the robot does not move in RViz; all the data is presented at the origin. Your task is to implement functioning odometry, using encoder data recorded from the robot. You will do this by filling in the missing code in the file wasp_autonomous_systems/assignment_2/assignment_2/odometry.py. We have supplied you with a code skeleton and marked a couple of places with the comment # TODO: Fill in, though you are allowed to change the file however you like.
After you have filled in your code you can again run the odometry node and the launch file:
ros2 run assignment_2 odometry
ros2 launch assignment_2 odometry.launch.py bag_dir:=<DIR>/real_robot
If you use the external SSD that we provide it would be:
ros2 launch assignment_2 odometry.launch.py bag_dir:=/home/wasp/Downloads/real_robot
NOTE: It is a good idea to start the odometry node before the launch file, which plays the rosbag, as otherwise you may miss some of the data.
Video Demonstration
Below is a video demonstrating what you should be seeing.
Task 2.2
Detect collisions with the accelerometer and trigger a change of direction for the TurtleBot in the simulator. Show the trajectory of the robot in a room.
Run the supplied collision detection launch file:
ros2 launch assignment_2 collision_detection.launch.py
You will see the TurtleBot in a new, simplified environment. The TurtleBot is on its way to becoming autonomous! However, as you will see, it needs your help. If you run the autonomous controller:
ros2 run wasp_autonomous_systems autonomous_controller --ros-args -p use_sim_time:=true
then you will see that the TurtleBot starts moving on its own! However, it can only move in a straight line and will therefore crash. Luckily, a mechanism has been implemented that will make the TurtleBot reverse and turn around when a collision has been detected. The collision detection is not finished though. It is therefore your job to finish it by editing the file wasp_autonomous_systems/assignment_2/assignment_2/collision_detection.py. We have supplied you with a code skeleton and marked a couple of places with the comment # TODO: Fill in, though you are allowed to change the file however you like.
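To give a feeling for what such a detector can look like, here is a minimal sketch of a node that flags a collision when the measured acceleration deviates strongly from gravity. The topic name '/imu', the gravity-deviation test, and the threshold value are assumptions made for illustration; the supplied skeleton in collision_detection.py defines its own structure that you should follow.

import math
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Imu

class CollisionSketch(Node):
    """Toy collision detector: a large deviation from 1 g is treated as a collision."""

    def __init__(self):
        super().__init__('collision_sketch')
        # '/imu' is an assumed topic name for this illustration.
        self.create_subscription(Imu, '/imu', self.imu_callback, 10)
        self.threshold = 3.0  # m/s^2 deviation from gravity; tune by inspecting the data in rqt

    def imu_callback(self, msg: Imu):
        a = msg.linear_acceleration
        magnitude = math.sqrt(a.x**2 + a.y**2 + a.z**2)
        if abs(magnitude - 9.81) > self.threshold:
            self.get_logger().info('Possible collision detected!')

def main():
    rclpy.init()
    rclpy.spin(CollisionSketch())

if __name__ == '__main__':
    main()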
Hopefully you noticed this when we ran the autonomous controller: --ros-args -p use_sim_time:=true. This enables us to speed up or slow down the simulation without altering how the robot behaves. Go ahead and speed up the simulation in the Webots GUI. Hopefully you will see the robot moving faster, but behaving the same as it did when running at real-time. What happens when you set 'use_sim_time' to false for the autonomous controller node (i.e., --ros-args -p use_sim_time:=false)? Why do you think this happens? You can look at the autonomous controller code at wasp_autonomous_systems/wasp_autonomous_systems/wasp_autonomous_systems/autonomous_controller.py. Why do you think it is not as necessary to set this argument for your collision detection node?
After you have filled in your code you can again run the launch file:
ros2 launch assignment_2 collision_detection.launch.py
together with your collision detection:
ros2 run assignment_2 collision_detection
and the autonomous controller
ros2 run wasp_autonomous_systems autonomous_controller
EXTRA: Look at the file wasp_autonomous_systems/wasp_autonomous_systems/wasp_autonomous_systems/autonomous_controller.py
and see if you can improve it. Maybe you can use other sensor data to prevent the collision from taking place in the first place.
HINT: Use rqt to figure out what happens with the IMU data when a collision occurs.
Video Demonstration
Below is a video demonstrating what you should be seeing.
Task 2.3
In this task we will look at using a learning-based method for perception. To be more specific, we will use a version of YOLO (You Only Look Once), which was originally presented in this CVPR 2016 paper. "Looking once" refers to performing the object detection in a single pass, where object candidates and class probabilities are regressed in one step from images. This is in contrast to two-stage methods (e.g., R-CNN), where in a first stage object proposals are found (there might be an object in this region, irrespective of class) and then, in a second stage, the objects in these regions are classified.
We will experiment with YOLOv8, which is a recent method for object detection and image segmentation. That is, it segments the image to tell exactly which pixels belong to which object. The output from YOLOv8 is object masks and class probabilities (see the sketch below). We experiment with the same model in two contexts:
- the KITTI sequence from the previous assignment, and
- a video stream from your own camera.
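To give a rough feeling for what the model returns, here is a minimal sketch using the ultralytics Python package directly on a single image, outside of ROS. The image path and the printed post-processing are assumptions for illustration only; the course programs kitti_segmentation and camera_segmentation wrap similar calls for you.

from ultralytics import YOLO

# Loads (and on first use downloads) the YOLOv8 nano segmentation model,
# as noted elsewhere on this page.
model = YOLO('yolov8n-seg.pt')

# 'some_image.png' is a placeholder path used only for this illustration.
results = model('some_image.png', device='cpu')

res = results[0]
if res.masks is not None:
    print('Detected objects:', len(res.boxes))
    for box in res.boxes:
        cls_id = int(box.cls)          # class index
        confidence = float(box.conf)   # class probability / confidence
        print(res.names[cls_id], f'{confidence:.2f}')
    print('Mask tensor shape:', tuple(res.masks.data.shape))  # one mask per detected object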
NOTE: When you run the YOLOv8 model with the program kitti_segmentation, it will download the model if it is not found in the current directory. This means that if you later run the code in another directory, it will have to download the model again.
KITTI
First run YOLOv8 on the KITTI sequence. Keep an eye on the report template, which contains the questions to answer.
Terminal 1:
rviz2 -d ~/ros2_ws/install/assignment_2/share/assignment_2/rviz/kitti_segmentation.rviz --ros-args -p use_sim_time:=true
Terminal 2:
ros2 run assignment_2 kitti_segmentation --ros-args -p use_sim_time:=true -p "device:='cpu'"
You can change "device:='cpu'"
to "device:='0'"
or "device:='1'"
, ..., if you have a supported (NVidia) GPU that you can run on. The number correspond to the GPU to run on, so if you only have a single GPU then it would be 0
. First time you run this it will download the YOLOv8n-seg model, make sure that it has finished downloading before proceeding.
Terminal 3:
ros2 launch wasp_autonomous_systems kitti_car.launch.py
Terminal 4:
ros2 bag play --read-ahead-queue-size 100 -l -r 1.0 --clock 100 <DIR>/kitti
If you use the external SSD that we provide it would be:
ros2 bag play --read-ahead-queue-size 100 -l -r 1.0 --clock 100 ~/Downloads/kitti
Below is a video demonstrating what you should be seeing.
Your own USB camera
In this second experiment we will use the stream from your own camera. We assume that you have access to a USB camera (like the one built into almost all laptops, or the one you use for your Zoom calls if that is different). You do not need it for long, so borrow one if you do not have one.
First we need to connect to the camera and make sure that we stream the data in the right format.
In terminal 1, run the usb_cam node and look at the output, which will show which pixel formats your camera supports.
The example below comes from a DELL XPS 15 laptop.
[INFO] [1694371116.778560205] [usb_cam]: Starting 'default_cam' (/dev/video0) at 640x480 via mmap (yuyv) at 30 FPS
[INFO] [1694371116.906163866] [usb_cam]: This devices supproted formats:
[INFO] [1694371116.906502453] [usb_cam]: Motion-JPEG: 1280 x 720 (30 Hz)
[INFO] [1694371116.906553468] [usb_cam]: Motion-JPEG: 960 x 540 (30 Hz)
[INFO] [1694371116.906588362] [usb_cam]: Motion-JPEG: 848 x 480 (30 Hz)
[INFO] [1694371116.906622112] [usb_cam]: Motion-JPEG: 640 x 480 (30 Hz)
[INFO] [1694371116.906655094] [usb_cam]: Motion-JPEG: 640 x 360 (30 Hz)
[INFO] [1694371116.906685552] [usb_cam]: YUYV 4:2:2: 640 x 480 (30 Hz)
[INFO] [1694371116.906716407] [usb_cam]: YUYV 4:2:2: 640 x 360 (30 Hz)
[INFO] [1694371116.906751438] [usb_cam]: YUYV 4:2:2: 424 x 240 (30 Hz)
[INFO] [1694371116.906784599] [usb_cam]: YUYV 4:2:2: 320 x 240 (30 Hz)
[INFO] [1694371116.906818547] [usb_cam]: YUYV 4:2:2: 320 x 180 (30 Hz)
[INFO] [1694371116.906851220] [usb_cam]: YUYV 4:2:2: 160 x 120 (30 Hz)
[INFO] [1694371116.906884296] [usb_cam]: Setting 'brightness' to 50
unknown control 'white_balance_temperature_auto'
Now you want to map this output to one of the possible pixel formats:
- yuyv2rgb: V4L2 capture format of YUYV, ROS image encoding of RGB8
- uyvy2rgb: V4L2 capture format of UYVY, ROS image encoding of RGB8
- mjpeg2rgb: V4L2 capture format of MJPEG, ROS image encoding of RGB8
- rgb8: V4L2 capture format and ROS image encoding format of RGB8
- yuyv: V4L2 capture format and ROS image encoding format of YUYV
- uyvy: V4L2 capture format and ROS image encoding format of UYVY
- m4202rgb8: V4L2 capture format of M420 (aka YUV420), ROS image encoding of RGB8
- mono8: V4L2 capture format and ROS image encoding format of MONO8
- mono16: V4L2 capture format and ROS image encoding format of MONO16
- y102mono8: V4L2 capture format of Y10 (aka MONO10), ROS image encoding of MONO8
Look at the output from your camera and pick one of the formats from the top of the list, but maybe not the one with the highest resolution and frequency, to reduce the computational load. In the example output above, Motion-JPEG is at the top, and this maps to mjpeg2rgb in the format list just above. In the example command below, you would instead pick the format uyvy (note the lower case!).
Terminal 1, start your camera, assuming that you have the camera that we tested last above, i.e., format uyvy, and that we want a somewhat lower resolution of 1280x720:
ros2 run usb_cam usb_cam_node_exe --ros-args -p framerate:=30.0 -p image_height:=720 -p image_width:=1280 -p pixel_format:=uyvy
Terminal 2, check if you can see an image from your camera:
ros2 run rqt_image_view rqt_image_view /image_raw
Terminal 3, run YOLOv8 segmentation:
ros2 run assignment_2 camera_segmentation --ros-args -p "device:='cpu'"
You can change "device:='cpu'"
to "device:='0'"
or "device:='1'"
, ..., if you have a supported (NVidia) GPU that you can run on. The number corresponds to the GPU to run on, so if you only have a single GPU then it would be 0
.
Terminal 4, show segmentation:
ros2 run rqt_image_view rqt_image_view /segmentation/image_raw
If you have not done so already, make sure to check what you are supposed to do with all of this :-) by looking at Assignment 2: Tasks 2.1-3.
If you want to see how we can combine the information from the semantic segmentation from the camera to mark which points in the LiDAR data come from dynamic objects, take a look at the conditionally elective task Description Elective task A.
Below is a video demonstrating what you should be seeing.
Task 2.4 (Sensor Fusion GPS aided Inertial Navigation System)
The mandatory task 2.4 is described in Part 1 of this document.
It is easiest to solve in Matlab, for which these files are available for download. However, you can also use Octave, Python, Julia, or any other tool of your choice, but you will then need to convert the provided code yourself.
The problem describes how to use sensor fusion by a Kalman filter to do positioning by combining sensor information from a GPS and an IMU (accelerometer and gyro). You will use prerecorded real world data and study the performance in a situation with GPS outage. You will get some experience of tuning a sensor fusion filter in a real situation. You will also study the performance improvement achievable by using an improved motion model and an additional speedometer sensor.
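For orientation before diving into the provided code, here is a minimal, hedged Python sketch of the Kalman filter predict/update cycle that the task builds on: accelerometer readings drive the prediction of a simple 1D position/velocity state, and GPS position measurements correct it. The model, noise values, and variable names are simplified assumptions and not the assignment's actual model.

import numpy as np

dt = 0.01                       # IMU sample period [s] (assumed)
F = np.array([[1, dt],          # state transition for [position, velocity]
              [0,  1]])
G = np.array([[0.5 * dt**2],    # how the acceleration input enters the state
              [dt]])
H = np.array([[1.0, 0.0]])      # GPS measures position only
Q = 0.1 * G @ G.T               # process noise (accelerometer uncertainty, made up)
R = np.array([[5.0]])           # GPS measurement noise variance (made up)

def predict(x, P, acc):
    """Time update driven by an accelerometer reading."""
    x = F @ x + G * acc
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, gps_pos):
    """Measurement update with a GPS position (skipped during GPS outage)."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.atleast_2d(gps_pos) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

During a GPS outage only the prediction step runs, so the state covariance grows; studying and tuning that behaviour is a central part of the task.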
Optional material: To really understand the dynamical model and transformations used in the assignment you might want to have a look at
- How quaternions or rotational matrices are used to describe rotations.
- Another nice video with further explanation.
- Or alternatively this open-gl tutorial about rotations.
- An experiment showing why you might prefer quaternions. (video 2min)
but you will be able to do the exercise without looking at this.
Extra task (if you miss the deadline)
Implement a Kalman filter for Task 2.1. Run the prediction and see how the uncertainty increases as the robot moves. It is sufficient to plot the standard deviations in $x$, $y$, and $\theta$ in a graph as a function of time. How does the uncertainty change along the path?
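If you attempt this extra task, the following is a minimal, hedged sketch of how the pose covariance could be propagated in the prediction step for the odometry model $f_2$ from Task 2.1. The noise parameters and function layout are made-up assumptions; integrating this with your odometry node and plotting the standard deviations is left to you.

import math
import numpy as np

# Made-up process noise on the inputs (D, dtheta); tune to your robot.
SIGMA_D = 0.01        # [m] per step
SIGMA_DTHETA = 0.01   # [rad] per step

def ekf_predict(x, P, D, dtheta):
    """EKF prediction step for the odometry model f2(x, u) with u = (D, dtheta)."""
    px, py, theta = x

    # Mean prediction (same equations as in Task 2.1)
    x_new = np.array([px + D * math.cos(theta),
                      py + D * math.sin(theta),
                      theta + dtheta])

    # Jacobian of f2 with respect to the state (x, y, theta)
    Fx = np.array([[1.0, 0.0, -D * math.sin(theta)],
                   [0.0, 1.0,  D * math.cos(theta)],
                   [0.0, 0.0,  1.0]])

    # Jacobian of f2 with respect to the input (D, dtheta)
    Fu = np.array([[math.cos(theta), 0.0],
                   [math.sin(theta), 0.0],
                   [0.0,             1.0]])

    Qu = np.diag([SIGMA_D**2, SIGMA_DTHETA**2])   # input noise covariance

    P_new = Fx @ P @ Fx.T + Fu @ Qu @ Fu.T
    return x_new, P_new

# The standard deviations to plot over time are the square roots of the
# diagonal of P: np.sqrt(np.diag(P)).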
Submission: Include your results in the report by extending the template appropriately.
Conditionally elective tasks
- Elective task A Dynamic Object Detection
- Described here Description Elective task A
- Elective task B (GPS aided INS - with added motion constraint model)
- Described here Description Elective task B