Description Elective task A: Dynamic Object Detection, camera and LiDAR fusion

You submit your solution here.

In this assignment you will use the semantic segmentation output from YOLO, which you looked at in Description Assignment 2: Perception, to create a map of the static environment by detecting what is dynamic and removing it.

Running the code

To run the assignment code, do the following:

In the first terminal

rviz2 -d ~/ros2_ws/src/wasp_autonomous_systems/assignment_2/rviz/kitti_segmentation_extra.rviz --ros-args -p use_sim_time:=true

In the second terminal

ros2 run assignment_2 kitti_segmentation_extra --ros-args -p use_sim_time:=true -p downsample_voxel_size:=0.2 -p "device:='cpu'" -p num_clouds_accum:=1

Remember, you can change "device:='cpu'" to "device:='0'", "device:='1'", ..., if you have a supported (NVIDIA) GPU to run on. The number corresponds to the GPU to run on, so if you only have a single GPU it would be 0.

In the third terminal

ros2 bag play --read-ahead-queue-size 1000 -l -r 1.0 --clock 100 <DIR>/kitti

If you use the external SSD that we provide, it would be:

ros2 bag play --read-ahead-queue-size 1000 -l -r 1.0 --clock 100 ~/Downloads/kitti

In the fourth terminal

ros2 launch wasp_autonomous_systems kitti_car.launch.py

Task pipeline

You are given a pipeline with three functions to complete. The figure below gives a rough overview of it with the functionality you will implement highlighted.

(Figure: CondElectiveA-diagram.png — overview of the task pipeline.)

The output when running the code for this task will look similar to what you saw when you looked at the KITTI data and YOLO in Description Assignment 2: Perception. As you add functionality you will see how the output changes. There will be two maps, represented by point clouds, in the output: one map representing the static points and one representing the dynamic points.

You will implement the following functionality:

  1. Since we are fusing the camera and the LiDAR, and we will only use one camera, we only consider the part of the LiDAR data that is in the field of view of the camera. This is done in the function crop_cloud in the Python script kitti_segmentation_extra.py.
  2. In the second step you will project the 3D points from the LiDAR point cloud onto the 2D image plane of the camera. This is done to color the point cloud, as well as to associate the 3D points with, in step 3, the output from the semantic segmentation of YOLO. You do this in the function coord_to_pixel.
  3. In the third step you will take the output from YOLO and decide which pixels in the image are dynamic. Concretely, you need to decide which classes are dynamic and what level of confidence you require to say that it is actually an object. You encode this by assigning a color to the corresponding pixel in a mask. You can use this color to separate classes, instances, etc., depending on what you want to see as output. The important thing is that the color for a dynamic pixel must not be (0, 0, 0), as this will be treated as static later in the pipeline. You implement this in the function color_dynamic, where there is an example to show you the syntax. A rough sketch of all three steps is given after this list.
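To make the three steps concrete, here is a minimal sketch of what the three functions could look like. The signatures, array shapes, frame conventions (points assumed to already be in the camera frame, x right, y down, z forward), the field-of-view angle, and the class names are illustrative assumptions and will not match the actual arguments of crop_cloud, coord_to_pixel and color_dynamic in kitti_segmentation_extra.py one-to-one; use it as a guide, not as a drop-in solution.

import numpy as np

# Sketch of crop_cloud: keep only points in front of the camera and
# inside an assumed horizontal field of view.
def crop_cloud(points, horizontal_fov_deg=90.0):
    # points: (N, 3) array, assumed to be expressed in the camera frame
    angles = np.degrees(np.arctan2(points[:, 0], points[:, 2]))
    in_front = points[:, 2] > 0.0
    in_fov = np.abs(angles) <= horizontal_fov_deg / 2.0
    return points[in_front & in_fov]

# Sketch of coord_to_pixel: pinhole projection of camera-frame 3D points
# onto the image plane using a 3x3 intrinsic matrix K.
def coord_to_pixel(points, K):
    uvw = (K @ points.T).T            # (N, 3) homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]     # divide by depth to get (u, v)
    return uv.astype(int)

# Sketch of color_dynamic: paint pixels belonging to confident detections
# of dynamic classes into a mask; (0, 0, 0) is reserved for "static".
DYNAMIC_CLASSES = {"person", "car", "bicycle", "bus", "truck", "motorcycle"}

def color_dynamic(mask, class_names, confidences, instance_masks, conf_threshold=0.5):
    # mask: (H, W, 3) uint8 image, initially all zeros (i.e. all static)
    for name, conf, inst in zip(class_names, confidences, instance_masks):
        if name in DYNAMIC_CLASSES and conf >= conf_threshold:
            # any non-black color marks the pixel as dynamic; here one color per class
            color = np.array([hash(name) % 200 + 55, 80, 160], dtype=np.uint8)
            mask[inst.astype(bool)] = color
    return mask

Which classes to treat as dynamic, and which confidence threshold to use, are exactly the design choices you are asked to motivate in question 1 below.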

In the commands used above you work with a single LiDAR point cloud frame at a time. You can specify how many frames are accumulated in the maps. This will impact the computational cost but will also highlight the (poor?) map quality. You can change this at runtime with rqt reconfigure, as you do in Description Assignment 3: Planning.
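For example, assuming the node runs under the name /kitti_segmentation_extra (check the actual name with ros2 node list), the same parameter can also be changed from another terminal:

ros2 param set /kitti_segmentation_extra num_clouds_accum 10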

 

We want you to create a video where you switch between the static and the dynamic maps as you play the rosbag. The video should show what worked and also what limitations the method, as it is now, has.

Questions

You should discuss these questions in the report, but you do not need to implement anything beyond the task above unless you want to dig deeper.

  1. How did you build the mask in step 3, i.e., which classes, confidence levels, etc., and why?
  2. Looking at the results, how would you describe the quality of the static and dynamic maps?
  3. Looking at the results, what are the main error sources?
  4. How can you improve the dynamic classification in step 3? 
  5. How can you improve the static and dynamic map respectively?