ROS 2 Latencies for High Frequencies using the EventExecutor

As a small side experiment of my PhD thesis, I investigated latencies of ROS message transfers.
I know that I have not been the first person to investigate this and that there are also frameworks for running tests now. I will list here some links if you are interested in what others have already investigated.

Exploring the performance of ROS2
Latency Analysis of ROS2 Multi-Node Systems

Nevertheless, I performed my own experiments because of two reasons. First, I wanted to test it on our specific hardware (the humanoid robot Wolfgang-OP). Second, the previous works did not use the EventExecutor and message frequencies. For reasons on why we need to use the EventExecutor, see our previous post here.

Experimental Setup and Influence of the Idle Function

If you want more details on the robot that I used, you can find it here.

In these experiments, the main computer of the robot was already upgraded to an ASUS PN51-E1 with Ubuntu 22.04. The latencies were just measured for a single transfer between a sender and a receiver (no ping-pong or multi node setups). We each recorded messages for 100s after discarding
the first 100 messages, which often had high outliers as nodes were still starting. We were interested in the latencies for messages that control the robot’s joints (since quick reactions to disturbances are crucial in a bipedal robot), and therefore, we used a small joint command message for our tests. Please keep in mind that our results might not apply for larger messages like images or pointclouds.

Naturally, it is important how the nodes of the sender and the receiver node are executed. ROS 2 allows to run them in two separate processes or in a single process (which we tested both). It is also important how this process(es) are then executed by the operating system. We forced Ubuntu to run these on isolated cores by using “taskset” and “isolcpus” (see our previous post for more infos about this). Additionally, we realized that it is important to deactivate the idle function of the CPU as it has a high influence on the latency (see image below). You can deactivate it in Ubuntu with the following command:

echo 1 | sudo tee /sys/devices/system/cpu/cpu∗/cpuidle/state∗/disable > /dev/null

The orange “Idle disabled” boxes are a little hard to distinguish, as they are often small. Every second line represents the “Idle disabled” state.

Executor

As stated above, we need to use the EventExecutor for performance reasons in our system. Still, I was interested how it performs in relation to the other standard executors. For this small sender-receiver setup, the CPU usage of the executor is not important. Therefore, we could test all of them without further issues. We used a frequency of 1000Hz in this experiment. The difference between the executors is small. All of them profit from running the nodes in the same process.

Comparison to ROS 1

While migrating to ROS 2, our team had the impression that there is a large difference between using nodes in Python and using nodes in C++, which was not so large in ROS 1. Therefore, I also compared the performance of the ROS 2 C++ EventExecutor to the ROS 2 Python Executor (no event based implementation exists at the moment for Python), as well as to their counterparts in ROS 1. Naturally, the ROS 1 nodes can not be executed in the same process, therefore, this test case was omitted. The QoS settings for ROS 2 were set to default, as these mimic the ROS 1 behavior best.
The results (see image below) show that the performance of ROS 2 Python nodes is indeed very bad. Using C++, ROS 2 can be faster than ROS 1 but only if the nodes are executed in the same process.

DDS

ROS 2 also allows the usage of different DDS implementations. Naturally, I also tested if this choice might influence the latency. I only tested FastDDS and CyclonDDS, since these are open source.
In our testcase, Cyclone DDS performed better than FastDDS (although the impact of this choice is smaller than, e.g., using the idle function). Previous work (https://ieeexplore.ieee.org/abstract/document/9591166/) indicated that FastDDS has a lower latency. But they only tested frequencies up to 100 Hz.

Integrated Test and Scheduling

After performing these simple sender-receiver experiments, I tested the latencies between two nodes of our motion software stack that were running at 500Hz while the robot was actively walking (and therefore many other nodes were running). Interestingly, the latencies were much higher than in the toy example. After some investigation, it turned out that the Linux scheduler was the problem. In the experiment before, we only had two threads which each had their own core, so there weren’t any issues. Now, in the integrated case, we have many threads running simultaneously. Unfortunately, the used “taskset” command leads to the scheduler to only execute one thread at a time. I solved this issue by setting the Linux scheduler behavior to the round-robin mode.
Interestingly, deactivating the idle function of the CPU still resulted in improved latencies, although the CPU was under high load.

I hope that these insights might help others in reducing the latency of ROS 2 message transfers for high frequency topics.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmail

Experiences with ROS 2 on our robots

This blog post presents some issues which came up when we migrated from ROS 1 and started using ROS 2. We assume that other people might run into them too. Therefore, we will describe the problems and possible solutions below. Our use case of ROS 2 is a bit different than most systems that currently use ROS 2. We use a team of humanoid robots in RoboCup Soccer. Therefore we have on the one hand differences from using a humanoid robot (500+ Hz control loop cycle) and on the other hand from the league (no network connection to the robot after starting the code).

Scheduling

We have many nodes (~45) running concurrently when we start our full software stack. Most of them have additionally multiple threads. At the same time, some of them, e.g. the walking, need to be executed with a constant and high (500+ Hz) rate. This leads to some issues with the default Linux scheduler. Setting the nice value of the processes was not enough to solve this because it does not necessarily reduce latencies. In ROS 1 it was enough for us to assign the processes to specific CPUs using the “taskset” command, but with ROS 2 we still had issues with this. We finally resolved it by also using the “isolcpus” kernel parameter which forbids the scheduler of using the specified CPU core. Therefore, the process can run freely on its own CPU without any interruptions. Using a real time kernel would probably be a better, but also more complicated solution that we will investigate in the future. More information can be found in the following ROSCON talk: https://roscon.ros.org/2019/talks/roscon2019_concurrency.pdf

Executor Performance

One of our largest issues was the extreme performance drop between ROS 1 and ROS 2. Simple nodes that only took a few percent of a core before, now needed a complete core for themselves. Basically, almost every node was running at 100% CPU usage. It took us some time to realize that the issue comes from the fact that we have a lot of messages per second (e.g. /joint_states is sent with 500 Hz) and thus the badly implemented standard executor was totally overloaded. Luckily, iRobot already did a lot of work on this and created an “Events Executor” for rclcpp (https://github.com/ros2/design/pull/305). Unfortunately, this executor is not yet in the master. Furthermore, there is no such thing for rclpy yet. By using it, we reached node performance values similar to ROS 1 with our C++ nodes. But we also needed to rewrite some nodes from Python to C++ as it was otherwise not possible to run our complete software stack on our 8 core CPU. Unfortunately, all standard ROS 2 nodes, e.g. robot state publisher, use the default implementation and therefore needed to be manually patched by us. This also includes the tf listener. The repository containing the stand-alone Events Executor can be found here: https://github.com/irobot-ros/events-executor/ Our patched versions of rclcpp and other packages are linked here: https://github.com/ros2/design/pull/305#issuecomment-1133757777

Callbacks and Timers

When you are in a callback or timer thread (so anything that is handled by the spinning), it is not possible to receive other callbacks per default. This means for example that you can not wait to get a tf transformation, as you will never receive anything while waiting for it. The same is the case for simulation time callbacks in rclpy. This leads to the time not progressing inside callbacks which leads to other issues. The issue is solved in rclcpp by spinning a callback group containing the time callback in a separate thread, but the isolated execution of specific callback groups is not supported in the current implementation of the Python executor (see https://github.com/ros2/rclpy/issues/850). Although there are multi-threaded Executors available, they seem to not solve the issue completely. If there are many callbacks to handle, they might not manage to handle the correct one in time before you run into a timeout. Interestingly, the order in which the subscriptions are created influences this behavior and sometimes issues can be resolved by ordering them differently.

FastDDS

Sometimes FastDDS fails to list nodes / topics after restarting a node while other nodes are running. The FastDDS discovery server (see https://fast-dds.docs.eprosima.com/en/latest/fastdds/ros2/discovery_server/ros2_discovery_server.html) is similar to the concept of a rosmaster in ROS 1 and should fix this issue.

There are issues with callbacks not arriving in C++ on ROS 2 rolling under Ubuntu 22.04 and FastDDS, which was reintroduced as the default DDS for rolling. We observed these issues our self in our own code base, but they are also the reason why nav2 is not released for rolling. See https://github.com/ros-planning/navigation2/issues/2648 for information regarding the release of nav2 for rolling and humble as well as https://discourse.ros.org/t/nav2-issues-with-humble-binaries-due-to-fast-dds-rmw-regression/26128 and https://discourse.ros.org/t/fastdds-without-discovery-server/26117/14 for a more general discussion. Switching to CycloneDDS solved these issues for us, but we still need to build nav2 ourselves.

CycloneDDS Configuration

Cyclone allows a lot of further settings. Unfortunately, this is just documented in the Cyclone documentation. One important setting when working with large messages, e.g. images, is the kernel queue size (see https://github.com/ros2/rmw_cyclonedds). For Cyclone configuration related things one needs to go to https://cyclonedds.io/docs/cyclonedds/latest.

To activate CycloneDDS follow the steps on https://github.com/ros2/rmw_cyclonedds.

The biggest problem that we were facing with Cyclone is that it can only operate on a fixed network configuration which cannot be changed at runtime. The default network adapter that is used it determined by a ranking. Wired adapters have the highest priority, loopbacks the lowest. Adapters can also be specified manually in the cyclone config file. Issues arise if the selected adapter is not available, e.g. because the ethernet cable is unplugged from the robot after starting the software. In our case, that is quite often the case because an ethernet cable for debugging is removed or the wifi disconnects. We solved this issue by creating a bridge network containing the wired adapter. This results in the adapter being available even when the cable is unplugged. Cyclone is then configured to use this bridge network adapter.

There is a fix already proposed for cyclonedds which enables optional adapters, but it is not merged at the time of writing (see https://github.com/eclipse-cyclonedds/cyclonedds/pull/1336). It will also only partially fix the problem, as cyclone will continue to try to use the adapter that was connected when it was first started, and not dynamically switch to another one.

As our robot uses the builtin ethernet adapter internally for the camera, external devices such as laptops are connected via a usb2lan adapter. This introduced the problem that its network interface name might change due to different brands etc. We mitigate this by only using adapters from a single brand and adding a udev rule that always assigns the same name to it. It is then included by this name in the bridge interface described above.

ROS Domain ID

While the default in ROS 1 was that you use your local roscore, in ROS 2 the default is to communicate with anybody in the network. This can quite quickly lead to issues if you have multiple robots or workstations running in the same network. Therefore it is crucial to set the ROS_DOMAIN_ID environment variable differently for every machine, as isolation is done as opt-in instead of opt-out like in ROS 1. The number of domain ids is also quite limited for large setups and a ros domain with many nodes might “leak” into the namespace of another domain id.

Colcon

One of the most unnecessary issues which could have been all avoided by keeping catkin as a build tool are the issues with colcon. Just to get the same quality of usage that you had with catkin tools by default you need to invest a lot of effort. First, there are many additional packages that you need to install (colcon-common-extensions, colcon-edit, colcon-clean). Then you need to set the verbosity level of colcon, as you will otherwise get completely spammed with unnecessary printouts (export COLCON_LOG_LEVEL=30). Colors are still not supported although there are PRs for it (https://github.com/colcon/colcon-core/pull/487). You might want to create a lot of aliases because the colcon commands are very verbose compared to catkin (e.g. colcon build --packages-up-to PACKAGE --symlink-install to build a specific package including its present dependencies).

Rebuilds

Due to changes in the build process, notably the removal of catkin’s devel folder, a default build now installs to the install folder, and the install folder has to be sourced. This makes it necessary to rebuild your package for every change you do, even in configuration files, launch files, scripts, or python code. Colcon has an option --symlink-install, but it often cannot be relied on and it does not include things like launch files or config files. So get used to rebuilding for every parameter you change in a config file.

Rate in Simulation

When using rate.sleep() in simulation, the node does not always sleep correctly. Sometimes it does not sleep at all, which leads to executing the code again in the same time step. https://github.com/ros2/rclcpp/issues/465

Sim Time

The usage of lower than real-time simulations (such as https://humanoid.robocup.org/hl-vs2022/) can be tricky due to packages using the walltime instead of the ros time for some timeouts etc.. The use_sim_time parameter which tells a node to use the time from the /clock topic instead of the wall time is now present at each node separately. This is the case because the concept of global parameters does not exist in ROS 2. It is therefore necessary to pass a launch file argument, e.g. sim:=true, through the full launch file hierarchy to set the use_sim_time parameter at the launch of each node individually.

Parameters

ROS 2 handles parameters quite differently compared to ROS 1. Parameters exist only in the scope of a single node. They need to be declared in the code to be available. During this declaration it is also possible to define the type, value ranges etc. The declaration process is quite verbose if you have a large number of parameters. You can use the following trick to skip the declaration if you don’t care about a neatly generated rqt reconfigure gui:

For Python:

node = Node("example_node", automatically_declare_parameters_from_overrides=True)

# or if you inhert from the Node class

class MyNode(Node):
    def __init__(self):
        super().__init__('example_node', automatically_declare_parameters_from_overrides=True)

For C++:

MyNode::myNode(const std::string &ns, std::vector<rclcpp::Parameter> parameters) :
    Node(ns + "my_node", rclcpp::NodeOptions().allow_undeclared_parameters(true).parameter_overrides(parameters).automatically_declare_parameters_from_overrides(true)) {

If you have any global parameter values, a blackboard node which holds these parameters is needed. It could look like this:

<node name="parameter_blackboard" pkg="demo_nodes_cpp" exec="parameter_blackboard" args="--ros-args --log-level WARN">
	<param name="use_sim_time" value="$(var sim)"/>
	<param from="$(find-pkg-share my_config_package)/config/global_parameters.yaml" />
</node>

Retrieving the global parameters from that blackboard is not as straightforward as querying the parameters of your local node. You need to perform a manual service call. Note that depending on the executor setup service calls might timeout or block indefinitely when done inside of callbacks.

Here is some demo code to retrieve the params of another node like the mentioned parameters blackboard.

def get_parameters_from_other_node(own_node: Node,
                                   other_node_name: str,
                                   parameter_names: List[str],
                                   service_timeout_sec: float = 20.0) -> Dict:
    """
    Used to receive parameters from other running nodes.
    Returns a dict with requested parameter name as dict key and parameter value as dict value.
    """
    client = own_node.create_client(GetParameters, f'{other_node_name}/get_parameters')
    ready = client.wait_for_service(timeout_sec=service_timeout_sec)
    if not ready:
        raise RuntimeError(f'Wait for {other_node_name} parameter service timed out')
    request = GetParameters.Request()
    request.names = parameter_names
    future = client.call_async(request)
    rclpy.spin_until_future_complete(own_node, future)
    response = future.result()

    results = {}  # Received parameter
    for i, param in enumerate(parameter_names):
        results[param] = parameter_value_to_python(response.values[i])
    return results

TF Node Spam

By default in C++ a tf listener creates its own node just to listen to tf updates in parallel. This is not needed anymore, as rclcpp supports spinning of specific callback groups using their own executor. To lower the overhead and reduce spam in e.g. the rqt node view we therefore suggest that you should replace the following instantiation of the tf listener

tf2_ros::Buffer tfBuffer{node->get_clock()};
tf2_ros::TransformListener tfListener{tfBuffer};

with this one

tf2_ros::Buffer tfBuffer{node->get_clock()};
tf2_ros::TransformListener tfListener{tfBuffer, node};

The spinning of the callback group is done with the default executor. You would need to patch tf2_ros your self to use the more performant EventsExecutor. This is especially relevant as /tf callbacks might arrive at high frequency.

All of this does not apply to Python, as neither the separate callback groups nor the EventsExecutor are implemented for it…

Unreleased Packages

One thing that is not a major deal breaker but still annoying are missing releases for some packages. There are always some packages, e.g. rqt_tf_tree, that are not released as apt packages (for rolling) although their code works fine. It just seems like there is a lack of maintainers for ROS 2. We hope that this changes when support for ROS 1 runs out and people can concentrate on ROS 2.

Environment Setup

There are some things that you can set in your ~/.bashrc or ~/.zshrc:

# reduce colcon spam
export COLCON_LOG_LEVEL=30
# make logs colorful
export RCUTILS_COLORIZED_OUTPUT=1
# format logs in terminal
export RCUTILS_CONSOLE_OUTPUT_FORMAT="[{severity}] [{name}]: {message} ({function_name}() at {file_name}:{line_number})"
# force using CycloneDDS as FastDDS is currently broken
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
# source ros
source /opt/ros/rolling/setup.zsh
# source workspace
source $HOME/colcon_ws/install/setup.zsh
# enable autocompletion
eval "$(register-python-argcomplete3 ros2)"
eval "$(register-python-argcomplete3 colcon)"
# reduce depraction warning spam in colcon
export PYTHONWARNINGS='ignore:::setuptools.command.install,ignore:::setuptools.command.easy_install,ignore:::pkg_resources'
# always set a domain ID (should be unique in your network)
export ROS_DOMAIN_ID=42
FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmail

Results of Humanoid League Survey

This year we were, unfortunately, eliminated from the competition quite early. So we used our time to do a small survey with all teams that participated this year in the Humanoid League Kid-Size (11 teams). In our opinion, there is still a lack of knowledge what other teams are doing. Thus, teams are trying to come up with solutions for problems that other teams have already solved. We hope that this small survey will help to mitigate this issue. Naturally, we only give a very abbreviated version here with almost all details missing. So please just talk to the corresponding teams to know more.

All teams in the league were happy to share their knowledge with us. We thank them for their openness, since otherwise this survey would not have been possible.

We also hope to continue this survey in the next years, which would also allow tracking the development of the league over time. Maybe it would also be possible to require filling out such a survey during registration for the next RoboCup.

The following abbreviations will be used:

BH: Bold Hearts

CIT: CIT Brains

HBB: Hamburg Bit-Bots

ICH: Ichiro

ITA: ITAndroids

NU: Nubots

RHO: Rhoban Footbal Club

RB: ROBIT

RF: RoboFei

KMU: Team-KMUTT

UA: UTRA Robosoccer

Participation in the Humanoid Virtual Season (HLVS)

We asked if teams will be participating in the next HLVS. The majority (CIT, HBB, ITA, NU, UA) have already participated and will participate again. BH and KMU are thinking to maybe start participating. ICH and RB said that they are already doing too many other competitions. Similarly, RHO and RF said that they have no time.

For any teams that want to create a Webots model of their robot, we would like to point you to these tools, which might help you:

Creating a URDF from a Onshape CAD assembly https://github.com/Rhoban/onshape-to-robot

Creating a Webots proto model from URDF: https://github.com/cyberbotics/urdf2webots

Open Source

The number of teams making their code and hardware (partially) open source is very high (see Figure). A list is given below. We also added direct links to specific parts in the following section if we knew them, but if there is no link provided, there might still be code available by the team, so have a look.


BH

Partially

https://gitlab.com/boldhearts/

CIT

Partically

https://github.com/citbrains/

HBB

Open

https://github.com/bit-bots/

https://cad.onshape.com/documents/8c6aa9a8917f764cb7039c2d/w/af71e5083243affec9ac82a8/e/e42d9814ef6f704f62b6758c

ICH

Partially

https://github.com/ichiro-its

ITA

Partially

https://gitlab.com/itandroids/open-projects

NU

Open

https://github.com/NUbots

RHO

Open

https://github.com/Rhoban/

https://cad.onshape.com/documents/41654e89e61a392d020b728c/w/d555ceec170d351622b789de/e/4c9a04a707c36ac7ad2ca0f8

RB

Closed

RF

Open

https://github.com/RoboFEI/RoboFEI-HT_2022_SOFTWARE

KMU

Closed

UA

Open

https://github.com/utra-robosoccer

Software

Computer Vision

The rise of CNNs and deep learning has also affected the Humanoid League. Almost all teams use either some kind of YOLO or similar CNNs. The only exception is NU which use their own visual mesh approach.

HBB can also be noted as they not only use bounding box output but also segmentations for the field area and the lines. These are then later used as features for the localization.

BH

Darknet as framework with small custom network

CIT

YOLOv4 tiny

HBB

YOEO (custom network combining segmentation for field+lines and bounding box for ball+goalposts+intersections) https://www.researchgate.net/publication/356873226_YOEO_-_You_Only_Encode_Once_A_CNN_for_Embedded_Object_Detection_and_Semantic_Segmentation

ICH

YOLOv4 tiny

ITA

YOLOv4

NU

Visual Mesh https://github.com/Fastcode/VisualMesh

RHO

YOLOv5

RB

YOLOv4-tiny

RF

MobileNet

KMU

Custom CNN

UA

Pretrained YOLOv4

Distance Estimation

Just detecting the different features on the camera image is not enough. We also need their position relative to the robot. One way to solve this issue is by circumventing it through working completely in image space (BH, RF). It is also possible to use the size of the ball on the image, as its real size is known. This is used by NU and UA.

The most common approach is (not surprisingly) inverse perspective mapping (IPM). Here the height of the camera (given through forward kinematics) and the angle (forward kinematics + IMU) are used together with an intrinsic camera calibration to project a point in image space to Cartesian space. A well tested and maintained package for ROS 2 can be found here: https://github.com/ros-sports/ipm

This year, no team used stereo vision to solve this problem.

BH

Working in image space

CIT

IPM

HBB

IPM

ICH

Simplified IPM (no camera calibration, fixed height)

ITA

IPM

NU

IPM + ball size

RHO

IPM

RB

IPM

RF

Working in image space

KMU

IPM

UA

IPM + ball size

Localization

Since localization is not necessarily needed to score a goal, it is neglected by many teams. There are four teams (BH, NU, RF, KMU) that don’t have a finished localization. ICH uses a simple approach based on the distance to the goals.

The most commonly used approach is the particle filter (CIT, HBB, ITA, RHO, UA) based on different features.

BH

No localization

CIT

Particle filter (lines, intersections, goal posts)

HBB

Particle filter (line points)

ICH

Distance between goals

ITA

Particle filter (field boundary, goal posts)

NU

Working on particle filter (lines, goal posts)

RHO

Particle filter (line intersections, goal posts)

RB

Monte Carlo (intersections)

RF

No localization

KMU

No localization

UA

Particle filter (AMCL)

Walking

One of the hardest parts for many teams is getting their robot to walk stably. Bipedal walking is still a very active research topic, and therefore it is not surprising that there are various approaches used in the league.

Many teams use some a pattern generator. Here quintic splines are mostly used. This is due to the fact that RHO wrote an open source walk controller called IKWalk that is now also used by BH. It was further improved by HBB who added automatic parameter optimization. This version is also used by NU.

Another approach is to use the Zero Moment Point (ZMP) (CIT, ITA).

One interesting idea that we got from ITA is the following: when using a PD controller based on IMU data to stabilize the robot it makes sense to use the direct values from the gyro (angular velocity) as the D value instead of a computed derivative (which is more noisy).

BH

Rhoban IKWalk

CIT

ZMP based pattern generator

HBB

Quintic splines, IMU PD control, phase modulation, parameter optimization https://2022.robocup.org/download/Paper/paper_19.pdf

ICH

Darwin-OP walk controller with own additions

ITA

ZMP, gravity compensation, PD control

NU

Hamburg Bit-Bots walk

RHO

Cubic splines

, foot pressure stabilization

RB

Pattern generator, IMU+foot pressure stabilization

RF

Sinoid

KMU

Pattern generator

UA

Hardcoded trajectories

Getting Up

All teams except HBB use manual keyframe animations for their standing up motion. Many of them told us that they are annoyed by this as it requires a lot of retuning.

HBB is the only team that follows a completely different approach, which is very similar to their walking approach. It uses Cartesian quintic splines to generate the motion and PID controller to stabilize it. The parameters are optimized automatically and it can be transferred to other robots. If you are interested, you can have a look here:

https://www.researchgate.net/publication/356870627_Fast_and_Reliable_Stand-Up_Motions_for_Humanoid_Robots_Using_Spline_Interpolation_and_Parameter_Optimization

Kick

Similarly to the stand up motion, the kick is also solved by almost all teams using a keyframe animation.

There are two exceptions. First, HBB uses again an approach that similarly to their walking and getting up. It also allows defining the kick direction. We don’t have a paper about it (yet) but you can see a general overview here https://robocup.informatik.uni-hamburg.de/wp-content/uploads/2021/10/Praktikumsbericht.pdf

Another approach is used by ITA that uses also splines for the kick movement but stabilizes it using the ZMP.

Decision Making

Many teams still rely on simple approaches as state machines (BH, ICH, KMU, UA) or simple if-else combination, also called decision trees, (RB, RF).

Others use more sophisticated approaches like Hierachical Task Network planner (CIT) or behavior tress (ITA).

But some teams did also develop their own approaches. HBB developed a completely new approach that is somewhere between a behavior tree and a state machine. NU did a custom framework that is based on subsumption.

One team (RHO) did also use direct reinforcement learning.

It is also one of the software categories where there are the most different approaches used.

BH

State Machine

CIT

Hierachical Task Network (HTN) planner

HBB

Dynamic Stack Decider

https://www.researchgate.net/publication/350168815_DSD_-_Dynamic_Stack_Decider_A_Lightweight_Decision_Making_Framework_for_Robots_and_Software_Agents

ICH

State Machine

ITA

Behavior tree

NU

Subsumption-like approach

RHO

State Machine + Reinforcement learning

RB

Decision tree

RF

Decision tree

KMU

Finite State Machine

UA

Multiple State Machines

Hardware

Material

BH

3d printed, aluminum

CIT

Fiber glass, aluminum, TPU, PLA

HBB

Carbon fiber, stainless steel, aluminum, TPU, PLA

ICH

Aluminum, carbon fiber, stainless steel, PLA

ITA

Aluminum

NU

Aluminum (legs), 3d printed (rest)

RHO

Aluminum

RB

Aluminum

RF

Carbon fiber, aluminum, 3d printed

KMU

Aluminum

UA

3d printed

Controller Board

Many teams are still relying on old hardware from Robotis, such as the CM730. We also heard multiple complains about them, which aligns with our own experience with them.

Some teams also made their own solutions using some sort of microcontroller.

We have investigated this topic before (https://robocup.informatik.uni-hamburg.de/wp-content/uploads/2019/06/dxl_paper-4.pdf) and found a better solution (called QUADDXL) that allows much higher control rates and is less error prune. This was now also used in the new robot of CIT.

BH

CM730

CIT

Custom QUADDXL based https://github.com/citbrains/SUSTAINA-OP/tree/master/electronics/MainBoard_ver2_2

HBB

QUADDXL based https://github.com/bit-bots/wolfgang_core

ICH

CM740

ITA

CM730

NU

CM740, own controller in progress

RHO

Rhoban DXL, microprocessor based

RB

Own board, STM microprocessor based

RF

Own board, soon open source

KMU

Own board, microprocessor based

UA

U2D2

Camera

Most teams are using a Logitech webcam (C920 and similar ones). Some teams use industrial cameras. These have the advantage that they mostly have a global shutter and are more sensitive, allowing for a shorter shutter speed, thus reducing the blurriness of the images.

As connection USB and Ethernet are the two used options. We would like to hint to a known issue of USB3 Wifi interference https://ieeexplore.ieee.org/document/6670488

If you are interested how to integrate power over Ethernet in your robot, so that you only need a single cable for your Ethernet camera, you can have a look at our CORE board https://github.com/bit-bots/wolfgang_core

BH

C920

CIT

e-CAM50_CUNX

HBB

Basler ACE-ACA2040-35gc Ethernet + Cowa lense LM5NCL 1/1.8” 4mm C-Mount

ICH

C930

ITA

C920

NU

Flir Blackfly, fisheye, UBB3

RHO

PointGrey Blackfly GigE camera

RB

C920

RF

Lenovo, C920 soon

KMU

C920

UA

C920

Leg Backlash

One of our current hardware issues was the extreme backlash in the legs, meaning that you can move the foot a lot while the motors are stiff. Therefore, we especially asked the teams about this too.

For most teams this is also one of their largest hardware issues. The problem comes from the backlash of the servos (e.g. 0.25° for a Dynamixel XH540 or 0.33° for a Dynamixel MX-106), from deformations in the leg parts and from the leg-torso connection.

One solution is using parallel kinematics like CIT, but this results in other issues (e.g. more complicated kinematics, not able to use URDF, …).

Some teams tried to keep serial kinematics and reduce the different factors that introduce the backlash. NU used new legs made of thick aluminum together with needle roller bearings (https://au.rs-online.com/web/p/roller-bearings/2346912 and https://www.robotis.us/hn05-t101-set/?_ga=2.226272029.682397097.1657712939-1062027158.1657712938). Similarly HBB, RHO and RB use bearings in the connection between legs and torso.

Infrastructure

Programming Language

BH

C++, Python

CIT

C++, Python

HBB

C++, Python

ICH

C++, Python

ITA

C++; Python+Matlab for tools

NU

C++

RHO

C++, some Python for RL

RB

C++

RF

C++, Python

KMU

C++, Python

UA

Python

Configuration Management

An often overseen point in software is that you do not only need your own code but also an operating system including a set of installed dependencies, network configurations, USB device configurations, etc. It can easily happen that the systems of the different robots diverges from each other. Also if a system needs to set up from scratch, it may require some work.

Some teams do the complete process manually, while others clone their disk images or have some small scripts that automate the process (partially).

We found that using a system like Ansible to do this is very helpful. It makes the process reproducible, reduces human errors and documents the process (reducing the truck factor).

BH

Ansible

CIT

Partially automatic

HBB

Ansible

ICH

Script

ITA

Manual

NU

Script

RHO

Cloning image

RB

Manual

RF

Manual

KMU

Manual

UA

Manual

Getting the code on the robot

At some point we need to get our software on the robot. While this seems to be a trivial thing, we were still curious how other teams are doing it.

One often used approach (CIT, ICH, iTA, RF) is to just use git on the robot to pull the newest software version. While this seems straight forward, it poses some issues in our opinion. First, all software needs to be committed to the git before it can be tested on the robot and, second, your robot always needs access to the git (so internet access).

Some teams (NU, RHO) just copy the compiled binaries on their robot. We did something similar in the past, but since our robots have faster CPUs than our laptops, we decided on switching to copying it to the robot and building there. This is also done by BH, RB, and KMU.

While many teams just use a normal scp (ssh copy), HBB, RHO and NU use rsync with the option to only copy modified files, which optimizes the time needed for getting the code onto the robot. We also use a small script to facilitate the process. You can find it here https://github.com/bit-bots/bitbots_meta/blob/master/scripts/robot_compile.py

BH

scp

CIT

git

HBB

rsync + build on robot via custom script

ICH

git

ITA

git

NU

compile on laptop + copy

via rsync

RHO

compile on laptop + rsync

RB

Copy + build on robot

RF

git

KMU

scp + compile on robot

UA

docker

Testing and Continuous Integration

Many teams (CIT, ITA, RHO, RB, RF, KMU) don’t use any kind of automated testing system or CI. The most commonly used approach are Github Actions (ICH, NU, UA).

BH

CI for unit tests

CIT

No testing or CI

HBB

Had some tests and CI for previous ROS 1 version but not transferred to new ROS 2 version yet. Jenkins was used as platform

ICH

Github Actions

ITA

No testing or CI

NU

Building and tests with Github Actions

RHO

No CI but manual testing script for robot hardware

RB

No testing or CI

RF

No testing or CI

KMU

No testing or CI

UA

Github Actions

Usage of ROS

ROS seems to have a continuous rise in usage in the Humanoid League. This follows observations that were already done in other leagues (e.g. @Home, rescue), whereby now the large majority of teams use it. Additionally, there is now the question if teams are still using ROS 1 which is now in its last version or have already switched to ROS 2.

We just switched from ROS 1 to ROS 2 and have identified an array of issues, which are probably coming from the fact that it has not been used a lot for humanoid robots until now. But we will create another article about that.

Interesting to see is that some teams only use ROS for logging or debugging. This is probably a good approach to profit from some of ROS strongest advantages without needing to rewrite a lot of the own code base.

BH

ROS 2 Foxy

CIT

Only for logging

HBB

ROS 2 Rolling

ICH

No but plan on using it

ITA

ROS 1 but are switching to ROS 2

NU

No, have their own system Nuclear

RHO

No

RB

ROS 1

RF

ROS 2 Foxy

KMU

ROS 2

UA

ROS 1

Operating System

Almost all teams use some version of Ubuntu on their robot. The only exception is NU that use a custom Arch Linux.

No team uses a real time kernel. But we heard from multiple teams that they were having issues with their motions (especially the walking) not running smoothly when the whole software stack was started. In our experience this is an issue with the default Linux scheduler which will move your motion processes around between cores and interrupt it to run other processes. Some teams try to avoid this by setting the nice level of the important processes. For us this was not enough to solve the problem. We assign our processes to different CPU cores via “taskset” and make sure that the scheduler does not use these by setting “isolcpus”. This approach worked well for us.

We will explore the usage of real time kernels in the future.

Organization

We also asked how teams are organizing themselves. Some teams (CIT, HBB, ITA, NU, RB, RF, UA) do weekly meetings for organization and/or working. Almost all teams are using Git (with GitHub being the most commonly used provider). The only exception is RB that uses Slack to exchange their code between each other.

FacebooktwitterredditpinterestlinkedinmailFacebooktwitterredditpinterestlinkedinmail