SDPG is the main contribution. It extends GRPO with an exact per-token forward KL divergence between the actor without privileged context and the same actor conditioned on privileged context c: ...
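The snippet cuts off before the formula, so the following is only a minimal sketch of what an exact per-token KL between two next-token distributions could look like. "Exact" is read here as a full sum over the vocabulary rather than a sampled estimate. The function names, the plain-list logits layout, and the choice of which distribution plays p and which plays q are assumptions for illustration, not SDPG's definition.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def per_token_kl(p_logits, q_logits):
    """Return KL(p || q) for each token position.

    p_logits and q_logits are lists of per-position logit lists
    (hypothetical layout: [position][vocab index]). The sum runs over
    the whole vocabulary, so no sampling is involved.
    """
    kls = []
    for p_row, q_row in zip(p_logits, q_logits):
        log_p = log_softmax(p_row)
        log_q = log_softmax(q_row)
        kls.append(sum(math.exp(a) * (a - b) for a, b in zip(log_p, log_q)))
    return kls


# Assumed roles: p = actor conditioned on privileged context c,
# q = actor without it. The snippet does not confirm this direction.
with_context = [[0.0, 0.0], [1.0, 2.0]]
without_context = [[math.log(3.0), 0.0], [1.0, 2.0]]
kl_values = per_token_kl(with_context, without_context)
```

Under these assumed inputs the second position has identical logits in both passes, so its KL is zero, while the first position contributes a positive penalty.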
Abstract: Control barrier functions (CBFs) are powerful tools for ensuring safety in controlled systems, commonly employed through the construction of a safety filter using quadratic programming (QP), ...
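As a rough illustration of the QP-based safety filter the abstract mentions, the sketch below handles the simplest case: a single affine CBF constraint a·u >= b, for which the QP min ||u - u_nom||^2 has a closed-form solution (a projection onto a half-space). The function name and the single-integrator example are assumptions for illustration and are not taken from the paper.

```python
def cbf_safety_filter(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a . u >= b in closed form.

    For a control-affine system and barrier h, the usual choice is
    a = L_g h(x) and b = -L_f h(x) - alpha * h(x) (assumed here).
    """
    dot = sum(ai * ui for ai, ui in zip(a, u_nom))
    norm2 = sum(ai * ai for ai in a)
    if norm2 == 0.0:
        # The constraint does not depend on u; nothing to correct.
        return list(u_nom)
    # Multiplier is zero when the nominal input already satisfies the constraint.
    lam = max(0.0, (b - dot) / norm2)
    return [ui + lam * ai for ai, ui in zip(a, u_nom)]


# Hypothetical example: single integrator x' = u with h(x) = x_max - x,
# so L_f h = 0, L_g h = -1, and the constraint is -u >= -alpha * h(x).
x, x_max, alpha = 0.9, 1.0, 1.0
h = x_max - x
u_safe = cbf_safety_filter([1.0], [-1.0], -alpha * h)
u_unchanged = cbf_safety_filter([0.05], [-1.0], -alpha * h)
```

With several constraints or input bounds the closed form no longer applies and a general QP solver would be needed.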