Google’s Gemma series continues to throw up all kinds of interesting models. The latest is Magenta RealTime 2 (MRT2), an open-weights model ...
SDPG is the main contribution. It extends GRPO with an exact per-token forward KL between the actor (without privileged context) and itself conditioned on privileged context c: ...