Rectifying Shortcut Behaviors in Preference-Based Reward Learning (arxiv.org)
1 point by PaulHoule 5 hours ago