To be continued.
Memory Palace
Outer Alignment
Reward misspecification, specification gaming, reinforcement learning from human feedback (RLHF), and the gap between intended objectives and operationalized training signals.