- Instruction retention - checks whether instructions given in the first user message are followed throughout the conversation
- Inference memory - checks whether the model recalls and connects relevant details from earlier turns when a later turn implicitly requires them
- Reliable versioned editing - checks whether LLMs can help humans revise existing material through back-and-forth iterations
- Self-coherence - checks whether the model stays consistent with its own earlier responses, avoiding sycophancy (unconditionally agreeing with the user)
---
[[MultiChallenge - A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs]]