- Instruction retention - instructions given in the first user message are followed throughout the conversation
- Inference memory - connects relevant details from earlier in the conversation when a later turn implicitly requires them
- Reliable versioned editing - checks whether LLMs can help humans revise existing material through back-and-forth iterations
- Self-coherence - avoids sycophancy, i.e. unconditionally agreeing with the user

---

[[MultiChallenge - A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs]]