1. Tool title* (50 characters)
Please give a brief title for your tool. Please see the titles of previous winners for more guidance.

Ira Project

2. What is your tool? (200 characters)*
Just a one-liner. You'll give a more detailed description below.

A learning platform with scaffolded activities that empower students to build skills beyond AI capabilities.

3. Proposal abstract (3000 characters)*
Describe your proposed tool. We want to understand how it works, how it reaches the learner populations and learning outcomes relevant to your track, and how the technology functions.

Our proposed tool is an innovative educational platform designed to transform learning processes by using AI both as a facilitator and as a benchmark. The tool operates on two primary principles: first, it leverages AI to enhance learning of skills that AI excels at, and second, it encourages learners to build skills beyond AI's capabilities.

For tasks and skills that AI excels at, our platform provides an innovative learning experience in which the AI acts as a thinking partner, helping learners navigate complex concepts. Instead of solving a set of questions, learners start by teaching our AI, Ira, the required concepts. Ira then uses the learner's explanation to attempt the questions and shows its working. Learners have to iterate and refine their explanations until Ira is able to solve all the given questions. Ira's responses and working are designed to serve as scaffolding that guides learners toward the correct conceptual understanding. This process not only hones the learner's subject expertise but also builds skills such as communication and organization.

To help learners build skills beyond AI's capabilities, we will present them with a novel type of assessment that state-of-the-art LLMs perform poorly on. We will generate a set of incorrect answers and ask learners to identify the line of reasoning that would produce each of those incorrect answers. This assessment will be designed to stay within the bounds of the curriculum while requiring learners to employ higher-order cognitive skills such as creativity and critical thinking.

Ira uses a proprietary reasoning engine that interfaces with an LLM. When a learner submits an explanation, the LLM processes it and makes a call to the reasoning engine. After the reasoning engine attempts a question, it passes its output back to the LLM, which then generates a response to the learner. The reasoning engine is built separately for each topic and is used for both learning experiences. This framework facilitates a dialogical learning experience, prevents hallucinations, and ensures adherence to the pre-defined constraints of the topic.

To reach learner populations, our tool integrates seamlessly with the Learning Management Systems (LMS) commonly used by schools. We also offer a standalone version that students and teachers can access directly online (iraproject.com), making it available to diverse learning environments. We are currently running pilots at 6 schools across the globe and plan to introduce an affordable subscription-based pricing model for educational institutions.
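To make the architecture above concrete, the sketch below outlines one way the explanation-to-feedback loop could be wired together. It is a minimal illustration under assumed names (`ReasoningEngine`, `parse_explanation`, `narrate`); the actual reasoning engine and prompts are proprietary and topic-specific.

```python
# Illustrative sketch of the Ira feedback loop described above. Every name here
# (ReasoningEngine, parse_explanation, narrate, ...) is a hypothetical placeholder,
# not the production API.
from dataclasses import dataclass


@dataclass
class Attempt:
    question_id: str
    answer: str
    working: list[str]   # step-by-step working shown to the learner
    is_correct: bool


class ReasoningEngine:
    """Topic-specific, rule-bounded solver; one engine is built per topic."""

    def solve(self, structured_explanation: dict, question: dict) -> Attempt:
        raise NotImplementedError


def run_teaching_round(llm, engine: ReasoningEngine, explanation: str,
                       questions: list[dict]) -> tuple[list[Attempt], str]:
    # 1. The LLM turns the learner's free-text explanation into a structured
    #    form that the reasoning engine can accept.
    structured = llm.parse_explanation(explanation)

    # 2. The reasoning engine (not the LLM) attempts each question, which keeps
    #    answers within the pre-defined constraints of the topic and avoids
    #    hallucinated working.
    attempts = [engine.solve(structured, q) for q in questions]

    # 3. The LLM turns the engine's output into Ira's response: the working for
    #    each question, plus Socratic prompts where the answers are incorrect.
    feedback = llm.narrate(attempts)
    return attempts, feedback
```

Keeping the solving step inside a deterministic, topic-specific engine is what lets the LLM stay in a purely conversational role.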
4. What is your approach to Learning Engineering? (1000 characters)*
Tell us about the learning data that your tool will collect and how it is designed for research and continuous improvement. We want to understand how researchers can use the data to better understand how students learn.

Our tool can be used to assess the large-scale pedagogical efficacy of the Feynman Technique, which posits that teaching others is one of the most effective ways to learn. This technique has never been implemented at scale because it relies on personalized, expert-driven (instructor or peer) interaction. To address this, we have developed an AI peer, Ira, whom each student teaches and receives immediate feedback from. The learning data collected by our tool includes students' explanations over time and the percentage of questions that Ira solves using each explanation. By analyzing this data, we can obtain a quantitative measure of a learner's understanding at any given time, map this understanding to specific learning outcomes, and predict improvement in understanding based on the degree and type of scaffolding provided. Using a pretest-posttest study, our tool can also be used to determine the effectiveness of the AI-driven Feynman Technique.

5. How can your tool scale? (1000 characters)*
Please summarize how your tool can grow rapidly in new contexts, markets, and/or populations. If there are any significant cost considerations required to scale your tool (e.g., hardware) please note them in this section.

Our tool can easily be used across diverse contexts, markets, and populations thanks to its adaptable framework and digital delivery. Leveraging the capabilities of an LLM allows our platform to cater to a wide range of languages and curriculums, and would also let us develop support for multimodal explanations, including speech, drawings, and videos. To provide our tool at scale, we have to consider the computation costs associated with AI inference. Since the reasoning engine is tailored to each topic, we would also need to invest in a team of STEM experts that can rapidly build these engines across a wide range of curriculums. We hope to engage educational institutions as design partners to help mitigate some of these costs.

6. Why is this the right team? (750 characters)*
Please highlight content or technical expertise related to your team.

Vignesh has always been committed to improving education, as evidenced by his decision to start a non-profit, Vismaya Kalike, right out of college. Today, the non-profit runs 10 learning spaces for marginalized children across Bangalore. His seven years of experience as an educator and his technical background shape the vision for our tool. Likhit's expertise in AI research in low-resource settings and his global teaching experience further strengthen the team's ability to design learning processes for diverse educational settings. Together, our combined expertise makes us the best equipped to develop and iterate on the Ira Project and to ensure that it remains challenging and relevant even as LLMs continue to improve.

#### Dataset Prize

Please describe your dataset. (2000 characters)*
Be sure to include (a) the topic area (i.e., educational focus – math, ELA, student engagement, absenteeism, language acquisition, etc.), (b) relevant research question(s) the data can help respond to, (c) the target populations of the data and how they are represented in the data, and (d) potential data size (number of samples)

We plan to use our tool to create two different datasets.

The first dataset will be collected using Ira as an AI peer that the learner has to teach. For each learner, this dataset will consist of a written explanation given by the learner for a set of questions, the answer to each of these questions as solved by Ira, and the answer to each of these questions as solved directly by the learner. Ira's answer will show its working and, if incorrect, Ira will also respond with Socratic dialogue that prompts the learner to provide a new explanation. This dialogue serves as scaffolding and will also be captured in the dataset.
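As an illustration, one possible shape for a single record in this dataset is sketched below; the field names are assumptions rather than a finalized schema.

```python
# Illustrative record structure for the first dataset. Field names are assumptions
# made for this sketch, not the final export schema.
from dataclasses import dataclass, field


@dataclass
class QuestionOutcome:
    question_id: str
    learner_answer: str            # the question as solved directly by the learner
    ira_answer: str                # the question as solved by Ira from the explanation
    ira_working: list[str]         # Ira's step-by-step working
    ira_correct: bool
    socratic_dialogue: list[str]   # scaffolding prompts, captured when Ira is incorrect


@dataclass
class ExplanationRecord:
    learner_id: str
    topic: str                     # one of 10 Physics / Mathematics topics
    attempt_number: int            # 1-3, indexed by the level of scaffolding received
    explanation: str               # the learner's written explanation
    outcomes: list[QuestionOutcome] = field(default_factory=list)

    @property
    def ira_solve_rate(self) -> float:
        """Fraction of questions Ira solves from this explanation -- the
        quantitative proxy for conceptual understanding described above."""
        if not self.outcomes:
            return 0.0
        return sum(o.ira_correct for o in self.outcomes) / len(self.outcomes)
```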
We propose to answer three key research questions using this dataset:
1) How can we design assessments that test for conceptual knowledge (the knowledge required to teach how to solve a set of questions) as opposed to procedural knowledge (the knowledge required to solve a set of questions)?
2) How can Socratic dialogue be used to scaffold and personalize learning based on prior knowledge?
3) Given only a learner's explanation, how can we predict which questions the learner will be unable to solve correctly, and why?

We propose to collect explanations across 10 different topics in Physics and Mathematics from 1,000 students in grades 9-12. For each student, we will collect 3 explanations (based on different levels of scaffolding) per topic, creating a dataset of 30,000 explanations.

The second dataset will consist of a unique test designed around 10 topics in high school (grades 9-12) Mathematics and Physics. For a question and a given incorrect answer, the test-taker (an LLM or a student) has to predict the reasoning required to reach that specific answer. This unique dataset will serve two purposes: 1) it will provide a new benchmark for evaluating the performance of LLMs, and 2) it can be used to design learning activities that allow students to build skills beyond AI capabilities.
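To convey the format of this assessment, a hypothetical item is sketched below; both the schema and the example question are illustrative and not drawn from the actual item bank.

```python
# Illustrative item for the second (reverse-reasoning) dataset. The schema and the
# worked example are hypothetical and only meant to convey the task format.
from dataclasses import dataclass


@dataclass
class ReverseReasoningItem:
    topic: str
    question: str
    incorrect_answer: str
    target_reasoning: str     # the line of reasoning that produces the incorrect answer
    response: str = ""        # to be filled in by the test-taker (student or LLM)


example_item = ReverseReasoningItem(
    topic="Linear equations (grade 9 Mathematics)",
    question="Solve for x: 3x + 5 = 20",
    incorrect_answer="x = 25/3",
    target_reasoning="Adding 5 to both sides instead of subtracting gives 3x = 25, so x = 25/3.",
)
```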