NeuroAI
PhD Student, Yale University, 2024
- Using neural-network models such as RNNs and Transformers to gain a mechanistic understanding of psychological phenomena related to attention and memory: How does attention facilitate memory encoding? How is working memory transformed into long-term memory? Why is there a fundamental limit on human working-memory capacity? Are working memory and long-term memory represented in the same neural architecture? How do the representational formats of working memory and long-term memory differ? How can memory retrieval and false memories be explained? (A toy modeling sketch of this approach follows this list.)
- Using theoretical and mathematical tools to understand the inner workings and limitations of Transformer-based large language models (LLMs): How can formal mathematical tools be used to assess the reasoning ability of LLMs? How can the “emergence” of cognitive abilities in LLMs be defined through the lens of statistical physics? What unique mathematical properties make LLMs so powerful? What are these models still missing on the path to human-level cognitive abilities? How can the amount of intelligence and sentience in deep neural networks such as Transformer-based LLMs be quantified mathematically? If LLMs are not the final answer to artificial general intelligence, what insights can theoretical neuroscience offer toward that goal? (A toy illustration of one such question also follows this list.)
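
The first line of work typically trains small recurrent networks on memory tasks and then dissects their dynamics. The sketch below is only a minimal illustration, not the project's actual code: a GRU trained on a hypothetical delayed-recall task (all names such as `DelayedRecallRNN` and the task sizes are assumptions chosen for the example).

```python
# Illustrative sketch: a small GRU trained on a delayed-recall (working-memory) task.
import torch
import torch.nn as nn

N_SYMBOLS, DELAY, HIDDEN = 8, 5, 64  # assumed task/model sizes for illustration

class DelayedRecallRNN(nn.Module):
    """Sees one cue symbol, holds it across a blank delay, then reports it."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_SYMBOLS + 1, 16)   # +1 for the blank token
        self.rnn = nn.GRU(16, HIDDEN, batch_first=True)
        self.readout = nn.Linear(HIDDEN, N_SYMBOLS)

    def forward(self, seq):
        h, _ = self.rnn(self.embed(seq))
        return self.readout(h[:, -1])                  # decode at the final time step

def make_batch(batch_size=128):
    cue = torch.randint(0, N_SYMBOLS, (batch_size, 1))
    blanks = torch.full((batch_size, DELAY), N_SYMBOLS, dtype=torch.long)  # delay period
    return torch.cat([cue, blanks], dim=1), cue.squeeze(1)

model = DelayedRecallRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    seq, target = make_batch()
    loss = nn.functional.cross_entropy(model(seq), target)
    opt.zero_grad(); loss.backward(); opt.step()
# After training, the GRU's hidden states across the delay can be analyzed (e.g., with PCA)
# to ask how the cue is maintained -- the kind of mechanistic question listed above.
```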
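
For the second line of work, one common operationalization of “emergence” is a sharp, phase-transition-like jump in task accuracy as model scale grows. The snippet below is a hedged toy illustration of that idea only: it fits a logistic curve to synthetic accuracy-vs-scale numbers (all data points are made up) to estimate a critical scale and a sharpness parameter; it is not a result or a method claimed in the work above.

```python
# Illustrative sketch with synthetic data: fit a logistic "order parameter" curve
# to accuracy vs. model scale to characterize an emergence-like transition.
import numpy as np
from scipy.optimize import curve_fit

log_n = np.array([7, 8, 9, 10, 11, 12], dtype=float)       # log10(parameter count), synthetic
accuracy = np.array([0.02, 0.03, 0.05, 0.35, 0.80, 0.92])  # synthetic benchmark accuracy

def logistic(x, x_c, k, top):
    """Transition centered at x_c with sharpness k and asymptote `top`."""
    return top / (1.0 + np.exp(-k * (x - x_c)))

params, _ = curve_fit(logistic, log_n, accuracy, p0=[10.0, 2.0, 1.0])
x_c, k, top = params
print(f"estimated critical scale ~10^{x_c:.1f} parameters, sharpness k = {k:.1f}")
```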