Interesting NeuroAI/CompNeuro/LLM Cognition/Embodied AI/miscellaneous papers

1.29.2024

Neural tuning and representational geometry, Nature Reviews Neuroscience, 2021, Nikolaus Kriegeskorte & Xue-Xin Wei

1.30.2024

Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings, Nature Machine Intelligence, 2023, Jascha Achterberg et al.

2.1.2024

Transformer as a hippocampal memory consolidation model based on NMDAR-inspired nonlinearity, NeurIPS, 2023

2.5.2024

Brains and algorithms partially converge in natural language processing, Communications Biology, 2022

2.7.2024

No Coincidence, George: Capacity-Limits as the Curse of Compositionality, PsyArXiv, 2022

2.12.2024

Structural constraints on the emergence of oscillations in multi-population neural networks, eLife, 2024

Oscillatory neural networks, YouTube

2.14

Dynamics of Sparsely Connected Networks of Excitatory and Inhibitory Spiking Neurons

2.16

Using large language models to study human memory for meaningful narratives

Mechanisms of Gamma Oscillations

2.17

A call for embodied AI

2.18

Circular and unified analysis in network neuroscience

2.20-2.27

I was at AAAI 2024 for nearly a week. I learned a lot and will share some papers I came across in talks and posters at the conference.

On the Paradox of Learning to Reason from Data

CRAB: Assessing the Strength of Causal Relationships Between Real-World Events

Passive learning of active causal strategies in agents and language models

SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning

Hallucination is Inevitable: An Innate Limitation of Large Language Models

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

3.1

Three aspects of representation in neuroscience

Redefining "Hallucination" in LLMs: Towards a psychology-informed framework for mitigating misinformation

Distributed representations of words and phrases and their compositionality

3.2

Neural Turing Machines

A Critical Review of Causal Reasoning Benchmarks for Large Language Models

3.3

Recurrent Models of Visual Attention

Massive Activations in Large Language Models

Multiple Object Recognition with Visual Attention

Attention is not all you need anymore

The Annotated Transformer

Attention and Memory in Deep Learning

3.7

Large language models surpass human experts in predicting neuroscience results

3.8

Encoding and decoding in fMRI

My favorite math jokes

3.9

Memory in humans and deep language models: Linking hypotheses for model augmentation

3.11

Are Emergent Abilities of Large Language Models a Mirage?

Mathematical introduction to deep learning

3.12

Memory and attention in deep learning

World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges

Mastering Memory Tasks with World Models

Mechanism for feature learning in neural networks and backpropagation-free machine learning models

3.13

Brain-inspired intelligent robotics: The intersection of robotics and neuroscience

Papers mentioned in this article

3.14

One model for the learning of language

3.15

The pitfalls of next-token prediction

3.16

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

Using large language models to study human memory for meaningful narratives

3.18

Neuroscience needs behavior

3.23

Traveling waves shape neural population dynamics enabling predictions and internal model updating

Task interference as a neuronal basis for the cost of cognitive flexibility

A Technical Critique of Some Parts of the Free Energy Principle

3.24

Theories of Error Back-Propagation in the Brain

Neurosymbolic AI

3.26

Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings

Traveling waves shape neural population dynamics enabling predictions and internal model updating

3.27

Reconstructing computational system dynamics from neural data with recurrent neural networks

3.29

A useful guide to pronouncing common math symbols

3.30

A Review of Neuroscience-Inspired Machine Learning

3.31

Collective intelligence: A unifying concept for integrating biology across scales and substrates

4.3

An Introduction to Model-Based Cognitive Neuroscience

What does it mean to understand a neural network?

What is a GPT by 3Blue1Brown

4.5

Nonmonotonic Plasticity: How Memory Retrieval Drives Learning

Single Cortical Neurons as Deep Artificial Neural Networks

4.17

The brain's unique take on algorithms

Cognition is an emergent property

4.18

Catalyzing next-generation Artificial Intelligence through NeuroAI

4.19

Toward a formal theory for computing machines made out of whatever physics offers

Natural and Artificial Intelligence: A brief introduction to the interplay between AI and neuroscience research

4.22

Time, Love, Memory

Thinking About Science

Reasoning ability is (little more than) working-memory capacity?!

What Is Life - Wikipedia

How do Large Language Models Handle Multilingualism?

4.24

Empowering Working Memory for Large Language Model Agents

4.26

Context-dependent computation by recurrent dynamics in prefrontal cortex

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

4.29

Concurrent maintenance of both veridical and transformed working memory representations within unique coding schemes

5.1

A formal model of capacity limits in working memory

The Thermodynamics of Mind, Trends in Cognitive Sciences

5.7

Bridging Neuroscience and Robotics: Spiking Neural Networks in Action

Combined Sensing, Cognition, Learning, and Control for Developing Future Neuro-Robotics Systems: A Survey

AI, Robotics & Neuroengineering at Ken Kennedy Institute

Special Issue : Applications of Neural Networks in Robot Control

Embodied AI Workshop

5.8

Efficiently Modeling Long Sequences with Structured State Spaces

A new look at state-space models for neural data, Journal of Computational Neuroscience

Latent state-space models for neural decoding

State Space Modeling of Neural Spike Train and Behavioral Data

Switching state-space modeling of neural signal dynamics

Robotics and artificial intelligence

Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT

5.13

Is it a transition or a continuation? From PhD student to Postdoc. - ECR Community

Ten Simple Rules for Selecting a Postdoctoral Position

Transitioning fields between a Ph.D. and postdoc

5.14

The Computational Lens: from Quantum Physics to Neuroscience

Integration of cognitive tasks into artificial general intelligence test for large models, iScience

Active Predictive Coding: A Unified Neural Framework for Learning Hierarchical World Models for Perception and Planning

From grid cells to place cells: A mathematical model

If deep learning is the answer, what is the question?

5.21

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

5.29

Testing theory of mind in large language models and humans

Neuromorphic dreaming: A pathway to efficient learning in artificial agents

6.2

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

6.3

Biocomputing with organoid intelligence

Catalyzing next-generation Artificial Intelligence through NeuroAI (Well, this one has been listed above, but never mind)

Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

6.5

Empirical influence functions to understand the logic of fine-tuning

6.12

Are Emergent Abilities of Large Language Models a Mirage?

6.13

A virtual rodent predicts the structure of neural activity across behaviors

Empirical influence functions to understand the logic of fine-tuning

Activation Sparsity: An Insight into the Interpretability of Trained Transformers

6.14

Inferences on a multidimensional social hierarchy use a grid-like code

Grid-like and distance codes for representing word meaning in the human brain

Relating transformers to models and neural representations of the hippocampal formation

Scaling Laws for Neural Language Models

Emergent Abilities of Large Language Models

Organizing conceptual knowledge in humans with a gridlike code

The coming decade of digital brain research: A vision for neuroscience at the intersection of technology and computing

6.18

Thousand Brains Project

The Thousand Brains Theory: A roadmap for creating machine intelligence (千脑智能理论:开启创造机器智能的路线图)

6.24

Oxford ML School

Oxford LLMs

Large Language Models for Mathematicians

6.25

Language is primarily a tool for communication rather than thought

Representation learning for neural population activity with Neural Data Transformers

Towards a Foundation Model of the Mouse Visual Cortex

Statistical mechanics of Bayesian inference and learning in neural networks

Jascha Achterberg - NeuroAI

6.26

A Bayesian account of learning and generalising representations in the brain - ORA - Oxford University Research Archive

Detecting hallucinations in large language models using semantic entropy

Fine-tuning can cripple your foundation model; preserving features may be the solution

7.12

Working Memory Load Modulates Neuronal Coupling

In vivo ephaptic coupling allows memory network formation

7.16

Cognitive computational neuroscience

Heavy-tailed neuronal connectivity arises from Hebbian self-organization

Instruction-tuning Aligns LLMs to the Human Brain

The debate over understanding in AI’s large language models

7.18

Shared functional specialization in transformer-based language models and the human brain

On Layer Normalization in the Transformer Architecture

7.19

The expanding horizons of network neuroscience: From description to prediction and control

Modular Brain Networks

7.31

Organic electrochemical neurons and synapses with ion mediated spiking

8.2

Stephen Wolfram: A New Kind of Science

8.3

Do Language Models Have a Critical Period for Language Acquisition?

8.5

Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs

8.7

From Analog to Digital Computing: Is Homo sapiens’ Brain on Its Way to Become a Turing Machine?

8.13

The brain and its time: intrinsic neural timescales are key for input processing

8.28

Neural circuits as computational dynamical systems

9.9

Unsupervised neural network models of the ventral visual stream

Emotional Intelligence of Large Language Models

No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit

CEBRA: Learnable latent embeddings for joint behavioral and neural analysis

DevBench: A multimodal developmental benchmark for language learning

Running cognitive evaluations on large language models: The do's and the don'ts

Induction heads - illustrated — LessWrong

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Abstract representations emerge in human hippocampal neurons during inference

Reconciling Shared versus Context-Specific Information in a Neural Network Model of Latent Causes

Lecture Notes on Infinite-Width Limits of Neural Networks

Scaling and renormalization in high-dimensional regression

Curriculum Learning with Infant Egocentric Videos

In-context Learning and Induction Heads

Natural and Artificial Intelligence: A brief introduction to the interplay between AI and neuroscience research

Reasoning ability is (little more than) working-memory capacity?!

A formal model of capacity limits in working memory

Prefrontal cortex as a meta-reinforcement learning system

Scaffolding cooperation in human groups with deep reinforcement learning

Sequential Memory with Temporal Predictive Coding

Cognitive Modeling of Semantic Fluency Using Transformers

Predictive Coding: a Theoretical and Experimental Review

Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes

Toward the Emergence of Intelligent Control: Episodic Generalization and Optimization

Machine Learning Notation

Accelerating generative models and nonconvex optimisation

Representation and computation in visual working memory

Nonlinear difference equations

Dynamical Systems Approaches to Cognition

Attention Mechanisms and Their Applications to Complex Systems

Learning differential equations

Context-dependent computation by recurrent dynamics in prefrontal cortex

Evidence of a predictive coding hierarchy in the human brain listening to speech

Using higher-order Markov models to reveal flow-based communities in networks

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

The neuron as a direct data-driven controller

Seminar course: Bridging Language in Machines and Language in the Brain

Manifolds: A Gentle Introduction

Dimension Reduction using Isomap

A simple weight decay can…

Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering

Empirical influence functions to understand the logic of fine-tuning

9.14

The Impact of Positional Encoding on Length Generalization in Transformers

9.15

How Attention works in Deep Learning: understanding the attention mechanism in sequence models

Explainable AI: Visualizing Attention in Transformers - Comet

Toy Models of Superposition

A Mathematical Framework for Transformer Circuits

Transformers: a Primer

Code repo: Word-level Language Modeling using RNN and Transformer

Code repo: Transformer as a hippocampal memory consolidation model based on NMDAR-inspired nonlinearity

Code repo: The Tolman-Eichenbaum Machine

Adaptive chunking improves effective working memory capacity in a prefrontal cortex and basal ganglia circuit

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Analogous computations in working memory input, output and motor gating: Electrophysiological and computational modeling evidence

Exposing Attention Glitches with Flip-Flop Language Modeling

Opening the Black Box: Low-Dimensional Dynamics in High-Dimensional Recurrent Neural Networks

Context-dependent computation by recurrent dynamics in prefrontal cortex

A survey of transformers

Gradient-based learning drives robust representations in recurrent neural networks by balancing compression and expansion

Population codes enable learning from few examples by shaping inductive bias

What is In-context Learning, and how does it work: The Beginner’s Guide

The Curse of Dimensionality

A generative model of memory construction and consolidation

The Neurobiology of Semantic Memory

The precision of visual working memory is set by allocation of a shared resource

The capacity of visual working memory for features and conjunctions

Timescales of learning in prefrontal cortex

The Distributed Nature of Working Memory

Geometry of neural computation unifies working memory and planning

Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks

Wide Attention Is The Way Forward For Transformers?

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Natural constraints explain working memory capacity limitations in sensory-cognitive models

Scaling Laws for Neural Language Models

The Depth-to-Width Interplay in Self-Attention

A mathematical perspective on Transformers

Towards Smaller, Faster Decoder-Only Transformers: Architectural Variants and Their Implications

Upper and lower memory capacity bounds of transformers for next-token prediction

How Powerful are Decoder-Only Transformer Neural Models?

Mastering Decoder-Only Transformer: A Comprehensive Guide

How should the architecture of a transformer be scaled? : r/MachineLearning

Code repo: Transformer_walkthrough

PsychRNN: An Accessible and Flexible Python Package for Training Recurrent Neural Network Models on Cognitive Tasks

Self-backpropagation of synaptic modifications elevates the efficiency of spiking and artificial neural networks

9.17

STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making

Automated construction of cognitive maps with visual predictive coding

Schrodinger's Memory: Large Language Models

From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures

Neuroscience + Artificial Intelligence = NeuroAI

Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth

9.18

InversionView: A General-Purpose Method for Reading Information from Neural Activations

Divergent recruitment of developmentally defined neuronal ensembles supports memory dynamics

Theoretical Limitations of Self-Attention in Neural Sequence Models

TransformerFAM: Feedback attention is working memory

A resource-rational model of human processing of recursive linguistic structure

9.19

Empirical Capacity Model for Self-Attention Neural Networks

Self-attention Does Not Need O(n²) Memory

9.20

Imitating and exploring the human brain's resting and task-performing states via brain computing: scaling and architecture

Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems

Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network

9.21

Human-like systematic generalization through a meta-learning neural network

9.25

The dimensions of dimensionality

RNNs Implicitly Implement Tensor Product Representations

Tensor product variable binding and the representation of symbolic structures in connectionist systems

9.26

A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations, Neuron

A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks

Mechanistic Interpretability for AI Safety – A Review

9.28

Flexible control of sequence working memory in the macaque frontal cortex

Mental programming of spatial sequences in working memory in the macaque frontal cortex, Science, 2024

Geometry of sequence working memory in macaque prefrontal cortex, Science, 2022

Nonlinear classification of neural manifolds with contextual information

9.29

A theory of consciousness from a theoretical computer science perspective: Insights from the Conscious Turing Machine

Emergent Symbols through Binding in External Memory

The Tensor Brain: Semantic Decoding for Perception and Memory

Network attractors and nonlinear dynamics of neural computation

An attractor network in the hippocampus: Theory and neurophysiology

Learning Attractor Dynamics for Generative Memory

What is remembered? Role of attention on the encoding and retrieval of hippocampal representations

Attractor dynamics with activity-dependent plasticity capture human working memory across time scales

Attractor networks

Acetylcholine-mediated top-down attention improves the response to bottom-up inputs by deformation of the attractor landscape

The Consciousness Prior

Bayesian surprise attracts human attention

Attention is not Explanation

Attention is not not Explanation

From Human Attention to Computational Attention: A Multidisciplinary Approach

Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

Accurate Path Integration in Continuous Attractor Network Models of Grid Cells

A map of spatial navigation for neuroscience

Viewpoints: how the hippocampus contributes to memory, navigation and cognition

Generalisation of structural knowledge in the hippocampal-entorhinal system

Can We Reconcile the Declarative Memory and Spatial Navigation Views on Hippocampal Function?, Neuron

Using Fast Weights to Attend to the Recent Past

Hopfield Networks is All You Need

R-Transformer: Recurrent Neural Network Enhanced Transformer

Abstract representations emerge in human hippocampal neurons during inference

10.2

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

10.6

A rubric for human-like agents and NeuroAI

Memory Networks: Towards Fully Biologically Plausible Learning

Textbook: Introduction to Machine Learning

10.7

Language in Brains, Minds, and Machines

10.8

Statistical Mechanics of Deep Learning

It’s about time: Linking dynamical systems with human neuroimaging to understand the brain

RNNs implicitly implement tensor-product representations

Tensor product decomposition network

Basic Reasoning with Tensor Product Representations

10.9

Were RNNs All We Needed?

Human-level control through deep reinforcement learning

Geometric constraints on human brain function

The brain wave equation: a model for the EEG

Traveling Waves Encode the Recent Past and Enhance Sequence Learning

10.10

Probability theory notes

Tensor Decomposition via Variational Auto-Encoder

10.11

Neural knowledge assembly in humans and neural networks, Neuron

Reproducibility in Computational Neuroscience Models and Simulations

Distributed Representations of Words and Phrases and their Compositionality

GloVe: Global Vectors for Word Representation

Learning by thinking in natural and artificial minds, Trends in Cognitive Sciences

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Intelligence at the Edge of Chaos

10.13

Why the simplest explanation isn’t always the best

10.14

Meta Movie Gen

Interpretable Recurrent Neural Networks in Continuous-time Control Environments

From Liquid Neural Networks to Liquid Foundation Models

Humans actively reconfigure neural task states

OptPDE: Discovering Novel Integrable Systems via AI-Human Collaboration

Loss of plasticity in deep continual learning

10.15

The relational bottleneck as an inductive bias for efficient abstraction, Trends in Cognitive Sciences

Turning large language models into cognitive models

Communication-Efficient Algorithms for Statistical Optimization

Compositionality Decomposed: How do Neural Networks Generalise?

Visualisation and ‘Diagnostic Classifiers’ Reveal How Recurrent and Recursive Neural Networks Process Hierarchical Structure

NeuroAI critique: What have we learned about artificial intelligence from studying the brain?

Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects

Nature Portfolio collection: Nobel Prize in Physics 2024

Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling

Everything You Wanted To Know About Mathematics

10.16

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

Toward a realistic model of speech processing in the brain with self-supervised learning

Neuronal sequences in population bursts encode information in human cortex

10.17

When Does Perceptual Alignment Benefit Vision Representations?

10.18

Diffusion Models are Evolutionary Algorithms

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Towards a Definition of Disentangled Representations

Donders is dead: cortical traveling waves and the limits of mental chronometry in cognitive neuroscience

10.19

Physics-informed machine learning

10.20

Not-So-CLEVR: learning same–different relations strains feedforward neural networks

Hierarchical Working Memory and a New Magic Number

Measuring abstract reasoning in neural networks

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Interpretable statistical representations of neural population dynamics and geometry

An Explicitly Relational Neural Network Architecture

Neural mechanisms of binding in the hippocampus and neocortex: Insights from computational models

Tensor product variable binding and the representation of symbolic structures in connectionist systems

Is Activity Silent Working Memory Simply Episodic Memory?, Trends in Cognitive Sciences

Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks

A Neurodynamical Model of Visual Attention: Feedback Enhancement of Spatial Resolution in a Hierarchical System

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

10.21

Possible principles for aligned structure learning agents

“I Am the One and Only, Your Cyber BFF”: Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI

Probabilistic Principles for Biophysics and Neuroscience: Entropy Production, Bayesian Mechanics & the Free-Energy Principle

10.22

Some properties of an associative memory model using the Boltzmann machine learning

Synaptic Mechanisms and Network Dynamics Underlying Spatial Working Memory in a Cortical Network Model

Statistical Mechanics, Neural Networks, and Artificial Intelligence: Using Powerful Brain Strategies to Improve AI

Continuous attractors for dynamic memories

Artificial Neural Networks for Neuroscientists: A Primer

Deep Boltzmann Machines

Optimal Perceptual Inference

Notes on Quadratic Forms

On the Measure of Intelligence

Position: Maximizing Neural Regression Scores May Not Identify Good Models of the Brain

Artificial and Natural Intelligence: From Invention to Discovery

A Phenomenological AI Foundation Model for Physical Signals

10.23

Vector Calculus Notes

Generating realistic neurophysiological time series with denoising diffusion probabilistic models

What Matters in Transformers? Not All Attention is Needed

Can the brain do backpropagation?—exact implementation of backpropagation in predictive coding networks

Looking Inward: Language Models Can Learn About Themselves by Introspection

Modern Quantum Mechanics

Adaptation in Natural and Artificial Systems

Building machines that learn and think with people

Network model with internal complexity bridges artificial intelligence and neuroscience

10.25

Observing Schrödinger’s Cat with Artificial Intelligence: Emergent Classicality from Information Bottleneck

Deep learning and renormalization group

Agents Thinking Fast and Slow: A Talker-Reasoner Architecture

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Robust agents learn causal world models

10.26

Trainees’ perspectives and recommendations for catalyzing the next generation of NeuroAI researchers

The new NeuroAI

Observing Schrödinger’s Cat with Artificial Intelligence: Emergent Classicality from Information Bottleneck

Information decomposition and the informational architecture of the brain

The Matrix Calculus You Need For Deep Learning

explained.ai

10.27

Averaging is a convenient fiction of neuroscience

Future views on neuroscience and AI

LeanAgent: Lifelong Learning for Formal Theorem Proving

10.28

Interpretable Recurrent Neural Networks in Continuous-time Control Environments

The physics of representation

10.29

A prescriptive theory for brain-like inference

10.30

Toward a realistic model of speech processing in the brain with self-supervised learning

Supervised Fine-tuning: customizing LLMs

Continuous Attractor Neural Networks: Candidate of a Canonical Model for Neural Information Representation

Cross-Entropy Is All You Need To Invert the Data Generating Process

Toward Generalizing Visual Brain Decoding to Unseen Subjects

10.31

Towards Understanding Grokking: An Effective Theory of Representation Learning

Theoretical Limitations of Self-Attention in Neural Sequence Models

Representational Strengths and Limitations of Transformers

Understanding Transformer Reasoning Capabilities via Graph Algorithms

Transformers, parallel computation, and logarithmic depth

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning

Statistical mechanics of complex neural systems and high dimensional data

Centaur: a foundation model of human cognition

11.1

The Geometry of Concepts: Sparse Autoencoder Feature Structure

11.2

A neural machine code and programming framework for the reservoir computer

11.3

Internal world models in humans, animals, and AI

Large Language Models as Markov Chains

The Ghost in the Quantum Turing Machine

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

11.4

Infinite Powers: How Calculus Reveals the Secrets of the Universe

Foundation model of neural activity predicts response to new stimulus types and anatomy

The little book of deep learning

11.5

So You Want to Be a Physicist: A 22 Part Guide

In search of problems: Mu-Ming Poo

11.6

Letter on: A Natural AI Based on The Science of Computational Physics, Biology and Neuroscience: Policy and Societal Significance

On Neural Differential Equations

Neural Ordinary Differential Equations

11.7

Imagining and building wise machines: The centrality of AI metacognition

Understanding cognitive processes across spatial scales of the brain

11.8

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Learning to Reason with Third-Order Tensor Products

Offline ensemble co-reactivation links memories across days

Physical computing: a category theoretic perspective on physical computation and system compositionality

Book Of Proof

A cellular basis for mapping behavioural structure

Computational role of structure in neural activity and connectivity

Visual attention methods in deep learning: An in-depth survey

11.9

Beyond networks, towards adaptive systems

Artificial Intelligence, Scientific Discovery, and Product Innovation

Certified Deductive Reasoning with Language Models

Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification

11.10

Why Is Anything Conscious?

Is Complexity an Illusion?

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

11.11

The Self-Assembling Brain: How Neural Networks Grow Smarter

11.13

Spatial embedding promotes a specific form of modularity with low entropy and heterogeneous spectral dynamics

The Geometry of Concepts: Sparse Autoencoder Feature Structure

Single cortical neurons as deep artificial neural networks

Analysis methods for large-scale neuronal recordings

11.14

On principles of emergent organization

Evidence for Quasicritical Brain Dynamics

Do language models need sensory grounding for meaning and understanding? – NYU Center for Mind, Brain, and Consciousness

11.16

“Ilya’s Machine Learning Reading List”

Tracking the topology of neural manifolds across populations

A sequence bottleneck for animal intelligence and language?

Mental search of concepts is supported by egocentric vector representations and restructured grid maps

Handbook of Mathematics

Neural waves and computation in a neural net model II: Data-like structures and the dynamics of episodic memory

Probabilistic Machine Learning

11.17

Testing theory of mind in large language models and humans

11.18

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Shared Representational Geometry Across Neural Networks

Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents

Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval

Rethinking Softmax: Self-Attention with Polynomial Activations

11.19

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?

Reasoning = working memory ≠ attention

Why you don’t overfit, and don’t need Bayes if you only train for one epoch

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models