Interesting NeuroAI/CompNeuro/LLM Cognition/Embodied AI/miscellaneous papers
Published:
1.29.2024
Neural tuning and representational geometry, Nature Reviews Neuroscience, 2021 Nikolaus Kriegeskorte & Xue-Xin Wei
1.30.2024
Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings, Nature Machine Intelligence, 2023 Jascha Achterberg et al.
2.1.2024
Transformer as a hippocampal memory consolidation model based on NMDAR-inspired nonlinearity, NeurIPS, 2023
2.5.2024
Brains and algorithms partially converge in natural language processing, Communications Biology, 2022
2.7.2024
No Coincidence, George: Capacity-Limits as the Curse of Compositionality, PsyArXiv, 2022
2.12.2024
Structural constraints on the emergence of oscillations in multi-population neural networks, eLife, 2024
Oscillatory neural networks, YouTube
2.14
Dynamics of Sparsely Connected Networks of Excitatory and Inhibitory Spiking Neurons
2.16
Using large language models to study human memory for meaningful narratives
Mechanisms of Gamma Oscillations
2.17
2.18
Circular and unified analysis in network neuroscience
2.20-2.27
I was at AAAI 2024 for nearly a week. I learned a lot and will share some of the papers I came across in talks and posters at the conference.
On the Paradox of Learning to Reason from Data
CRAB: Assessing the Strength of Causal Relationships Between Real-World Events
Passive learning of active causal strategies in agents and language models
SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
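For reference, the core objective from the DPO paper above, transcribed in standard notation (here $\pi_\theta$ is the policy being tuned, $\pi_{\mathrm{ref}}$ the frozen reference model, $(x, y_w, y_l)$ a prompt with preferred and dispreferred completions, and $\beta$ a temperature-like coefficient):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

The appeal is that preference optimization reduces to a classification-style loss on log-probability ratios, with no separate reward model or RL loop.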
3.1
Three aspects of representation in neuroscience
Distributed representations of words and phrases and their compositionality
3.2
A Critical Review of Causal Reasoning Benchmarks for Large Language Models
3.3
Recurrent Models of Visual Attention
Massive Activations in Large Language Models
Multiple Object Recognition with Visual Attention
Attention is not all you need anymore
Attention and Memory in Deep Learning
3.7
Large language models surpass human experts in predicting neuroscience results
3.8
3.9
Memory in humans and deep language models: Linking hypotheses for model augmentation
3.11
Are Emergent Abilities of Large Language Models a Mirage?
Mathematical introduction to deep learning
3.12
Memory and attention in deep learning
Mastering Memory Tasks with World Models
Mechanism for feature learning in neural networks and backpropagation-free machine learning models
3.13
Brain-inspired intelligent robotics: The intersection of robotics and neuroscience
Papers mentioned in this article
3.14
One model for the learning of language
3.15
The pitfalls of next-token prediction
3.16
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Using large language models to study human memory for meaningful narratives
3.18
3.23
Traveling waves shape neural population dynamics enabling predictions and internal model updating
Task interference as a neuronal basis for the cost of cognitive flexibility
A Technical Critique of Some Parts of the Free Energy Principle
3.24
Theories of Error Back-Propagation in the Brain
3.26
Traveling waves shape neural population dynamics enabling predictions and internal model updating
3.27
Reconstructing computational system dynamics from neural data with recurrent neural networks
3.29
A useful guide to pronouncing common math symbols
3.30
A Review of Neuroscience-Inspired Machine Learning
3.31
Collective intelligence: A unifying concept for integrating biology across scales and substrates
4.3
An Introduction to Model-Based Cognitive Neuroscience
What does it mean to understand a neural network?
4.5
Nonmonotonic Plasticity: How Memory Retrieval Drives Learning
Single Cortical Neurons as Deep Artificial Neural Networks
4.17
The brain's unique take on algorithms
Cognition is an emergent property
4.18
Catalyzing next-generation Artificial Intelligence through NeuroAI
4.19
Toward a formal theory for computing machines made out of whatever physics offers
4.22
Reasoning ability is (little more than) working-memory capacity?!
How do Large Language Models Handle Multilingualism?
4.24
Empowering Working Memory for Large Language Model Agents
4.26
Context-dependent computation by recurrent dynamics in prefrontal cortex
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
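The RNN Encoder-Decoder paper above is where the gated recurrent unit was introduced; as a reminder, its hidden-state update (using the paper's convention, with $\sigma$ the logistic sigmoid and $\odot$ elementwise multiplication; later presentations sometimes swap the roles of $z_t$ and $1 - z_t$):

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\big(W x_t + U (r_t \odot h_{t-1})\big) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$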
4.29
5.1
A formal model of capacity limits in working memory
The Thermodynamics of Mind, Trends in Cognitive Sciences
5.7
Bridging Neuroscience and Robotics: Spiking Neural Networks in Action
AI, Robotics & Neuroengineering at Ken Kennedy Institute
Special Issue : Applications of Neural Networks in Robot Control
5.8
Efficiently Modeling Long Sequences with Structured State Spaces
A new look at state-space models for neural data, Journal of Computational Neuroscience
Latent state-space models for neural decoding
State Space Modeling of Neural Spike Train and Behavioral Data
Switching state-space modeling of neural signal dynamics
Robotics and artificial intelligence
5.13
Is it a transition or a continuation? From PhD student to Postdoc. - ECR Community
Ten Simple Rules for Selecting a Postdoctoral Position
Transitioning fields between a Ph.D. and postdoc
5.14
The Computational Lens: from Quantum Physics to Neuroscience
Integration of cognitive tasks into artificial general intelligence test for large models, iScience
From grid cells to place cells: A mathematical model
If deep learning is the answer, what is the question?
5.21
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
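The "Lazy Neuron" paper above measures how few MLP units are active per token in trained Transformers. A minimal numpy sketch of that kind of measurement (fraction of post-ReLU activations that are exactly zero), on a hypothetical random layer rather than the paper's actual models:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def activation_sparsity(tokens, W_in, b_in):
    """Fraction of MLP hidden units that are inactive (zero after ReLU),
    averaged over a batch of token representations."""
    hidden = relu(tokens @ W_in + b_in)              # (n_tokens, d_hidden)
    return float((hidden == 0.0).mean())

# Toy example: an untrained ReLU layer sits near ~50% sparsity;
# the paper's point is that trained Transformers end up far sparser.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(128, 64))                  # 128 token vectors, d_model = 64
W_in = rng.normal(size=(64, 256)) / np.sqrt(64)      # d_model -> d_hidden projection
b_in = np.zeros(256)
print(f"sparsity: {activation_sparsity(tokens, W_in, b_in):.2f}")
```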
5.29
Testing theory of mind in large language models and humans
Neuromorphic dreaming: A pathway to efficient learning in artificial agents
6.2
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
6.3
Biocomputing with organoid intelligence
Catalyzing next-generation Artificial Intelligence through NeuroAI (already listed above on 4.18)
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
6.5
Empirical influence functions to understand the logic of fine-tuning
6.12
Are Emergent Abilities of Large Language Models a Mirage?
6.13
A virtual rodent predicts the structure of neural activity across behaviors
Empirical influence functions to understand the logic of fine-tuning
Activation Sparsity: An Insight into the Interpretability of Trained Transformers
6.14
Inferences on a multidimensional social hierarchy use a grid-like code
Grid-like and distance codes for representing word meaning in the human brain
Relating transformers to models and neural representations of the hippocampal formation
Scaling Laws for Neural Language Models
Emergent Abilities of Large Language Models
Organizing conceptual knowledge in humans with a gridlike code
6.18
6.24
Large Language Models for Mathematicians
6.25
Language is primarily a tool for communication rather than thought
Representation learning for neural population activity with Neural Data Transformers
Towards a Foundation Model of the Mouse Visual Cortex
Statistical mechanics of Bayesian inference and learning in neural networks
6.26
Detecting hallucinations in large language models using semantic entropy
Fine-tuning can cripple your foundation model; preserving features may be the solution
7.12
Working Memory Load Modulates Neuronal Coupling
In vivo ephaptic coupling allows memory network formation
7.16
Cognitive computational neuroscience
Heavy-tailed neuronal connectivity arises from Hebbian self-organization
Instruction-tuning Aligns LLMs to the Human Brain
The debate over understanding in AI’s large language models
7.18
Shared functional specialization in transformer-based language models and the human brain
On Layer Normalization in the Transformer Architecture
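The layer-normalization paper above is about where LayerNorm sits relative to the residual connection (Post-LN as in the original Transformer vs. Pre-LN, which the paper shows trains more stably without warmup). A minimal numpy sketch of the two orderings, with a generic `sublayer` standing in for attention or the MLP; this illustrates the structure only, not the paper's code:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_ln_block(x, sublayer):
    # Post-LN (original Transformer): normalize after adding the residual.
    return layer_norm(x + sublayer(x))

def pre_ln_block(x, sublayer):
    # Pre-LN: normalize the sublayer input; the residual path stays an identity.
    return x + sublayer(layer_norm(x))

# Toy sublayer and input, just to show both blocks run and preserve shape.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16)) / 4.0
sublayer = lambda h: np.tanh(h @ W)
x = rng.normal(size=(8, 16))
print(post_ln_block(x, sublayer).shape, pre_ln_block(x, sublayer).shape)   # (8, 16) (8, 16)
```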
7.19
7.31
Organic electrochemical neurons and synapses with ion mediated spiking
8.2
Stephen Wolfram: A New Kind of Science
8.3
Do Language Models Have a Critical Period for Language Acquisition?
8.5
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs
8.7
From Analog to Digital Computing: Is Homo sapiens’ Brain on Its Way to Become a Turing Machine?
8.13
The brain and its time: intrinsic neural timescales are key for input processing
8.28
Neural circuits as computational dynamical systems
9.9
Unsupervised neural network models of the ventral visual stream
Emotional Intelligence of Large Language Models
CEBRA: Learnable latent embeddings for joint behavioral and neural analysis
DevBench: A multimodal developmental benchmark for language learning
Running cognitive evaluations on large language models: The do's and the don'ts
Induction heads - illustrated — LessWrong
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Abstract representations emerge in human hippocampal neurons during inference
Reconciling Shared versus Context-Specific Information in a Neural Network Model of Latent Causes
Lecture Notes on Infinite-Width Limits of Neural Networks
Scaling and renormalization in high-dimensional regression
Curriculum Learning with Infant Egocentric Videos
In-context Learning and Induction Heads
Reasoning ability is (little more than) working-memory capacity?!
A formal model of capacity limits in working memory
Prefrontal cortex as a meta-reinforcement learning system
Scaffolding cooperation in human groups with deep reinforcement learning
Sequential Memory with Temporal Predictive Coding
Cognitive Modeling of Semantic Fluency Using Transformers
Predictive Coding: a Theoretical and Experimental Review
Toward the Emergence of Intelligent Control: Episodic Generalization and Optimization
Accelerating generative models and nonconvex optimisation
Representation and computation in visual working memory
Nonlinear difference equations
Dynamical Systems Approaches to Cognition
Attention Mechanisms and Their Applications to Complex Systems
Learning differential equations
Context-dependent computation by recurrent dynamics in prefrontal cortex
Evidence of a predictive coding hierarchy in the human brain listening to speech
Using higher-order Markov models to reveal flow-based communities in networks
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
The neuron as a direct data-driven controller
Seminar course: Bridging Language in Machines and Language in the Brain
Manifolds: A Gentle Introduction
Dimension Reduction using Isomap
Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering
Empirical influence functions to understand the logic of fine-tuning
9.14
The Impact of Positional Encoding on Length Generalization in Transformers
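For the positional-encoding entry above: the baseline most such studies compare against is the original sinusoidal absolute encoding (the paper itself compares absolute, relative, and no positional encoding). A minimal numpy sketch of the sinusoidal version:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                     # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)       # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_positional_encoding(seq_len=6, d_model=8).round(2))
```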
9.15
How Attention works in Deep Learning: understanding the attention mechanism in sequence models
Explainable AI: Visualizing Attention in Transformers - Comet
A Mathematical Framework for Transformer Circuits
Code repo: Word-level Language Modeling using RNN and Transformer
Code repo: The Tolman-Eichenbaum Machine
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Exposing Attention Glitches with Flip-Flop Language Modeling
Opening the Black Box: Low-Dimensional Dynamics in High-Dimensional Recurrent Neural Networks
Context-dependent computation by recurrent dynamics in prefrontal cortex
Population codes enable learning from few examples by shaping inductive bias
What is In-context Learning, and how does it work: The Beginner’s Guide
A generative model of memory construction and consolidation
The Neurobiology of Semantic Memory
The precision of visual working memory is set by allocation of a shared resource
The capacity of visual working memory for features and conjunctions
Timescales of learning in prefrontal cortex
The Distributed Nature of Working Memory
Geometry of neural computation unifies working memory and planning
Wide Attention Is The Way Forward For Transformers?
The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
Natural constraints explain working memory capacity limitations in sensory-cognitive models
Scaling Laws for Neural Language Models
The Depth-to-Width Interplay in Self-Attention
A mathematical perspective on Transformers
Towards Smaller, Faster Decoder-Only Transformers: Architectural Variants and Their Implications
Upper and lower memory capacity bounds of transformers for next-token prediction
How Powerful are Decoder-Only Transformer Neural Models?
Mastering Decoder-Only Transformer: A Comprehensive Guide
How should the architecture of a transformer be scaled? - r/MachineLearning
Code repo: Transformer_walkthrough
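Since most of the 9.15 entries walk through Transformer internals, here is a minimal single-head numpy sketch of the operation they all revolve around, scaled dot-product attention $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$ (no masking, no multi-head projections):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # (n_q, n_k) similarity logits
    weights = softmax(scores, axis=-1)          # each query's weights sum to 1
    return weights @ V                          # (n_q, d_v) attention-weighted values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 16)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 16)
```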
9.17
STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making
Automated construction of cognitive maps with visual predictive coding
Schrodinger's Memory: Large Language Models
From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures
Neuroscience + Artificial Intelligence = NeuroAI
Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
9.18
InversionView: A General-Purpose Method for Reading Information from Neural Activations
Divergent recruitment of developmentally defined neuronal ensembles supports memory dynamics
Theoretical Limitations of Self-Attention in Neural Sequence Models
TransformerFAM: Feedback attention is working memory
A resource-rational model of human processing of recursive linguistic structure
9.19
Empirical Capacity Model for Self-Attention Neural Networks
Self-attention Does Not Need $O(n^2)$ Memory
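The memory paper above observes that the attention softmax can be accumulated over key/value chunks, so the full $n \times n$ score matrix is never materialized. A hedged numpy sketch of that chunking idea, written from the general streaming-softmax trick rather than as a transcription of the paper's algorithm (which also covers query chunking and gradient checkpointing):

```python
import numpy as np

def chunked_attention(Q, K, V, chunk=32):
    """softmax(QK^T / sqrt(d)) V, accumulated over key/value chunks so only
    O(n_q * chunk) scores exist at any time (a running max keeps it stable)."""
    d_k = Q.shape[-1]
    num = np.zeros((Q.shape[0], V.shape[-1]))   # running weighted-value sum
    den = np.zeros((Q.shape[0], 1))             # running softmax denominator
    m = np.full((Q.shape[0], 1), -np.inf)       # running max score per query
    for start in range(0, K.shape[0], chunk):
        Kc, Vc = K[start:start + chunk], V[start:start + chunk]
        s = Q @ Kc.T / np.sqrt(d_k)             # scores against this chunk only
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        rescale = np.exp(m - m_new)             # re-base the old accumulators
        p = np.exp(s - m_new)
        num = num * rescale + p @ Vc
        den = den * rescale + p.sum(axis=-1, keepdims=True)
        m = m_new
    return num / den

def full_attention(Q, K, V):
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 128, 32))         # unpack three (128, 32) arrays
print(np.allclose(chunked_attention(Q, K, V), full_attention(Q, K, V)))   # True
```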
9.20
Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems
Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network
9.21
Human-like systematic generalization through a meta-learning neural network
9.25
The Dimensions of dimensionality
RNNs Implicitly Implement Tensor Product Representations
9.26
A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks
Mechanistic Interpretability for AI Safety – A Review
9.28
Flexible control of sequence working memory in the macaque frontal cortex
Mental programming of spatial sequences in working memory in the macaque frontal cortex, Science, 2024
Geometry of sequence working memory in macaque prefrontal cortex, Science, 2022
Nonlinear classification of neural manifolds with contextual information
9.29
Emergent Symbols through Binding in External Memory
The Tensor Brain: Semantic Decoding for Perception and Memory
Network attractors and nonlinear dynamics of neural computation
An attractor network in the hippocampus: Theory and neurophysiology
Learning Attractor Dynamics for Generative Memory
Bayesian surprise attracts human attention
Attention is not Explanation
Attention is not not Explanation
From Human Attention to Computational Attention: A Multidisciplinary Approach
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Accurate Path Integration in Continuous Attractor Network Models of Grid Cells
A map of spatial navigation for neuroscience
Viewpoints: how the hippocampus contributes to memory, navigation and cognition
Generalisation of structural knowledge in the hippocampal-entorhinal system
Using Fast Weights to Attend to the Recent Past
Hopfield Networks is All You Need
R-Transformer: Recurrent Neural Network Enhanced Transformer
Abstract representations emerge in human hippocampal neurons during inference
10.2
PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture
10.6
A rubric for human-like agents and NeuroAI
Memory Networks: Towards Fully Biologically Plausible Learning
Textbook: Introduction to Machine Learning
10.7
Language in Brains, Minds, and Machines
10.8
Statistical Mechanics of Deep Learning
It’s about time: Linking dynamical systems with human neuroimaging to understand the brain
RNNs implicitly implement tensor-product representations
Tensor product decomposition network
Basic Reasoning with Tensor Product Representations
10.9
Human-level control through deep reinforcement learning
Geometric constraints on human brain function
The brain wave equation: a model for the EEG
Traveling Waves Encode the Recent Past and Enhance Sequence Learning
10.10
Tensor Decomposition via Variational Auto-Encoder
10.11
Neural knowledge assembly in humans and neural networks, Neuron
Reproducibility in Computational Neuroscience Models and Simulations
Distributed Representations of Words and Phrases and their Compositionality
GloVe: Global Vectors for Word Representation
Learning by thinking in natural and artificial minds, Trends in Cognitive Sciences
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Intelligence at the Edge of Chaos
10.13
Why the simplest explanation isn’t always the best
10.14
Interpretable Recurrent Neural Networks in Continuous-time Control Environments
From Liquid Neural Networks to Liquid Foundation Models
Humans actively reconfigure neural task states
OptPDE: Discovering Novel Integrable Systems via AI-Human Collaboration
Loss of plasticity in deep continual learning
10.15
Turning large language models into cognitive models
Communication-Efficient Algorithms for Statistical Optimization
Compositionality Decomposed: How do Neural Networks Generalise?
NeuroAI critique: What have we learned about artificial intelligence from studying the brain?
Nature Portfolio collection: Nobel Prize in Physics 2024
Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling
Everything You Wanted To Know About Mathematics
10.16
Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models
Toward a realistic model of speech processing in the brain with self-supervised learning
Neuronal sequences in population bursts encode information in human cortex
10.17
When Does Perceptual Alignment Benefit Vision Representations?
10.18
Diffusion Models are Evolutionary Algorithms
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Towards a Definition of Disentangled Representations
10.19
Physics-informed machine learning
10.20
Not-So-CLEVR: learning same–different relations strains feedforward neural networks
Hierarchical Working Memory and a New Magic Number
Measuring abstract reasoning in neural networks
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Interpretable statistical representations of neural population dynamics and geometry
An Explicitly Relational Neural Network Architecture
Neural mechanisms of binding in the hippocampus and neocortex: Insights from computational models
Is Activity Silent Working Memory Simply Episodic Memory?, Trends in Cognitive Sciences
10.21
Possible principles for aligned structure learning agents
10.22
Some properties of an associative memory model using the Boltzmann machine learning
Continuous attractors for dynamic memories
Artificial Neural Networks for Neuroscientists: A Primer
On the Measure of Intelligence
Position: Maximizing Neural Regression Scores May Not Identify Good Models of the Brain
Artificial and Natural Intelligence: From Invention to Discovery
A Phenomenological AI Foundation Model for Physical Signals
10.23
Generating realistic neurophysiological time series with denoising diffusion probabilistic models
What Matters in Transformers? Not All Attention is Needed
Looking Inward: Language Models Can Learn About Themselves by Introspection
Adaptation in Natural and Artificial Systems
Building machines that learn and think with people
Network model with internal complexity bridges artificial intelligence and neuroscience
10.25
Deep learning and renormalization group
Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Robust agents learn causal world models
10.26
Trainees’ perspectives and recommendations for catalyzing the next generation of NeuroAI researchers
Information decomposition and the informational architecture of the brain
The Matrix Calculus You Need For Deep Learning
10.27
Averaging is a convenient fiction of neuroscience
Future views on neuroscience and AI
LeanAgent: Lifelong Learning for Formal Theorem Proving
10.28
Interpretable Recurrent Neural Networks in Continuous-time Control Environments
10.29
A prescriptive theory for brain-like inference
10.30
Toward a realistic model of speech processing in the brain with self-supervised learning
Supervised Fine-tuning: customizing LLMs
Cross-Entropy Is All You Need To Invert the Data Generating Process
Toward Generalizing Visual Brain Decoding to Unseen Subjects
10.31
Towards Understanding Grokking: An Effective Theory of Representation Learning
Theoretical Limitations of Self-Attention in Neural Sequence Models
Representational Strengths and Limitations of Transformers
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Transformers, parallel computation, and logarithmic depth
Statistical mechanics of complex neural systems and high dimensional data
Centaur: a foundation model of human cognition
11.1
The Geometry of Concepts: Sparse Autoencoder Feature Structure
11.2
A neural machine code and programming framework for the reservoir computer
11.3
Internal world models in humans, animals, and AI
Large Language Models as Markov Chains
The Ghost in the Quantum Turing Machine
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
11.4
Infinite Powers: How Calculus Reveals the Secrets of the Universe
Foundation model of neural activity predicts response to new stimulus types and anatomy
The little book of deep learning
11.5
So You Want to Be a Physicist: A 22 Part Guide
In search of problems: Mu-Ming Poo
11.6
On Neural Differential Equations
Neural Ordinary Differential Equations
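For the neural-ODE entries above: the core idea is to replace a stack of residual layers with a continuous-time flow $\mathrm{d}h/\mathrm{d}t = f_\theta(h, t)$ and read out $h(t_1)$. A minimal numpy sketch using a toy vector field and fixed-step Euler integration purely for illustration; the actual work uses adaptive solvers and the adjoint method for gradients:

```python
import numpy as np

def f_theta(h, t, W):
    """A toy parameterized vector field f_theta(h, t)."""
    return np.tanh(h @ W) * (1.0 + 0.1 * t)

def odeint_euler(h0, t0, t1, W, n_steps=100):
    """Fixed-step Euler integration of dh/dt = f_theta(h, t) from t0 to t1."""
    h, t = h0.copy(), t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        h = h + dt * f_theta(h, t, W)
        t = t + dt
    return h

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) / np.sqrt(8)
h0 = rng.normal(size=(4, 8))                 # a batch of 4 hidden states
h1 = odeint_euler(h0, t0=0.0, t1=1.0, W=W)
print(h1.shape)                              # (4, 8): "depth" becomes integration time
```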
11.7
Imagining and building wise machines: The centrality of AI metacognition
Understanding cognitive processes across spatial scales of the brain
11.8
Human-level play in the game of Diplomacy by combining language models with strategic reasoning
Learning to Reason with Third-Order Tensor Products
Offline ensemble co-reactivation links memories across days
A cellular basis for mapping behavioural structure
Computational role of structure in neural activity and connectivity
Visual attention methods in deep learning: An in-depth survey
11.9
Beyond networks, towards adaptive systems
Artificial Intelligence, Scientific Discovery, and Product Innovation
Certified Deductive Reasoning with Language Models
Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits
PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture
11.10
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
11.11
The Self-Assembling Brain: How Neural Networks Grow Smarter
11.13
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Single cortical neurons as deep artificial neural networks
Analysis methods for large-scale neuronal recordings
11.14
On principles of emergent organization
Evidence for Quasicritical Brain Dynamics
11.16
“Ilya’s Machine Learning Reading List”
Tracking the topology of neural manifolds across populations
A sequence bottleneck for animal intelligence and language?
Probabilistic Machine Learning
11.17
Testing theory of mind in large language models and humans
11.18
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Shared Representational Geometry Across Neural Networks
Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents
Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval
Rethinking Softmax: Self-Attention with Polynomial Activations
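The Hopfield-Fenchel-Young entry above (and "Hopfield Networks is All You Need" from 9.29) builds on the modern continuous Hopfield update, in which a query retrieves stored patterns through a softmax over similarities; the Fenchel-Young framework generalizes that softmax to other (e.g., sparse) transformations. A minimal numpy sketch of the softmax retrieval step only:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hopfield_retrieve(query, patterns, beta=8.0, n_steps=3):
    """Iterate the modern Hopfield update  xi <- X^T softmax(beta * X xi),
    where the rows of `patterns` (X) are the stored memories."""
    xi = query.copy()
    for _ in range(n_steps):
        xi = patterns.T @ softmax(beta * patterns @ xi)
    return xi

rng = np.random.default_rng(0)
patterns = rng.normal(size=(10, 32))
patterns /= np.linalg.norm(patterns, axis=1, keepdims=True)   # 10 unit-norm memories
noisy = patterns[3] + 0.1 * rng.normal(size=32)               # corrupted cue for memory 3
retrieved = hopfield_retrieve(noisy, patterns)
print(int(np.argmax(patterns @ retrieved)))                   # expected: 3
```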
11.19
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
Reasoning = working memory ≠ attention
Why you don’t overfit, and don’t need Bayes if you only train for one epoch
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models