Interesting NeuroAI/CompNeuro/LLM Cognition/Embodied AI/miscellaneous papers

1.29.2024

Neural tuning and representational geometry, Nature Reviews Neuroscience, 2021, Nikolaus Kriegeskorte & Xue-Xin Wei

1.30.2024

Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings, Nature Machine Intelligence, 2023, Jascha Achterberg et al.

2.1.2024

Transformer as a hippocampal memory consolidation model based on NMDAR-inspired nonlinearity, NeurIPS, 2023

2.5.2024

Brains and algorithms partially converge in natural language processing, Communications Biology, 2022

2.7.2024

No Coincidence, George: Capacity-Limits as the Curse of Compositionality, PsyArXiv, 2022

2.12.2024

Structural constraints on the emergence of oscillations in multi-population neural networks, eLife, 2024

Oscillatory neural networks, YouTube

2.14

Dynamics of Sparsely Connected Networks of Excitatory and Inhibitory Spiking Neurons

2.16

Using large language models to study human memory for meaningful narratives

Mechanisms of Gamma Oscillations

2.17

A call for embodied AI

2.18

Circular and unified analysis in network neuroscience

2.20-2.27

I was at AAAI 2024 for nearly a week. I learned a lot and will share some papers I came across in talks and posters at the conference.

On the Paradox of Learning to Reason from Data

CRAB: Assessing the Strength of Causal Relationships Between Real-World Events

Passive learning of active causal strategies in agents and language models

SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning

Hallucination is Inevitable: An Innate Limitation of Large Language Models

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

3.1

Three aspects of representation in neuroscience

Redefining "Hallucination" in LLMs: Towards a psychology-informed framework for mitigating misinformation

Distributed representations of words and phrases and their compositionality

3.2

Neural Turing Machines

A Critical Review of Causal Reasoning Benchmarks for Large Language Models

3.3

Recurrent Models of Visual Attention

Massive Activations in Large Language Models

Multiple Object Recognition with Visual Attention

Attention is not all you need anymore

The Annotated Transformer

Attention and Memory in Deep Learning

3.7

Large language models surpass human experts in predicting neuroscience results

3.8

Encoding and decoding in fMRI

My favorite math jokes

3.9

Memory in humans and deep language models: Linking hypotheses for model augmentation

3.11

Are Emergent Abilities of Large Language Models a Mirage?

Mathematical introduction to deep learning

3.12

Memory and attention in deep learning

World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges

Mastering Memory Tasks with World Models

Mechanism for feature learning in neural networks and backpropagation-free machine learning models

3.13

Brain-inspired intelligent robotics: The intersection of robotics and neuroscience

Papers mentioned in this article

3.14

One model for the learning of language

3.15

The pitfalls of next-token prediction

3.16

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

Using large language models to study human memory for meaningful narratives

3.18

Neuroscience needs behavior

3.23

Traveling waves shape neural population dynamics enabling predictions and internal model updating

Task interference as a neuronal basis for the cost of cognitive flexibility

A Technical Critique of Some Parts of the Free Energy Principle

3.24

Theories of Error Back-Propagation in the Brain

Neurosymbolic AI

3.26

Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings

Traveling waves shape neural population dynamics enabling predictions and internal model updating

3.27

Reconstructing computational system dynamics from neural data with recurrent neural networks

3.29

A useful guide to pronouncing common math symbols

3.30

A Review of Neuroscience-Inspired Machine Learning

3.31

Collective intelligence: A unifying concept for integrating biology across scales and substrates

4.3

An Introduction to Model-Based Cognitive Neuroscience

What does it mean to understand a neural network?

What is a GPT by 3Blue1Brown

4.5

Nonmonotonic Plasticity: How Memory Retrieval Drives Learning

Single Cortical Neurons as Deep Artificial Neural Networks

4.17

The brain's unique take on algorithms

Cognition is an emergent property

4.18

Catalyzing next-generation Artificial Intelligence through NeuroAI

4.19

Toward a formal theory for computing machines made out of whatever physics offers

Natural and Artificial Intelligence: A brief introduction to the interplay between AI and neuroscience research

4.22

Time, Love, Memory

Thinking About Science

Reasoning ability is (little more than) working-memory capacity?!

What Is Life - Wikipedia

How do Large Language Models Handle Multilingualism?

4.24

Empowering Working Memory for Large Language Model Agents

4.26

Context-dependent computation by recurrent dynamics in prefrontal cortex

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

4.29

Concurrent maintenance of both veridical and transformed working memory representations within unique coding schemes

5.1

A formal model of capacity limits in working memory

The Thermodynamics of Mind, Trends in Cognitive Sciences

5.7

Bridging Neuroscience and Robotics: Spiking Neural Networks in Action

Combined Sensing, Cognition, Learning, and Control for Developing Future Neuro-Robotics Systems: A Survey

AI, Robotics & Neuroengineering at Ken Kennedy Institute

Special Issue : Applications of Neural Networks in Robot Control

Embodied AI Workshop

5.8

Efficiently Modeling Long Sequences with Structured State Spaces

A new look at state-space models for neural data, Journal of Computational Neuroscience

Latent state-space models for neural decoding

State Space Modeling of Neural Spike Train and Behavioral Data

Switching state-space modeling of neural signal dynamics

Robotics and artificial intelligence

Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT

5.13

Is it a transition or a continuation? From PhD student to Postdoc. - ECR Community

Ten Simple Rules for Selecting a Postdoctoral Position

Transitioning fields between a Ph.D. and postdoc

5.14

The Computational Lens: from Quantum Physics to Neuroscience

Integration of cognitive tasks into artificial general intelligence test for large models, iScience

Active Predictive Coding: A Unified Neural Framework for Learning Hierarchical World Models for Perception and Planning

From grid cells to place cells: A mathematical model

If deep learning is the answer, what is the question?

5.21

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

5.29

Testing theory of mind in large language models and humans

Neuromorphic dreaming: A pathway to efficient learning in artificial agents

6.2

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

6.3

Biocomputing with organoid intelligence

Catalyzing next-generation Artificial Intelligence through NeuroAI (Well, this one has been listed above, but never mind)

Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

6.5

Empirical influence functions to understand the logic of fine-tuning

6.12

Are Emergent Abilities of Large Language Models a Mirage?

6.13

A virtual rodent predicts the structure of neural activity across behaviors

Empirical influence functions to understand the logic of fine-tuning

Activation Sparsity: An Insight into the Interpretability of Trained Transformers

6.14

Inferences on a multidimensional social hierarchy use a grid-like code

Grid-like and distance codes for representing word meaning in the human brain

Relating transformers to models and neural representations of the hippocampal formation

Scaling Laws for Neural Language Models

Emergent Abilities of Large Language Models

Organizing conceptual knowledge in humans with a gridlike code

The coming decade of digital brain research: A vision for neuroscience at the intersection of technology and computing

6.18

Thousand Brains Project

The Thousand Brains Theory: A roadmap for creating machine intelligence (千脑智能理论:开启创造机器智能的路线图)

6.24

Oxford ML School

Oxford LLMs

Large Language Models for Mathematicians

6.25

Language is primarily a tool for communication rather than thought

Representation learning for neural population activity with Neural Data Transformers

Towards a Foundation Model of the Mouse Visual Cortex

Statistical mechanics of Bayesian inference and learning in neural networks

Jascha Achterberg - NeuroAI

6.26

A Bayesian account of learning and generalising representations in the brain - ORA - Oxford University Research Archive

Detecting hallucinations in large language models using semantic entropy

Fine-tuning can cripple your foundation model; preserving features may be the solution

7.12

Working Memory Load Modulates Neuronal Coupling

In vivo ephaptic coupling allows memory network formation

7.16

Cognitive computational neuroscience

Heavy-tailed neuronal connectivity arises from Hebbian self-organization

Instruction-tuning Aligns LLMs to the Human Brain

The debate over understanding in AI’s large language models

7.18

Shared functional specialization in transformer-based language models and the human brain

On Layer Normalization in the Transformer Architecture

7.19

The expanding horizons of network neuroscience: From description to prediction and control

Modular Brain Networks

7.31

Organic electrochemical neurons and synapses with ion mediated spiking

8.2

Stephen Wolfram: A New Kind of Science

8.3

Do Language Models Have a Critical Period for Language Acquisition?

8.5

Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs

8.7

From Analog to Digital Computing: Is Homo sapiens’ Brain on Its Way to Become a Turing Machine?

8.13

The brain and its time: intrinsic neural timescales are key for input processing

8.28

Neural circuits as computational dynamical systems

9.9

Unsupervised neural network models of the ventral visual stream

Emotional Intelligence of Large Language Models

No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit

CEBRA: Learnable latent embeddings for joint behavioral and neural analysis

DevBench: A multimodal developmental benchmark for language learning

Running cognitive evaluations on large language models: The do's and the don'ts

Induction heads - illustrated — LessWrong

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Abstract representations emerge in human hippocampal neurons during inference

Reconciling Shared versus Context-Specific Information in a Neural Network Model of Latent Causes

Lecture Notes on Infinite-Width Limits of Neural Networks

Scaling and renormalization in high-dimensional regression

Curriculum Learning with Infant Egocentric Videos

In-context Learning and Induction Heads

Natural and Artificial Intelligence: A brief introduction to the interplay between AI and neuroscience research

Reasoning ability is (little more than) working-memory capacity?!

A formal model of capacity limits in working memory

Prefrontal cortex as a meta-reinforcement learning system

Scaffolding cooperation in human groups with deep reinforcement learning

Sequential Memory with Temporal Predictive Coding

Cognitive Modeling of Semantic Fluency Using Transformers

Predictive Coding: a Theoretical and Experimental Review

Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes

Toward the Emergence of Intelligent Control: Episodic Generalization and Optimization

Machine Learning Notation

Accelerating generative models and nonconvex optimisation

Representation and computation in visual working memory

Nonlinear difference equations

Dynamical Systems Approaches to Cognition

Attention Mechanisms and Their Applications to Complex Systems

Learning differential equations

Context-dependent computation by recurrent dynamics in prefrontal cortex

Evidence of a predictive coding hierarchy in the human brain listening to speech

Using higher-order Markov models to reveal flow-based communities in networks

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

The neuron as a direct data-driven controller

Seminar course: Bridging Language in Machines and Language in the Brain

Manifolds: A Gentle Introduction

Dimension Reduction using Isomap

A simple weight decay can…

Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering

Empirical influence functions to understand the logic of fine-tuning

9.14

The Impact of Positional Encoding on Length Generalization in Transformers

9.15

How Attention works in Deep Learning: understanding the attention mechanism in sequence models

Explainable AI: Visualizing Attention in Transformers - Comet

Toy Models of Superposition

A Mathematical Framework for Transformer Circuits

Transformers: a Primer

Code repo: Word-level Language Modeling using RNN and Transformer

Code repo: Transformer as a hippocampal memory consolidation model based on NMDAR-inspired nonlinearity

Code repo: The Tolman-Eichenbaum Machine

Adaptive chunking improves effective working memory capacity in a prefrontal cortex and basal ganglia circuit

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Analogous computations in working memory input, output and motor gating: Electrophysiological and computational modeling evidence

Exposing Attention Glitches with Flip-Flop Language Modeling

Opening the Black Box: Low-Dimensional Dynamics in High-Dimensional Recurrent Neural Networks

Context-dependent computation by recurrent dynamics in prefrontal cortex

A survey of transformers

Gradient-based learning drives robust representations in recurrent neural networks by balancing compression and expansion

Population codes enable learning from few examples by shaping inductive bias

What is In-context Learning, and how does it work: The Beginner’s Guide

The Curse of Dimensionality

A generative model of memory construction and consolidation

The Neurobiology of Semantic Memory

The precision of visual working memory is set by allocation of a shared resource

The capacity of visual working memory for features and conjunctions

Timescales of learning in prefrontal cortex

The Distributed Nature of Working Memory

Geometry of neural computation unifies working memory and planning

Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks

Wide Attention Is The Way Forward For Transformers?

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Natural constraints explain working memory capacity limitations in sensory-cognitive models

Scaling Laws for Neural Language Models

The Depth-to-Width Interplay in Self-Attention

A mathematical perspective on Transformers

Towards Smaller, Faster Decoder-Only Transformers: Architectural Variants and Their Implications

Upper and lower memory capacity bounds of transformers for next-token prediction

How Powerful are Decoder-Only Transformer Neural Models?

Mastering Decoder-Only Transformer: A Comprehensive Guide

How should the architecture of a transformer be scaled? : r/MachineLearning

Code repo: Transformer_walkthrough

PsychRNN: An Accessible and Flexible Python Package for Training Recurrent Neural Network Models on Cognitive Tasks

Self-backpropagation of synaptic modifications elevates the efficiency of spiking and artificial neural networks

9.17

STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making

Automated construction of cognitive maps with visual predictive coding

Schrodinger's Memory: Large Language Models

From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures

Neuroscience + Artificial Intelligence = NeuroAI

Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth

9.18

InversionView: A General-Purpose Method for Reading Information from Neural Activations

Divergent recruitment of developmentally defined neuronal ensembles supports memory dynamics

Theoretical Limitations of Self-Attention in Neural Sequence Models

TransformerFAM: Feedback attention is working memory

A resource-rational model of human processing of recursive linguistic structure

9.19

Empirical Capacity Model for Self-Attention Neural Networks

Self-attention Does Not Need O(n²) Memory

9.20

Imitating and exploring the human brain's resting and task-performing states via brain computing: scaling and architecture

Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems

Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network

9.21

Human-like systematic generalization through a meta-learning neural network

9.25

The dimensions of dimensionality

RNNs Implicitly Implement Tensor Product Representations

Tensor product variable binding and the representation of symbolic structures in connectionist systems

9.26

A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations, Neuron

A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks

Mechanistic Interpretability for AI Safety – A Review

9.28

Flexible control of sequence working memory in the macaque frontal cortex

Mental programming of spatial sequences in working memory in the macaque frontal cortex, Science, 2024

Geometry of sequence working memory in macaque prefrontal cortex, Science, 2022

Nonlinear classification of neural manifolds with contextual information

9.29

A theory of consciousness from a theoretical computer science perspective: Insights from the Conscious Turing Machine

Emergent Symbols through Binding in External Memory

The Tensor Brain: Semantic Decoding for Perception and Memory

Network attractors and nonlinear dynamics of neural computation

An attractor network in the hippocampus: Theory and neurophysiology

Learning Attractor Dynamics for Generative Memory

What is remembered? Role of attention on the encoding and retrieval of hippocampal representations

Attractor dynamics with activity-dependent plasticity capture human working memory across time scales

Attractor networks

Acetylcholine-mediated top-down attention improves the response to bottom-up inputs by deformation of the attractor landscape

The Consciousness Prior

Bayesian surprise attracts human attention

Attention is not Explanation

Attention is not not Explanation

From Human Attention to Computational Attention: A Multidisciplinary Approach

Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

Accurate Path Integration in Continuous Attractor Network Models of Grid Cells

A map of spatial navigation for neuroscience

Viewpoints: how the hippocampus contributes to memory, navigation and cognition

Generalisation of structural knowledge in the hippocampal-entorhinal system

Can We Reconcile the Declarative Memory and Spatial Navigation Views on Hippocampal Function?, Neuron

Using Fast Weights to Attend to the Recent Past

Hopfield Networks is All You Need

R-Transformer: Recurrent Neural Network Enhanced Transformer

Abstract representations emerge in human hippocampal neurons during inference

10.2

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

10.6

A rubric for human-like agents and NeuroAI

Memory Networks: Towards Fully Biologically Plausible Learning

Textbook: Introduction to Machine Learning

10.7

Language in Brains, Minds, and Machines

10.8

Statistical Mechanics of Deep Learning

It’s about time: Linking dynamical systems with human neuroimaging to understand the brain

RNNs implicitly implement tensor-product representations

Tensor product decomposition network

Basic Reasoning with Tensor Product Representations

10.9

Were RNNs All We Needed?

Human-level control through deep reinforcement learning

Geometric constraints on human brain function

The brain wave equation: a model for the EEG

Traveling Waves Encode the Recent Past and Enhance Sequence Learning

10.10

Probability theory notes

Tensor Decomposition via Variational Auto-Encoder

10.11

Neural knowledge assembly in humans and neural networks, Neuron

Reproducibility in Computational Neuroscience Models and Simulations

Distributed Representations of Words and Phrases and their Compositionality

GloVe: Global Vectors for Word Representation

Learning by thinking in natural and artificial minds, Trends in Cognitive Sciences

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Intelligence at the Edge of Chaos

10.13

Why the simplest explanation isn’t always the best

10.14

Meta Movie Gen

Interpretable Recurrent Neural Networks in Continuous-time Control Environments

From Liquid Neural Networks to Liquid Foundation Models

Humans actively reconfigure neural task states

OptPDE: Discovering Novel Integrable Systems via AI-Human Collaboration

Loss of plasticity in deep continual learning

10.15

The relational bottleneck as an inductive bias for efficient abstraction, Trends in Cognitive Sciences

Turning large language models into cognitive models

Communication-Efficient Algorithms for Statistical Optimization

Compositionality Decomposed: How do Neural Networks Generalise?

Visualisation and ‘Diagnostic Classifiers’ Reveal How Recurrent and Recursive Neural Networks Process Hierarchical Structure

NeuroAI critique: What have we learned about artificial intelligence from studying the brain?

Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects

Nature Portfolio collection: Nobel Prize in Physics 2024

Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling

Everything You Wanted To Know About Mathematics

10.16

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

Toward a realistic model of speech processing in the brain with self-supervised learning

Neuronal sequences in population bursts encode information in human cortex

10.17

When Does Perceptual Alignment Benefit Vision Representations?

10.18

Diffusion Models are Evolutionary Algorithms

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Towards a Definition of Disentangled Representations

Donders is dead: cortical traveling waves and the limits of mental chronometry in cognitive neuroscience

10.19

Physics-informed machine learning

10.20

Not-So-CLEVR: learning same–different relations strains feedforward neural networks

Hierarchical Working Memory and a New Magic Number

Measuring abstract reasoning in neural networks

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Interpretable statistical representations of neural population dynamics and geometry

An Explicitly Relational Neural Network Architecture

Neural mechanisms of binding in the hippocampus and neocortex: Insights from computational models

Tensor product variable binding and the representation of symbolic structures in connectionist systems

Is Activity Silent Working Memory Simply Episodic Memory?, Trends in Cognitive Sciences

Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks

A Neurodynamical Model of Visual Attention: Feedback Enhancement of Spatial Resolution in a Hierarchical System

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

10.21

Possible principles for aligned structure learning agents

“I Am the One and Only, Your Cyber BFF”: Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI

Probabilistic Principles for Biophysics and Neuroscience: Entropy Production, Bayesian Mechanics & the Free-Energy Principle

10.22

Some properties of an associative memory model using the Boltzmann machine learning

Synaptic Mechanisms and Network Dynamics Underlying Spatial Working Memory in a Cortical Network Model

Statistical Mechanics, Neural Networks, and Artificial Intelligence: Using Powerful Brain Strategies to Improve AI

Continuous attractors for dynamic memories

Artificial Neural Networks for Neuroscientists: A Primer

Deep Boltzmann Machines

Optimal Perceptual Inference

Notes on Quadratic Forms

On the Measure of Intelligence

Position: Maximizing Neural Regression Scores May Not Identify Good Models of the Brain

Artificial and Natural Intelligence: From Invention to Discovery

A Phenomenological AI Foundation Model for Physical Signals

10.23

Vector Calculus Notes

Generating realistic neurophysiological time series with denoising diffusion probabilistic models

What Matters in Transformers? Not All Attention is Needed

Can the brain do backpropagation?—exact implementation of backpropagation in predictive coding networks

Looking Inward: Language Models Can Learn About Themselves by Introspection

Modern Quantum Mechanics

Adaptation in Natural and Artificial Systems

Building machines that learn and think with people

Network model with internal complexity bridges artificial intelligence and neuroscience

10.25

Observing Schrödinger’s Cat with Artificial Intelligence: Emergent Classicality from Information Bottleneck

Deep learning and renormalization group

Agents Thinking Fast and Slow: A Talker-Reasoner Architecture

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Robust agents learn causal world models

10.26

Trainees’ perspectives and recommendations for catalyzing the next generation of NeuroAI researchers

The new NeuroAI

Observing Schrödinger’s Cat with Artificial Intelligence: Emergent Classicality from Information Bottleneck

Information decomposition and the informational architecture of the brain

The Matrix Calculus You Need For Deep Learning

explained.ai

10.27

Averaging is a convenient fiction of neuroscience

Future views on neuroscience and AI

LeanAgent: Lifelong Learning for Formal Theorem Proving

10.28

Interpretable Recurrent Neural Networks in Continuous-time Control Environments

The physics of representation

10.29

A prescriptive theory for brain-like inference

10.30

Toward a realistic model of speech processing in the brain with self-supervised learning

Supervised Fine-tuning: customizing LLMs

Continuous Attractor Neural Networks: Candidate of a Canonical Model for Neural Information Representation

Cross-Entropy Is All You Need To Invert the Data Generating Process

Toward Generalizing Visual Brain Decoding to Unseen Subjects

10.31

Towards Understanding Grokking: An Effective Theory of Representation Learning

Theoretical Limitations of Self-Attention in Neural Sequence Models

Representational Strengths and Limitations of Transformers

Understanding Transformer Reasoning Capabilities via Graph Algorithms

Transformers, parallel computation, and logarithmic depth

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning

Statistical mechanics of complex neural systems and high dimensional data

Centaur: a foundation model of human cognition

11.1

The Geometry of Concepts: Sparse Autoencoder Feature Structure

11.2

A neural machine code and programming framework for the reservoir computer

11.3

Internal world models in humans, animals, and AI

Large Language Models as Markov Chains

The Ghost in the Quantum Turing Machine

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

11.4

Infinite Powers: How Calculus Reveals the Secrets of the Universe

Foundation model of neural activity predicts response to new stimulus types and anatomy

The little book of deep learning

11.5

So You Want to Be a Physicist: A 22 Part Guide

In search of problems: Mu-Ming Poo

11.6

Letter on: A Natural AI Based on The Science of Computational Physics, Biology and Neuroscience: Policy and Societal Significance

On Neural Differential Equations

Neural Ordinary Differential Equations

11.7

Imagining and building wise machines: The centrality of AI metacognition

Understanding cognitive processes across spatial scales of the brain

11.8

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Learning to Reason with Third-Order Tensor Products

Offline ensemble co-reactivation links memories across days

Physical computing: a category theoretic perspective on physical computation and system compositionality

Book Of Proof

A cellular basis for mapping behavioural structure

Computational role of structure in neural activity and connectivity

Visual attention methods in deep learning: An in-depth survey

11.9

Beyond networks, towards adaptive systems

Artificial Intelligence, Scientific Discovery, and Product Innovation

Certified Deductive Reasoning with Language Models

Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification

11.10

Why Is Anything Conscious?

Is Complexity an Illusion?

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

11.11

The Self-Assembling Brain: How Neural Networks Grow Smarter

11.13

Spatial embedding promotes a specific form of modularity with low entropy and heterogeneous spectral dynamics

The Geometry of Concepts: Sparse Autoencoder Feature Structure

Single cortical neurons as deep artificial neural networks

Analysis methods for large-scale neuronal recordings

11.14

On principles of emergent organization

Evidence for Quasicritical Brain Dynamics

Do language models need sensory grounding for meaning and understanding? – NYU Center for Mind, Brain, and Consciousness

11.16

“Ilya’s Machine Learning Reading List”

Tracking the topology of neural manifolds across populations

A sequence bottleneck for animal intelligence and language?

Mental search of concepts is supported by egocentric vector representations and restructured grid maps

Handbook of Mathematics

Neural waves and computation in a neural net model II: Data-like structures and the dynamics of episodic memory

Probabilistic Machine Learning

11.17

Testing theory of mind in large language models and humans

11.18

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Shared Representational Geometry Across Neural Networks

Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents

Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval

Rethinking Softmax: Self-Attention with Polynomial Activations

11.19

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?

Reasoning = working memory ≠ attention

Why you don’t overfit, and don’t need Bayes if you only train for one epoch

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models