Publications

Research contributions to geometric deep learning and mechanistic interpretability

Under Review · ICLR 2026 @ GRaM Workshop

RT-TopKSAE: Improving Top-k Sparse Autoencoders with the Rotation Trick

Authors: Sulayman Yusuf, A. Balwani

Venue: ICLR 2026 @ Geometric Representations and Mechanisms (GRaM) Workshop

February 2026

Abstract

We present RT-TopKSAE, a novel approach to improving sparse autoencoders by incorporating the rotation trick from geometric deep learning. Our method addresses the challenge of preserving principal components in high-dimensional latent spaces while maintaining sparsity constraints. By applying rotation-equivariant transformations during training, we achieve a 40% improvement in principal component retention compared to standard Top-k sparse autoencoders, while maintaining comparable sparsity levels. Our approach uses custom PyTorch autograd functions to preserve gradients through the rotation operations, enabling end-to-end training. We demonstrate the effectiveness of RT-TopKSAE on several benchmark tasks in mechanistic interpretability, showing improved feature disentanglement and interpretability of learned representations.
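The combination of Top-k sparsity with a rotation-trick gradient path described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the module name `RTTopKSAE`, the layer wiring, and the choice to apply the rotation-trick pass-through between the dense pre-activations and the Top-k-masked codes are all assumptions. The helper uses the standard rotation taking unit vector a onto unit vector b, R = I - (a+b)(a+b)ᵀ/(1 + a·b) + 2baᵀ, with the rotation and rescaling factors detached so the forward value equals the sparse code while gradients are rotated rather than copied straight through.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotate_to(e: torch.Tensor, q: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Rotation-trick pass-through: the forward value equals q, but backward
    treats q as a (detached) rotation-and-rescaling of e, so gradients flow
    to e rotated toward its direction instead of copied straight through."""
    a = F.normalize(e, dim=-1).detach()  # unit direction of e (constant in backward)
    b = F.normalize(q, dim=-1).detach()  # unit direction of q (constant in backward)
    lam = (q.norm(dim=-1, keepdim=True)
           / (e.norm(dim=-1, keepdim=True) + eps)).detach()
    # Apply R = I - (a+b)(a+b)^T / (1 + a.b) + 2 b a^T to e directly,
    # without materializing the rotation matrix.
    s = a + b
    dot = (a * b).sum(-1, keepdim=True)
    Re = (e
          - s * (s * e).sum(-1, keepdim=True) / (1.0 + dot + eps)
          + 2.0 * b * (a * e).sum(-1, keepdim=True))
    return lam * Re


class RTTopKSAE(nn.Module):
    """Top-k sparse autoencoder with a rotation-trick gradient path
    (hypothetical sketch; dimensions and wiring are assumptions)."""

    def __init__(self, d_in: int, d_hidden: int, k: int):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_in, d_hidden)
        self.dec = nn.Linear(d_hidden, d_in, bias=False)

    def forward(self, x: torch.Tensor):
        pre = self.enc(x)
        # Keep only the k largest pre-activations per sample; zero the rest.
        vals, idx = pre.topk(self.k, dim=-1)
        sparse = torch.zeros_like(pre).scatter(-1, idx, vals)
        # Forward: codes == sparse. Backward: gradients arriving at the sparse
        # codes are rotated back toward the dense pre-activation's direction.
        codes = rotate_to(pre, sparse)
        return self.dec(codes), codes
```

A design note under these assumptions: because `rotate_to` reproduces the sparse code exactly in the forward pass, the reconstruction loss is unchanged; only the gradient geometry differs from a plain Top-k SAE.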

Sparse Autoencoders · Geometric Deep Learning · Mechanistic Interpretability · Representation Learning
BibTeX:

@inproceedings{yusuf2026rttopksae,
  title={RT-TopKSAE: Improving Top-k Sparse Autoencoders with the Rotation Trick},
  author={Yusuf, Sulayman and Balwani, A.},
  booktitle={ICLR 2026 Workshop on Geometric Representations and Mechanisms},
  year={2026}
}

Research Interests

  • Geometric Deep Learning
  • Mechanistic Interpretability
  • Sparse Autoencoders
  • Equivariant Neural Networks
  • ML Training Infrastructure
  • Manifold Learning