Publications
Research contributions to geometric deep learning and mechanistic interpretability
RT-TopKSAE: Improving Top-k Sparse Autoencoders with the Rotation Trick
Authors: Sulayman Yusuf, A. Balwani
Venue: Geometric Representations and Mechanisms (GRaM) Workshop @ ICLR 2026
February 2026
Abstract
We present RT-TopKSAE, a novel approach to improving sparse autoencoders by incorporating the rotation trick from geometric deep learning. Our method addresses the challenge of preserving principal components in high-dimensional latent spaces while maintaining sparsity constraints. By applying rotation-equivariant transformations during training, we achieve a 40% improvement in principal component retention compared to standard Top-k sparse autoencoders, while maintaining comparable sparsity levels. Our approach uses custom PyTorch autograd functions to preserve gradients through the rotation operations, enabling end-to-end training. We demonstrate the effectiveness of RT-TopKSAE on several benchmark tasks in mechanistic interpretability, showing improved feature disentanglement and interpretability of learned representations.
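To illustrate the two ingredients named above, here is a minimal sketch, assuming a standard Top-k sparse autoencoder and a custom PyTorch autograd function that applies an orthogonal rotation while treating the rotation matrix as a constant in the backward pass. All names (`RotationTrick`, `TopKSAE`) and the exact way the rotation is wired in are hypothetical; this is not the paper's implementation.

```python
# Hypothetical sketch: Top-k SAE plus a rotation with a custom gradient path.
# Names and wiring are illustrative, not the authors' actual code.
import torch
import torch.nn as nn


class RotationTrick(torch.autograd.Function):
    """Apply an orthogonal rotation R in the forward pass; in the backward
    pass, treat R as a constant so gradients flow through the linear map."""

    @staticmethod
    def forward(ctx, z, R):
        ctx.save_for_backward(R)
        return z @ R

    @staticmethod
    def backward(ctx, grad_out):
        (R,) = ctx.saved_tensors
        # d(z @ R)/dz with R held fixed; R itself receives no gradient.
        return grad_out @ R.t(), None


class TopKSAE(nn.Module):
    """Standard Top-k sparse autoencoder: keep only the k largest
    latent activations per sample, zero out the rest."""

    def __init__(self, d_in, d_latent, k):
        super().__init__()
        self.enc = nn.Linear(d_in, d_latent)
        self.dec = nn.Linear(d_latent, d_in)
        self.k = k

    def forward(self, x):
        z = torch.relu(self.enc(x))
        vals, idx = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter_(-1, idx, vals)
        return self.dec(z_sparse), z_sparse


sae = TopKSAE(d_in=16, d_latent=64, k=4)
x = torch.randn(8, 16)
x_hat, z = sae(x)

# A random orthogonal matrix stands in for whatever rotation the method uses.
R = torch.linalg.qr(torch.randn(64, 64)).Q
z_rot = RotationTrick.apply(z, R)
```

Because the backward pass holds `R` fixed, gradients reach the encoder through the rotation, which is what makes end-to-end training possible in this kind of setup.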
BibTeX Citation

@inproceedings{yusuf2026rttopksae,
  title={RT-TopKSAE: Improving Top-k Sparse Autoencoders with the Rotation Trick},
  author={Yusuf, Sulayman and Balwani, A.},
  booktitle={ICLR 2026 Workshop on Geometric Representations and Mechanisms},
  year={2026}
}

Research Interests
- Geometric Deep Learning
- Mechanistic Interpretability
- Sparse Autoencoders
- Equivariant Neural Networks
- ML Training Infrastructure
- Manifold Learning