Activation Steering for Discrete Language Diffusion Models

Venue: TUM

Location: Munich, Germany

Date: August 19, 2025

Project Background

Diffusion models have recently emerged as a powerful paradigm for generative modeling, achieving state-of-the-art results in continuous domains such as images and audio. In the discrete domain, masked language diffusion models aim to generate coherent sequences of tokens through iterative denoising steps, offering a compelling alternative to autoregressive models.

Despite their promise, discrete language diffusion models face several challenges:

  • Mode collapse or repetitive token sequences
  • Lack of controllability over style, tone, or semantic traits
  • Sensitivity to sampling strategies

Activation steering, which has shown success in transformers for controlling textual traits post-training, could provide a novel mechanism to guide discrete diffusion processes. By identifying directions in the model’s hidden activations corresponding to specific traits (e.g., formality, sentiment, or domain-specific style), it is possible to modulate generated sequences without retraining the model.
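
A minimal sketch of what this could look like in PyTorch, assuming a HuggingFace-style interface (`output_hidden_states`, tuple-returning transformer blocks); the difference-of-means extraction and hook-based injection shown here are one common recipe, not the project's prescribed method, and `layer_idx` and `alpha` are illustrative knobs:

```python
import torch

def extract_steering_vector(model, tokenizer, pos_prompts, neg_prompts, layer_idx):
    """Difference-of-means steering vector: mean hidden state over
    trait-positive prompts minus trait-negative prompts at one layer."""
    def mean_hidden(prompts):
        acts = []
        for text in prompts:
            ids = tokenizer(text, return_tensors="pt").input_ids
            with torch.no_grad():
                out = model(ids, output_hidden_states=True)
            # mean-pool the chosen layer's activations over the sequence
            acts.append(out.hidden_states[layer_idx].mean(dim=1))
        return torch.cat(acts).mean(dim=0)
    return mean_hidden(pos_prompts) - mean_hidden(neg_prompts)

def add_steering_hook(block, vector, alpha=1.0):
    """Register a forward hook that shifts the block's output
    along the steering direction at every denoising step."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vector.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return block.register_forward_hook(hook)
```

At inference, the hook nudges the chosen block's output along the trait direction at every denoising step; `alpha` trades control strength against fluency.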

Applying activation steering to discrete language diffusion models could therefore enable more controllable, coherent, and higher-quality text generation, making this a promising avenue for research. It also opens up finer-grained control: because a masked diffusion denoiser predicts specific token positions at each step, steering can be applied to selected tokens only, a level of per-token control not achievable with decoder-only LLMs (see the sketch below).
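
Under the same assumptions as the sketch above, per-token steering could gate the shift with a boolean mask over positions, e.g. the still-masked slots of the current noisy sequence; `masked_positions` and its recomputation per step are illustrative:

```python
import torch

def add_masked_steering_hook(block, vector, masked_positions, alpha=1.0):
    """Steer only selected token positions (e.g., still-masked slots in a
    discrete diffusion denoiser); all other positions pass through unchanged.
    `masked_positions` is a boolean tensor of shape (batch, seq_len)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        shift = alpha * vector.to(hidden.device, hidden.dtype)     # (hidden_dim,)
        gate = masked_positions.unsqueeze(-1).to(hidden.dtype)     # (B, T, 1)
        steered = hidden + gate * shift
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return block.register_forward_hook(hook)

# Usage: recompute `masked_positions` from the current noisy sequence at each
# denoising step, e.g. masked_positions = (x_t == tokenizer.mask_token_id)
```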

Your Tasks

  • Literature Review: Survey discrete language diffusion methods, activation steering techniques, and related controllable generation approaches.
  • Experimental Design: Develop evaluation frameworks for measuring control, coherence, and quality of text generated under activation steering.
  • Steering Vector Extraction: Implement methods to identify activation directions corresponding to semantic or stylistic traits in diffusion models, and evaluate how multiple steering vectors superpose when applied jointly.
  • Evaluation: Benchmark performance using automatic metrics (perplexity, BLEU, diversity) and human evaluation to assess controllability and fluency (a starter sketch for the automatic metrics follows this list).
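
For the automatic metrics, a hedged starting point (GPT-2 as an external fluency scorer and a hand-rolled distinct-n are illustrative choices; BLEU against references can come from an off-the-shelf package such as sacrebleu):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

scorer = AutoModelForCausalLM.from_pretrained("gpt2")
scorer_tok = AutoTokenizer.from_pretrained("gpt2")

def perplexity(text):
    """Fluency proxy: perplexity of the generated text under an external LM."""
    ids = scorer_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = scorer(ids, labels=ids).loss
    return torch.exp(loss).item()

def distinct_n(texts, n=2):
    """Diversity: ratio of unique n-grams to total n-grams across samples."""
    ngrams = [tuple(t.split()[i:i + n])
              for t in texts for i in range(len(t.split()) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)
```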

What We Offer

  • Opportunity to work on a cutting-edge generative AI project with publication potential.
  • Freedom to explore novel applications of activation steering in discrete generative models.
  • Close supervision and collaboration opportunities with experts in NLP and machine learning.
  • Access to computational resources including:
    • A dedicated cluster with multiple A40 and A100 GPUs
    • Fast SSD-backed storage
    • Tools for efficient experiment management

Project Details

Duration: 6 months

Required Background:

  • Strong programming skills (Python/PyTorch)
  • Solid understanding of machine learning fundamentals
  • Interest in generative models and NLP
  • (Preferred) Experience with transformers, diffusion models, or controllable text generation
  • Ability to work independently and collaboratively in interdisciplinary teams

How to Apply

Please send a short motivation statement, your CV, and a recent transcript of records to:
johannes.kaiser@tum.de
