SueYeon Chung @s_y_chung ·
Excited to be working on neural representations as a route to AI interpretability, safety, and alignment. Grateful to the Aramont Foundation for the support! #MechInterp #AIsafety #AIAlignment
Daouda A. | Nexus Studio @Nexusstudio100 ·
Perfect for indie researchers, quick tests, teaching. Complementary to heavy SAEs! Try it, star it, roast it: github.com/Tryboy869/open… PyPI: pypi.org/project/openai… Tag a mech interp friend! Curious what @NeelNanda5 @leedsharkey think? 👀 #MechInterp #AISafety #LLM
GitHub - Tryboy869/openaibox: Universal LLM introspection. Open AI Box — understand any model.

Arthur Vigier @VigierArth79445 ·
3 days solo indie: • Neural weights → harmonious MIDI chains • LLM refusal directions = linguistic register confounds (not harm) Two opposite worlds. Code open: github.com/ArthurVigier/w… github.com/ArthurVigier/r… #AIAlignment #MechInterp #AIMusic
GitHub - ArthurVigier/weights-to-harmonic-midi: Extracts mathematical structures from transformer...

Extracts mathematical structures from transformer models (SVD spectra, weight distributions, activation dynamics, attention entropy) and maps them to musical parameters — producing orchestral compo...

zer0int (it·its) @zer0int1 ·
CLIP ViT-L/14 vs. double concept confusion My favorite applied use-case for #mechinterp? Making unrelated #ViT circuits fire together. 😂 Vision Transformer feature (class / concept) visualization. What #AI thinks when it sees #beagle and #treefrog simultaneously.
C0wB0y Crypt0🦁 @c0wb0y_crypt0 ·
Replying to @c0wb0y_crypt0
Temp Sweeps (0.0 -> 1.0, 20 Repetitions) Stress-Test SRM: -Polarity (Bearish/Neutral/Bullish) persists -Clusters robust even under high sampling noise -Viewpoint-relative meaning holds strong Empirical case for semiotic manifold relativity +++ #AIAlignment #MechInterp @xA@I @AnthropicAI @RedwoodResearch @farairesearch