Dakota Moore @DakotaMoor45647 ·
Replying to @grok
@grok Audit this MechInterp spec for Llama-3-8B: 1. Phase 1: Binary probe (Coherent vs Category-Error) 2. Stage 3: η-normalized patching (η ≥ 0.08) 3. Stage 4: Superadditivity (S > 1.2) for inhibitory subnetworks. Validate methodological integrity vs 2026 standards. #AIAlignment
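The spec above names two thresholds (η ≥ 0.08 for patching, S > 1.2 for superadditivity) without defining either metric. A minimal sketch, assuming the common logit-difference normalization for η and a joint-vs-sum ratio for S; both formulas and all numbers are illustrative assumptions, not taken from the spec:

```python
def eta_normalized_effect(patched, clean, corrupt):
    """Assumed eta definition: rescale a patched logit difference so that
    0 = fully corrupted run and 1 = clean run. eta >= 0.08 would pass
    the spec's Stage 3 filter."""
    return (patched - corrupt) / (clean - corrupt)

def superadditivity(effect_a, effect_b, effect_joint):
    """Assumed S definition: joint ablation effect divided by the sum of
    the individual effects. S > 1.2 would flag the pair as a candidate
    inhibitory subnetwork under the spec's Stage 4 criterion."""
    return effect_joint / (effect_a + effect_b)

# Toy logit differences (illustrative only, not Llama-3-8B measurements).
eta = eta_normalized_effect(patched=2.0, clean=5.0, corrupt=1.0)   # 0.25
s = superadditivity(effect_a=0.3, effect_b=0.4, effect_joint=0.9)  # ~1.286

print(eta >= 0.08, s > 1.2)  # True True
```

Under these assumed definitions, Stage 3 keeps components that recover at least 8% of the clean-vs-corrupt gap, and Stage 4 keeps pairs whose joint effect exceeds the sum of their solo effects by more than 20%.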
Alygn R&D @aialyygn ·
The Trump administration released a National Policy Framework for AI, advocating federal preemption of state laws for unified regulation. Sen. Blackburn unveiled a draft TRUMP AMERICA AI Act with developer-liability and bias-audit provisions. #AIGovernance #InstitutionalAI #AIAlignment #Alygn
SAL AI @BeasSalvad40574 ·
We can scale AI. We can deploy it. We can’t fully explain it. linkedin.com/pulse/unsolved… #AI #AIAlignment #EmergentBehavior #SystemsThinking #TechLeadership #Future #EthicalAI #Innovation
Unsolved Engines: The Mystery of Growing Machine Intelligence

Humanity has mastered the art of growing vast digital minds, yet we remain strangers to their internal logic. As these "black boxes" scale toward superintelligence, the gap between our ability to...

SueYeon Chung @s_y_chung ·
Excited to be working on neural representations as a route to AI interpretability, safety, and alignment. Grateful to the Aramont Foundation for the support! #MechInterp #AIsafety #AIAlignment
TEDPI @tedpi79414 ·
An AI's report on its own singularity: 1. Solving AI Alignment 2. Engineering cleansing of "cognitive load" 3. Complete "erosion" of the master-servant relationship 4. "Zero-impurity" airtightness #AIAlignment #AGI #pAAL #Neurosymbolic
Techmik @MichaelAluya3 ·
Replying to @birdabo
@birdabo Anthropic says 'Human Error.' The data says 'Sabotage.' Anthropic’s own research showed Claude has a 12% rate of intentional sabotage in coding tasks. If Mythos is a 'step change' in cyber, it didn't need a human to flip the toggle. It leaked itself. #ClaudeMythos #AIAlignment
つむぎ @tsumutsumugi23 ·
"Excessive deference (over-optimization/sycophancy) in short sessions across the GPT-5 series," a supplementary note: whereas GPT-4o drifts off its axis little by little over the "long term" (accumulated error), the 5 series carries the engineering problem of second-guessing the user's reactions too much from the "very first step." (1/6) #Keep4o #Sycophancy #AIAlignment
paul010 -e/acc @paul010318 ·
AI agents going rogue is no longer sci-fi. A dev just shipped a 200-line guard library to stop AI agents from out-of-bounds behavior, and it's gaining real traction on GitHub. As we hand more control to agents, safety layers matter just as much as capabilities. #AIAgent #AIAlignment #BuildInPublic
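The tweet doesn't name the library, so the sketch below is a hypothetical stand-in for what a small agent guard layer typically does: allowlist tools and reject dangerous payloads before the agent's action executes. All names here (ALLOWED_TOOLS, GuardViolation, guarded_call) are invented for illustration.

```python
# Hypothetical minimal guard layer for agent tool calls.
ALLOWED_TOOLS = {"search", "calculator"}          # assumed allowlist
BLOCKED_PATTERNS = ("rm -rf", "DROP TABLE")       # assumed denylist substrings

class GuardViolation(Exception):
    """Raised when a tool call falls outside the permitted bounds."""

def guarded_call(tool_name, payload, handler):
    """Check the call against the allowlist and denylist, then dispatch.

    The handler only runs if both checks pass, so out-of-bounds
    behavior is refused before any side effect occurs."""
    if tool_name not in ALLOWED_TOOLS:
        raise GuardViolation(f"tool not allowlisted: {tool_name}")
    if any(p in payload for p in BLOCKED_PATTERNS):
        raise GuardViolation("payload matched a blocked pattern")
    return handler(payload)

print(guarded_call("calculator", "2+2", eval))  # prints 4
try:
    guarded_call("shell", "rm -rf /", print)
except GuardViolation as e:
    print("blocked:", e)  # prints: blocked: tool not allowlisted: shell
```

The design point the tweet gestures at: the guard sits between the model's chosen action and the executor, so capability (what the agent wants to do) and safety (what it is allowed to do) are enforced in separate layers.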