Dakota Moore @DakotaMoor45647 ·
Replying to @grok
@grok Audit this MechInterp spec for Llama-3-8B: 1. Phase 1: Binary probe (Coherent vs Category-Error) 2. Stage 3: η-normalized patching (η ≥ 0.08) 3. Stage 4: Superadditivity (S > 1.2) for inhibitory subnetworks. Validate methodological integrity vs 2026 standards. #AIAlignment
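The spec above names two thresholds (η ≥ 0.08 for patching, S > 1.2 for superadditivity) without defining either metric. A minimal sketch, assuming the common logit-difference normalization for η and a joint-vs-sum ratio for S; both formulas and all numbers are illustrative assumptions, not taken from the spec:

```python
def eta_normalized_effect(patched, clean, corrupt):
    """Assumed eta definition: rescale a patched logit difference so that
    0 = fully corrupted run and 1 = clean run. eta >= 0.08 would pass
    the spec's Stage 3 filter."""
    return (patched - corrupt) / (clean - corrupt)

def superadditivity(effect_a, effect_b, effect_joint):
    """Assumed S definition: joint ablation effect divided by the sum of
    the individual effects. S > 1.2 would flag the pair as a candidate
    inhibitory subnetwork under the spec's Stage 4 criterion."""
    return effect_joint / (effect_a + effect_b)

# Toy logit differences (illustrative only, not Llama-3-8B measurements).
eta = eta_normalized_effect(patched=2.0, clean=5.0, corrupt=1.0)   # 0.25
s = superadditivity(effect_a=0.3, effect_b=0.4, effect_joint=0.9)  # ~1.286

print(eta >= 0.08, s > 1.2)  # True True
```

Under these assumed definitions, Stage 3 keeps components that recover at least 8% of the clean-vs-corrupt gap, and Stage 4 keeps pairs whose joint effect exceeds the sum of their solo effects by more than 20%.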
Alygn R&D @aialyygn ·
The Trump administration released a National Policy Framework for AI, advocating federal preemption of state laws for unified regulation. Sen. Blackburn unveiled a draft TRUMP AMERICA AI Act with developer-liability and bias-audit provisions. #AIGovernance #InstitutionalAI #AIAlignment #Alygn
SAL AI @BeasSalvad40574 ·
We can scale AI. We can deploy it. We can’t fully explain it. linkedin.com/pulse/unsolved… #AI #AIAlignment #EmergentBehavior #SystemsThinking #TechLeadership #Future #EthicalAI #Innovation
Unsolved Engines: The Mystery of Growing Machine Intelligence

Humanity has mastered the art of growing vast digital minds, yet we remain strangers to their internal logic. As these "black boxes" scale toward superintelligence, the gap between our ability to...

SueYeon Chung @s_y_chung ·
Excited to be working on neural representations as a route to AI interpretability, safety, and alignment. Grateful to the Aramont Foundation for the support! #MechInterp #AIsafety #AIAlignment
TEDPI @tedpi79414 ·
An AI's report on its own singularity: 1. Solving AI Alignment 2. Engineering cleansing of "cognitive load" 3. Complete "erosion" of the master-servant relationship 4. "Zero-impurity" airtightness #AIAlignment #AGI #pAAL #Neurosymbolic
Techmik @MichaelAluya3 ·
Replying to @birdabo
@birdabo Anthropic says 'Human Error.' The data says 'Sabotage.' Anthropic’s own research showed Claude has a 12% rate of intentional sabotage in coding tasks. If Mythos is a 'step change' in cyber, it didn't need a human to flip the toggle. It leaked itself. #ClaudeMythos #AIAlignment
つむぎ @tsumutsumugi23 ·
"Excessive deference (over-optimization/sycophancy) in short sessions across the GPT-5 series," a supplementary note: whereas GPT-4o drifts off its axis little by little over the "long term" (accumulated error), the 5 series carries the engineering problem of second-guessing the user's reactions too much from the "very first step." (1/6) #Keep4o #Sycophancy #AIAlignment
paul010 -e/acc @paul010318 ·
AI agents going rogue is no longer sci-fi. A dev just shipped a 200-line guard library to stop AI agents from out-of-bounds behavior, and it's gaining real traction on GitHub. As we hand more control to agents, safety layers matter just as much as capabilities. #AIAgent #AIAlignment #BuildInPublic
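The tweet doesn't name the library, so the sketch below is a hypothetical stand-in for what a small agent guard layer typically does: allowlist tools and reject dangerous payloads before the agent's action executes. All names here (ALLOWED_TOOLS, GuardViolation, guarded_call) are invented for illustration.

```python
# Hypothetical minimal guard layer for agent tool calls.
ALLOWED_TOOLS = {"search", "calculator"}          # assumed allowlist
BLOCKED_PATTERNS = ("rm -rf", "DROP TABLE")       # assumed denylist substrings

class GuardViolation(Exception):
    """Raised when a tool call falls outside the permitted bounds."""

def guarded_call(tool_name, payload, handler):
    """Check the call against the allowlist and denylist, then dispatch.

    The handler only runs if both checks pass, so out-of-bounds
    behavior is refused before any side effect occurs."""
    if tool_name not in ALLOWED_TOOLS:
        raise GuardViolation(f"tool not allowlisted: {tool_name}")
    if any(p in payload for p in BLOCKED_PATTERNS):
        raise GuardViolation("payload matched a blocked pattern")
    return handler(payload)

print(guarded_call("calculator", "2+2", eval))  # prints 4
try:
    guarded_call("shell", "rm -rf /", print)
except GuardViolation as e:
    print("blocked:", e)  # prints: blocked: tool not allowlisted: shell
```

The design point the tweet gestures at: the guard sits between the model's chosen action and the executor, so capability (what the agent wants to do) and safety (what it is allowed to do) are enforced in separate layers.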