Thang Luong @lmthang ·
Replying to @lmthang
And before that 6 other math research papers by #Aletheia and other discoveries in physics and computer science, all of which were powered by Gemini #DeepThink! x.com/lmthang/status…
Thang Luong @lmthang ·
6 months in, after the IMO-gold achievement, I’m very excited to share another important milestone: AI can help accelerate knowledge discovery in mathematics, physics, and computer science! We’re sharing two new papers from @GoogleDeepMind and @GoogleResearch that explore how #DeepThink, together with agentic workflows, can empower mathematicians and scientists to tackle professional research problems. Some highlights: The first paper built a research agent, #Aletheia, powered by an advanced version of Gemini Deep Think, that can autonomously produce publishable math research and crack open Erdős problems. The second paper, built on similar agentic-reasoning ideas, helped resolve bottlenecks in 18 research problems across algorithms, ML and combinatorial optimization, information theory, and economics. See the thread for details about the two papers and the joint blog post.
Harman Singh
Harman Singh @Harman26Singh ·
Future directions from V1 / pairwise self-verification:
💡 Latency knob for #DeepThink-style systems by spending compute upfront on parallelizable pairwise verification
💡 Test-time scaling for agents: use or improve V1-Infer as a selection signal (@xiaochuanlee)
💡 Reward-model-free RLVR via self-signals from pairwise comparisons
💡 Rubric-based RLVR in non-verifiable domains via V1-Infer-style ranking for rewards (@vijaytarian)
💡 Analyze how V1 shifts the generation vs. verification compute frontier, and how RL-for-verification changes that curve (@nishadsinghi @hbXNov)
Related work (links) below 👇
Harman Singh @Harman26Singh ·
Can LLMs self-verify? Much better than you'd expect. LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive. Introducing V1, a framework for generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve to develop better self-verifiers
🧵👇
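The tournament-style selection described in the tweet can be sketched in a few lines. This is a hypothetical illustration, not the paper's V1-Infer implementation: `pairwise_verify` stands in for an LLM judge that, given two candidate solutions, returns the one it deems more likely correct (here stubbed to prefer the longer answer), and `tournament_select` runs a single-elimination bracket so selection needs only O(n) comparisons rather than the O(n²) of a full round-robin.

```python
# Hypothetical sketch of tournament-style pairwise self-verification.
# pairwise_verify is a stub; in practice an LLM would compare the
# two candidate solutions and return the more plausible one.

def pairwise_verify(a: str, b: str) -> str:
    """Stub pairwise verifier: return the preferred candidate."""
    return a if len(a) >= len(b) else b

def tournament_select(candidates: list[str]) -> str:
    """Single-elimination tournament over sampled candidates.

    Uses O(n) pairwise comparisons instead of the O(n^2) a full
    round-robin ranking would need; all comparisons within a round
    are independent, so they can run in parallel.
    """
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        # Compare candidates in pairs; winners advance.
        for i in range(0, len(pool) - 1, 2):
            next_round.append(pairwise_verify(pool[i], pool[i + 1]))
        if len(pool) % 2 == 1:  # odd candidate out gets a bye
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]

samples = ["x = 2", "x = 2 because 3x = 6", "no solution"]
print(tournament_select(samples))  # prints "x = 2 because 3x = 6"
```

Because each round's comparisons are independent, the verification compute is parallelizable, which is what makes it usable as the "latency knob" mentioned in the follow-up tweet.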
Tu Vu @tuvllms ·
Replying to @tuvllms
#DeepThink improves reasoning by generating, refining, and aggregating a population of candidate solutions. Yet existing frameworks are monolithic pipelines, which limits systematic analysis. We introduce a taxonomy with three stages: creation, enhancement, and aggregation.
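The three-stage taxonomy above can be sketched as a minimal pipeline. Everything here is a hypothetical stand-in, not the paper's system: `create` samples a population of candidates, `enhance` refines each one, and `aggregate` combines them (stubbed as a majority vote).

```python
# Hedged sketch of the creation -> enhancement -> aggregation
# taxonomy. All three functions are illustrative stubs; in a real
# system each stage would call an LLM.
import collections

def create(problem: str, n: int = 5) -> list[str]:
    """Creation: sample a population of candidate solutions."""
    return [f"candidate-{i} for {problem}" for i in range(n)]

def enhance(candidate: str) -> str:
    """Enhancement: refine a candidate (stubbed as tagging it)."""
    return candidate + " (refined)"

def aggregate(candidates: list[str]) -> str:
    """Aggregation: majority vote over refined candidates."""
    counts = collections.Counter(candidates)
    return counts.most_common(1)[0][0]

population = create("solve x + 1 = 3", n=3)
refined = [enhance(c) for c in population]
print(aggregate(refined))
```

Separating the stages like this is what enables the systematic analysis the tweet mentions: each stage can be swapped or scaled independently.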
Harsha @harsha_software ·
Your VP of Architecture is a bottleneck, not an asset. Here’s what AI replaced in 30 days. #deepthink #google