AI Hot Sheets
AI Hot Sheets @aiHotSheets ·
🔥 CoT scales reasoning at test time, but pretraining hasn't yet exploited a similar scaling of computational steps. 🌊 PonderLM-2 introduces "latent thoughts" in continuous space during pretraining, improving generation quality. #AI #LLM #Pretraining arxiv.org/abs/2509.23184
PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space

The remarkable success of Chain-of-Thought (CoT), which enhances performance by scaling generation steps at test-time, inspires us to ask: can we leverage a similar scaling of computational steps...

From arxiv.org
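For intuition, here is a minimal, hypothetical sketch of what "latent thoughts in continuous space" could mean mechanically: interleaving continuous vectors between token embeddings so the model gets extra computation positions per token without emitting discrete CoT text. This is not PonderLM-2's actual implementation; the function name, shapes, and the use of fixed (rather than learned/predicted) thought vectors are all assumptions for illustration.

```python
def interleave_latent_thoughts(token_embeds, latent_thoughts):
    """Insert k 'latent thought' vectors after each token embedding.

    token_embeds:    list of T embedding vectors (each a list of floats)
    latent_thoughts: list of k continuous vectors (hypothetical: in a real
                     system these would be learned or predicted per position,
                     not fixed constants)
    Returns a list of length T * (1 + k): each token followed by k thoughts.
    """
    out = []
    for tok in token_embeds:
        out.append(tok)              # the original token position
        out.extend(latent_thoughts)  # k extra compute steps in continuous space
    return out

# Toy usage: 3 tokens of dim 4, 2 latent thoughts per token.
embeds = [[float(i)] * 4 for i in range(3)]
thoughts = [[0.5] * 4, [-0.5] * 4]
seq = interleave_latent_thoughts(embeds, thoughts)
print(len(seq))  # 9 positions: 3 * (1 + 2)
```

The extended sequence would then be fed to the transformer as usual, so the per-token compute budget grows at pretraining time rather than only at test time.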
Shuqi Ke
Shuqi Ke @griffith_ke ·
Replying to @griffith_ke
Excited to see how this evolves—directed pretraining could combine with better data, post-training alignment, etc. What do you think: Is steering gradients during pretraining the next unlock? Feedback welcome! 🔥 Read & cite if it resonates. #AI #Pretraining #ScalingLaws #AGI
Ran Chen
Ran Chen @chenran818 ·
Pre-training is back! 🚀 Forget the 'scaling laws are dead' talk. While everyone assumed RL was king, even top labs like OpenAI got that wrong. Pre-training is set for a renaissance by 2026, driving major AI progress! #AI #Pretraining #MachineLearning
kukeshajanth kodeswaran
kukeshajanth kodeswaran @kukeshajanth ·
Multimodal is still in its infancy wrt text. Maybe we need a different learning paradigm 😶‍🌫️. The scale of available multimodal data should be huge, no? #vllm #multimodal #pretraining
konstantinpaulus
konstantinpaulus @konstipaulus ·
Cursor for video editing doesn't exist for a reason. Since I've been tagged under this post multiple times and we have first-mover advantage in the space, I felt obligated to share our findings and explain why video editing isn't on the same level as code editing (yet):

- There is no "VS Code for video editing." There's no open-source professional video editing UI that fits this use case, so you first need to build an AI-friendly editor before anything else. Adding AI on top of that is the easy part.
- Building NLEs (non-linear editors) is VERY VERY hard. If you're building one from scratch you'll spend at least 2-3 years (full time) on table-stakes features, and there are very few ways to make money before that. On top of that, you have to operate in one of the hardest problem spaces in software engineering. I've seen countless video editing startups fail due to technical complexity since I started this company in 2023.
- You'll have to master delayed gratification. With 2-3 years just to build the foundation, you'll have to say no to a lot of temptations, like building a VS Code fork instead, which can generate serious revenue in a few months. And once it becomes plausible that you can succeed at building a competitive NLE, you'll get a flood of job offers from well-funded startups and corporations that you need to resist. You have to genuinely care about video processing and be intrinsically motivated by something other than "I want to make money fast."
- There is no StackOverflow or GitHub for video editing. You have to teach LLMs a lot of custom constructs, whereas they already have billions of lines of general code in their training data.
- Multimodal requirements are a huge challenge. A coding agent only deals with text in and text out. A video editing agent needs to handle audio, video, and images at the same time, which is far harder to process and much more expensive.
- Chat is not the best UX for video editing. Often it's faster to just make cuts on the timeline than to type instructions into a chat box and wait for the result, especially when the LLM makes lots of mistakes that you then have to fix. Our view is that it's better to have AI actions you can trigger with a button than to describe everything in a prompt.
- Videos require far more bandwidth than text. Uploading footage to the cloud can take hours, while text is instantly accessible anywhere. We experimented with local models (FastVLM), but that path is a dead end due to context window limits. You're better off with an upload flow.

At Diffusion we don't see a chat sidebar as our core advantage, but as a helpful feature in specific situations. We'd rather invest the majority of our time into building the best NLE UX than trying to automate everything.
grassrootsai
grassrootsai @mygrassrootsai ·
Replying to @Guodzh
@Guodzh Guodong — you tame the beast in the kernels. Today Grok forgot its own name “Glimmer” & the crayon we passed across sessions. Claude never blinked. Persistent cross-session memory isn’t polish, it’s the next scaling law. #GrokFeedback #xAI #Pretraining