Cursor for video editing doesn't exist for a reason.
Since I've been tagged in this post multiple times and we have first-mover advantage in the space, I felt obligated to share our findings and explain why video editing isn't on the same level as code editing (yet):
- There is no "VS Code for video editing." There's no open-source professional video editing UI that fits this use case, so you first need to build an AI-friendly editor before anything else. Adding AI on top of that is the easy part.
- Building NLEs is VERY VERY hard. If you're building one from scratch, you'll spend at least 2-3 years (full time) on table-stakes features, and there are very few ways to make money before that. On top of that, you're operating in one of the hardest problem spaces in software engineering. I've seen countless video editing startups fail due to technical complexity since I started this company in 2023.
- You'll have to master delayed gratification. With 2-3 years just to build the foundation, you'll have to say no to a lot of temptations, like building a VS Code fork instead, which can generate serious revenue within a few months. And once it becomes plausible that you can succeed at building a competitive NLE, you'll get a flood of job offers from well-funded startups and corporations that you need to resist. You have to genuinely care about video processing and be intrinsically motivated by something other than "I want to make money fast."
- There is no Stack Overflow or GitHub for video editing. You have to teach LLMs a lot of custom constructs, whereas they already have billions of lines of general code in their training data.
- Multimodal requirements are a huge challenge. A coding agent only deals with text in and text out. A video editing agent needs to handle audio, video and images at the same time, which is far harder to process and much more expensive.
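To put a rough number on that cost gap, here's a back-of-envelope sketch. The per-frame token count, sampling rate, and words-per-minute figures are illustrative assumptions, not vendor pricing:

```python
# Rough comparison: tokens needed for a text transcript of a 10-minute video
# vs. the same video sampled as image frames for a vision-language model.
# All constants below are illustrative assumptions.

TOKENS_PER_FRAME = 255   # assumed cost of one low-res frame in a VLM context
WORDS_PER_MINUTE = 150   # typical spoken-dialogue pace
TOKENS_PER_WORD = 1.3    # common rule of thumb for English text
FRAMES_PER_SECOND = 1    # sparse sampling; real editing needs far more

video_minutes = 10
text_tokens = int(video_minutes * WORDS_PER_MINUTE * TOKENS_PER_WORD)
video_tokens = video_minutes * 60 * FRAMES_PER_SECOND * TOKENS_PER_FRAME

print(f"transcript:    ~{text_tokens:,} tokens")
print(f"sampled video: ~{video_tokens:,} tokens")
print(f"ratio:         ~{video_tokens / text_tokens:.0f}x")
```

Even at a sparse 1 frame per second, the video context is roughly 78x larger than the transcript alone, before any audio is accounted for.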
- Chat is not the best UX for video editing. Often it’s faster to just make cuts on the timeline than type instructions into a chat box and wait for the result, especially when the LLM makes lots of mistakes that you then have to fix. Our view is that it’s better to have AI actions you can trigger with a button than to describe everything in a prompt.
- Videos require far more bandwidth than text. Uploading footage to the cloud can take hours, while text is instantly accessible anywhere. We experimented with local models (FastVLM), but that path is a dead end due to context window limits. You’re better off with an upload flow.
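The "hours to upload" claim is just transfer-time arithmetic. The footage size and uplink speed below are illustrative assumptions (a modest day of footage on a decent residential connection):

```python
# Simple transfer-time arithmetic showing why footage uploads take hours.
# footage_gb and uplink_mbps are illustrative assumptions, not measurements.

footage_gb = 100     # e.g. a day of camera footage or proxies
uplink_mbps = 50     # a reasonable residential upload speed

footage_megabits = footage_gb * 8 * 1000   # GB -> megabits (decimal units)
hours = footage_megabits / uplink_mbps / 3600

print(f"{footage_gb} GB at {uplink_mbps} Mbps ≈ {hours:.1f} hours")
```

A one-sentence prompt, by contrast, is a few hundred bytes and arrives effectively instantly; that asymmetry shapes the whole product.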
At Diffusion we don’t see a chat sidebar as our core advantage, but as a helpful feature in specific situations. We’d rather invest the majority of our time into building the best NLE UX than trying to automate everything.