Thoughts

2025

  1. Fact-checking good AI researchers

  2. The description-execution gap

  3. RL environment specs

  4. The inverse 80-20 rule

  5. Method-driven vs problem-driven research

  6. AI research is a max-performance domain

  7. ATLA is the ultimate benchmark

  8. Measurement is all you need

  9. AlphaEvolve is thought-provoking

  10. Binary-choice questions for AI research taste

  11. The craziest chain-of-thought

  12. The best hard-to-solve easy-to-verify benchmark

  13. Flavors of AI for scientific innovation

  14. Debugging-prioritized AI research

  15. When scientific understanding catches up with models

  16. Butterfly effect of AI researchers’ backgrounds

  17. Benchmarks quickly get saturated

  18. Deep browsing models

  19. Unstoppable RL optimization vs unhackable RL environment

  20. Dopamine cycle in AI research

  21. Find the right dataset

2024

  1. Biggest lessons in AI in past five years

  2. Solving hallucinations via self-calibration

  3. Cooking with AI mindset

  4. OpenAI o3

  5. Value of safety research

  6. RL all the time

  7. People who influenced me

  8. Transition to AI for science

  9. Information density & flow of papers

  10. CoT before and after o1

  11. SimpleQA

  12. The o1 paradigm

  13. Inspiring words from a young OpenAI engineer

  14. Levels and expectations

  15. Bet on AI research experiments

  16. History of Flan-2

  17. When I don’t sleep enough

  18. Thinking about history makes me appreciate AI

  19. Advice from Bryan Johnson

  20. Sora is like GPT-2 for video generation

  21. A typical day at OpenAI

  22. Yolo runs

  23. Uniform information density for CoT

  24. Inertia bias in AI research

  25. Compute-bound, not headcount-bound

  26. Magic of language models

  27. Why you should write tests

  28. Co-founders who still write code

2023

  1. Hyung Won

  2. Read informal write-ups

  3. Relationship board of directors

  4. Reinventing myself

  5. Good prompting techniques

  6. 10k citations

  7. Manually inspect data

  8. Language model evals

  9. Amusing nuggets from being an AI resident

  10. When to use task-specific models

  11. Benefits of pair programming

  12. Many great managers do IC work

  13. Why I’m 100% transparent with my manager

  14. My girlfriend is a reward model

  15. Better citation metrics than h-index

  16. My strengths are communication and prioritization

  17. Emergence (dunk on Yann LeCun)

  18. UX for researchers

  19. My refusal

  20. The evolution of prompt engineering

  21. Prompt engineering battle

  22. Incumbents don’t have a big advantage in AI research

  23. Potential research directions for PhD students

  24. Best AI skillset

2022

  1. Add an FAQ section to your research papers

  2. Prompt engineering is black magic

  3. What work withstands the bitter lesson

  4. A skill to unlearn

  5. Advice on choosing a topic