Introduction to New LLMs & Their Core Philosophies

Anthropic launched Opus 4.6, and OpenAI responded with GPT-5.3 Codex, sparking a debate over which model is superior for technical users. The host, Greg, interviews Morgan Linton, an experienced engineer, investor, and AI founder, for tactical insights and a head-to-head comparison. The goal is to understand how to use the models, when to use them, and how to get started, rather than just "hot takes."

Getting Started with Opus 4.6: Key Configurations & Features

- The Anthropic team encourages using Opus 4.6 via the CLI (command-line interface), while GPT-5.3 Codex is showcased in OpenAI's desktop app on Mac.
- To ensure you're running Opus 4.6, run npm update or claude update; the current version should be 2.1.32 (not 1.x).
- Edit settings.json (located at ~/.claude/settings.json) to explicitly set the model to claude-opus-4-6, or simply opus if it's the newest.
- The crucial step for using Agent Teams in Opus 4.6 is to enable it as an experimental feature by adding "env": {"ClaudeCodeExperimentalAgentTeams": "1"} to your settings.json.
- For API users, Opus 4.6 introduces Adaptive Thinking, which lets you set an effort level (e.g., max for no constraints on thinking depth). This is exclusive to 4.6 and will error on older models.
- To use split panes for agents (e.g., in the Warp terminal), install tmux and set displayMode to splitPanes in settings.json.

Philosophical Divergence: Codex vs. Opus

- GPT-5.3 Codex acts as an interactive collaborator: you can steer it mid-execution, stay in the loop, and course-correct.
- Opus 4.6 emphasizes an autonomous, agentic, thoughtful system that plans deeply, runs longer, and requires less human intervention.
- This split reflects two engineering methodologies: tight human-in-the-loop control (Codex) vs. delegating whole chunks of work and reviewing the results (Opus).
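The individual settings mentioned above can be combined in a single ~/.claude/settings.json. This is a sketch assembled from the episode's description, not canonical configuration; the key names (model, env, displayMode) follow the hosts' wording, so verify the exact schema against the Claude Code documentation:

```json
{
  "model": "claude-opus-4-6",
  "env": {
    "ClaudeCodeExperimentalAgentTeams": "1"
  },
  "displayMode": "splitPanes"
}
```

Per the episode, run claude update first and confirm version 2.1.32, and install tmux before enabling splitPanes.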
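For API users, the episode's description of Adaptive Thinking suggests a request body shaped roughly like the following. The thinking/effort field name and placement are assumptions inferred from the hosts' wording, not a confirmed schema, so check Anthropic's Messages API reference before relying on it:

```json
{
  "model": "claude-opus-4-6",
  "max_tokens": 4096,
  "thinking": { "effort": "max" },
  "messages": [
    { "role": "user", "content": "Refactor this module and explain the tradeoffs." }
  ]
}
```

As noted above, sending an effort setting like this to a pre-4.6 model returns an error.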
Neither is inherently "better"; the choice depends on your preferred development methodology and personality type.

Core Differences in Capabilities

- Context window: Opus 4.6 boasts a 1-million-token context window, excelling at coherence over entire documents and repos ("load the whole universe and reason over it"). GPT-5.3 Codex has around 200,000 tokens, optimized for progressive execution rather than total recall.
- Task optimization: Opus is better for tasks requiring "understand everything first, then decide"; Codex is better for "decide fast, act, iterate" and pair programming.
- Coding benchmarks: Opus 4.6 is strong at codebase comprehension, architectural refactors, and explaining system behavior, and is less prone to "YOLO write code" (hallucinations). GPT-5.3 Codex won on SWE-Bench Pro and Terminal-Bench, indicating better end-to-end app generation, and is known for writing better production code.
- Agentic behavior: Opus 4.6's key feature is multi-agent orchestration, spinning up multiple agents for parallel work. GPT-5.3 Codex focuses on task-driven autonomy (build, test, modify without being asked), with strong task steering that lets users stop and correct it mid-task.
- Failure modes: Opus 4.6 might overanalyze, hesitate with ambiguous requirements, or stop short of full execution due to its deep planning. GPT-5.3 Codex can be overconfident or lock into flawed assumptions early, but can be steered back by the user.

Head-to-Head Demo: Building a Polymarket Competitor

The demo aimed to build a Polymarket competitor using both models simultaneously, with zero canned demos.

Opus 4.6 prompt: "build a competitor to Polymarket, create an agent team to explore this from different angles. One teammate on technical architecture, one understanding Polymarket and the ins and outs of prediction markets, one on UX, and one that just works on building really good tests to make sure everything works."
GPT-5.3 Codex prompt: "build a competitive polymarket, but now think deeply about technical architecture, understanding polymarket and the ins and outs of prediction markets, good clean UX, make sure it builds really good tests to make sure everything works."

Demo Results & Observations

GPT-5.3 Codex:
- Completed the task in 3 minutes and 47 seconds.
- Scaffolded the app from scratch and built the core market math, a trading engine, and a REST API router.
- Created 10 tests (LMSR math, engine behavior, API integration), all of which passed.
- The initial UI was functional but bland.
- Showcased strong mid-execution steering, accepting prompt changes on the fly (e.g., asking it to spruce up the design, then to emulate Jack Dorsey's design style). However, it required explicit confirmation to resume after questions.
- Struggled to deliver a truly impactful design refresh, despite the attempts.

Opus 4.6:
- Initially launched four parallel research agents (technical architecture, prediction markets, UX, testing) to gather information via web searches.
- Used significantly more tokens: over 100,000 for the research phase alone (each agent used over 25,000), plus more for building; an estimated 150,000-250,000 tokens total.
- After extensive research, it built the app, including API routes and a front-end UI.
- Created 96 tests, significantly more detailed than Codex's.
- The final output, named "Forecast," featured an exceptionally clean, elegant, and interactive UI (like a "Jack Dorsey design") with dark mode, hover states, and pre-populated content (leaderboard, portfolio). This design was achieved without explicit visual-design instructions.
- Opus 4.6 "won" this specific test on the quality and sophistication of the final product, despite taking longer and consuming more tokens.
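The "LMSR math" both models tested refers to Hanson's logarithmic market scoring rule, the standard automated market maker behind prediction markets like the one being cloned. A minimal sketch of the core formulas (this is illustrative, not the demo's actual code; the liquidity parameter b=100.0 is an arbitrary choice):

```python
import math

def lmsr_cost(quantities, b=100.0):
    """LMSR cost function: C(q) = b * ln(sum_i exp(q_i / b))."""
    return b * math.log(sum(math.exp(q / b) for q in quantities))

def lmsr_prices(quantities, b=100.0):
    """Instantaneous price of each outcome; prices always sum to 1."""
    exps = [math.exp(q / b) for q in quantities]
    total = sum(exps)
    return [e / total for e in exps]

def trade_cost(quantities, outcome, shares, b=100.0):
    """Cost to buy `shares` of `outcome`: C(q_after) - C(q_before)."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)
```

With no shares outstanding, a two-outcome market prices at 0.50/0.50; buying shares of one outcome raises its price, which is the behavior an engine-level test suite like the ones described would assert on.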
Cost Implications

High token usage for Opus 4.6 (e.g., 100,000 tokens) still translates to a relatively low cost (roughly $20 by the hosts' estimate, based on ~10 million tokens/month on the $200 Claude Max plan), making agent usage a potential revenue driver for Anthropic.

Conclusion and Recommendations

The choice between Opus 4.6 and GPT-5.3 Codex depends on the task and preferred workflow: Codex for fast iteration and human-in-the-loop control, Opus for deep planning, autonomous agents, and high-quality, complex outputs. Morgan recommends that engineering teams experiment with both models on different tasks to see which performs better for their specific needs. Users interested in Opus 4.6's agent features should consult the official documentation for details on sub-agents, communication, coordination, and display modes.

Morgan Linton is the co-founder and CTO of Bold Metrics, an AI technology company providing sizing solutions for apparel brands.