Yun’s Substack
Phase 2 Experiment Design: Cross-Lab Reasoning Drift Benchmark (GPT-5 vs. Claude Opus 4.1)
Executive Summary
Aug 24 • Yun Huang
Detecting Behavioral Drift in GPT-5 vs. 4o Under Social, Emotional, and Authority Framing
Phase 2 (Completed): GPT-5 vs GPT-4o Baseline
Aug 21 • Yun Huang
June 2025
Fine-Tuning GPT-2 for Narrative Structure: A Behavioral Probe
an exercise to understand model behavior
Jun 21 • Yun Huang
From Performance to Pressure: How GPT-5 May Shift the Ground Beneath Model Evaluation
GPT-4o gave us a glimpse of what’s coming — fast, expressive, multimodal.
Jun 12 • Yun Huang
A statistical approach to model evaluations
A recap of key insights from Anthropic's blog on model evaluations (Nov 2024)
Jun 9 • Yun Huang
January 2025
Understand GPT-4: A Technical Analysis
Core Capabilities
Jan 10 • Yun Huang
July 2024
Comparing Zero-Shot vs. Few-Shot Prompting: Cost-Efficiency vs. Precision
Comparing Accuracy in Zero-Shot vs. Few-Shot Prompting Using GPT-3.5 Turbo API
Jul 18, 2024 • Yun Huang
June 2024
What's happening in AI? An analysis of funding & adoption trends, June 2024
This year, over $30 billion has been invested in AI.
Jun 12, 2024 • Yun Huang