Silicon in Irish (Part 4): Benchmark v2
Gemini 3 Flash wins the expanded 1,341-test Irish grammar benchmark at 72.4% accuracy
Comparing three independent studies on LLM Irish grammar competence - all hitting the same ~70% ceiling
Anthropic recently released Claude Skills, a framework for packaging reusable workflows that Claude can execute on demand. Think of them as Standard Operating Procedures for AI agents.
A Framework for AI Evals That Actually Works
Welcome to Mines and Rabbit Holes #2. It is great to say that there is indeed a follow-up, and I hope you enjoy it!
AI feels like a real conversation. It isn’t.
You can't be both the student and the teacher
The word of the year. Focail na mbliana.
#1?
Harnessing Human Imperfection and Bold Vision in an AI-Driven World
300k -> 0k
Why Documentation is the New Superpower
Friend or Foe? Neither: Frenemy.
We Haven’t Had the Windows 95 Moment Yet
How the Memory Paradox and AI Slide Are Undermining Human Intelligence
The New Terrain of Critical Thinking
Un-reason-LLM-bly good?
The Traditional Org Chart is Being Toppled by AI
Why We Hunt for Better Metaphors
AI's Unexpected Fluency in Irish
Your AI Coding Assistant Keeps You Dry—Until It Doesn't
Navigating the AI Training Gap
A Holiday Guide to Better Questions
programming languages >> natural languages
Abstracting and synthesising our way in and out of novel problems
Approaching the Speed of Thought for Content Creation
AI aids development, but without the human touch it will never be accessible to all.
AI can be the great variance amplifier for minority languages
You can escape the attention-mining rush and learn faster simply by reading what you like.
DevDay for OpenAI brings AI agents to the masses
LLMs still going brrr thanks to timesaving with JSON outputs