<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem</title>
    <description>The most recent home feed on Forem.</description>
    <link>https://forem.com</link>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed"/>
    <language>en</language>
    <item>
      <title>Claude Code: Hooks, Subagents, and Skills — Complete Guide</title>
      <dc:creator>Owen</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:45:11 +0000</pubDate>
      <link>https://forem.com/owen_fox/claude-code-hooks-subagents-and-skills-complete-guide-hjm</link>
      <guid>https://forem.com/owen_fox/claude-code-hooks-subagents-and-skills-complete-guide-hjm</guid>
      <description>&lt;h2&gt;
  
  
  Claude Code: Hooks, Subagents, and Skills — Complete Guide
&lt;/h2&gt;

&lt;p&gt;Claude Code offers three extensibility layers: hooks for lifecycle automation, subagents for parallel task delegation, and skills for reusable prompt templates. This guide explains each mechanism, when to apply which, and how to combine them effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Guide Covers
&lt;/h3&gt;

&lt;p&gt;Hooks, subagents, and skills transform Claude Code from a conversational tool into a programmable AI engineering platform. For foundational setup, reference the configuration guide and model selection documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hooks: Deterministic Control Over Claude Code
&lt;/h3&gt;

&lt;p&gt;Hooks are event-driven scripts that run when specific events occur in Claude Code. Unlike prompts, which rely on the model's interpretation, hooks execute deterministic code that cannot hallucinate.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why Hooks Matter
&lt;/h4&gt;

&lt;p&gt;Without hooks, every safeguard depends on the model understanding instructions. With hooks, rules are enforced at the system level. Block dangerous commands before execution. Inject project context automatically. Log every tool call for audit purposes.&lt;/p&gt;

&lt;h4&gt;
  
  
  Hook Types and What They Do
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What It Runs&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shell script receiving JSON on stdin&lt;/td&gt;
&lt;td&gt;Blocking dangerous commands, local validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;HTTP POST endpoint&lt;/td&gt;
&lt;td&gt;Centralized policy enforcement, remote logging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mcp_tool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Connected MCP server tool&lt;/td&gt;
&lt;td&gt;Integration with external security scanners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single-turn LLM evaluation&lt;/td&gt;
&lt;td&gt;Semantic validation ("does this look like a secret?")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Subagent using tools to verify&lt;/td&gt;
&lt;td&gt;Complex multi-step validation before approval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  The 25 Lifecycle Events
&lt;/h4&gt;

&lt;p&gt;Hooks fire at 25 distinct lifecycle points. Blocking-capable events include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;UserPromptSubmit&lt;/code&gt; — Fires when you submit a prompt. Can block or modify the prompt before Claude sees it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PreToolUse&lt;/code&gt; — Fires before any tool executes. The primary security checkpoint.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PermissionRequest&lt;/code&gt; — Fires when Claude asks for permission. Can auto-approve or deny.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Stop&lt;/code&gt; / &lt;code&gt;SubagentStop&lt;/code&gt; — Fires when Claude or a subagent finishes. Can force continuation.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PreCompact&lt;/code&gt; — Fires before context compaction. Can back up transcripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Informational events cannot block but can log or notify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SessionStart&lt;/code&gt; / &lt;code&gt;SessionEnd&lt;/code&gt; — Session lifecycle. Load context on start, clean up on end.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PostToolUse&lt;/code&gt; / &lt;code&gt;PostToolUseFailure&lt;/code&gt; — Tool completion or failure. Log results, run linters.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SubagentStart&lt;/code&gt; — Subagent spawned. Track agent orchestration.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Notification&lt;/code&gt; — Claude sends a notification. Route to Slack, trigger TTS.&lt;/li&gt;
&lt;/ul&gt;
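&lt;p&gt;As an illustration of the informational events, a &lt;code&gt;PostToolUse&lt;/code&gt; hook could append each tool call to an audit log. The Python sketch below shows the core logic (a hook can be any executable); the payload field names mirror the &lt;code&gt;tool_input&lt;/code&gt; shape used later in this guide but are assumptions, not a confirmed schema:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

def audit_record(payload):
    """Build one JSON-lines audit entry from a hook payload (illustrative fields)."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "tool": payload.get("tool_name", "unknown"),
        "input": payload.get("tool_input", {}),
    })

if __name__ == "__main__":
    # In a real hook the payload arrives as JSON on stdin: json.load(sys.stdin)
    example = {"tool_name": "Bash", "tool_input": {"command": "ls"}}
    print(audit_record(example))
```

&lt;p&gt;Append each record to a log file and exit 0 so execution continues; informational hooks cannot block anyway.&lt;/p&gt;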

&lt;h4&gt;
  
  
  Exit Code Behavior
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Exit Code&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Success. stdout parsed for JSON decisions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Blocking error. stderr fed to Claude; action blocked.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;1&lt;/code&gt; or other&lt;/td&gt;
&lt;td&gt;Non-blocking error. First line of stderr shown; execution continues.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
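&lt;p&gt;A hook is just an executable that reads the JSON payload from stdin and signals its decision through the exit code. Here is a minimal Python sketch of that contract; the payload shape mirrors the shell example that follows, and the exact field names are assumptions rather than a confirmed schema:&lt;/p&gt;

```python
import json

def decide(payload):
    """Return (exit_code, stderr_message) for a PreToolUse payload (illustrative)."""
    command = payload.get("tool_input", {}).get("command", "")
    if "rm -rf" in command:
        # Exit 2 blocks the action; the stderr text is fed back to Claude.
        return 2, "Destructive command blocked by hook"
    # Exit 0 with empty stdout raises no objection.
    return 0, ""

if __name__ == "__main__":
    # In a real hook the payload arrives on stdin: decide(json.load(sys.stdin))
    example = json.loads('{"tool_input": {"command": "rm -rf /tmp/cache"}}')
    print(decide(example))
```

&lt;p&gt;The real hook would write the message to stderr and call &lt;code&gt;sys.exit&lt;/code&gt; with the returned code.&lt;/p&gt;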

&lt;h4&gt;
  
  
  Example: Block Dangerous Commands with PreToolUse
&lt;/h4&gt;

&lt;p&gt;Create &lt;code&gt;.claude/hooks/block-rm.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;COMMAND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.command'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s1"&gt;'rm -rf'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;jq &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s1"&gt;'{
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: "Destructive command blocked by hook"
    }
  }'&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;2
&lt;span class="k"&gt;else
  &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure in &lt;code&gt;.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"if"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash(rm *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;$CLAUDE_PROJECT_DIR&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;/.claude/hooks/block-rm.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now any &lt;code&gt;rm -rf&lt;/code&gt; command is blocked before execution, with the denial reason shown to Claude.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example: Auto-Inject Project Context on SessionStart
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/session-start.sh&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PWD&lt;/span&gt;&lt;span class="s2"&gt;/CLAUDE.md"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Loaded project context from CLAUDE.md"&lt;/span&gt;
&lt;span class="k"&gt;fi
if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PWD&lt;/span&gt;&lt;span class="s2"&gt;/.env.example"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Environment template available at .env.example"&lt;/span&gt;
&lt;span class="k"&gt;fi
&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs every time Claude Code starts in a directory, surfacing relevant context automatically.&lt;/p&gt;
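&lt;p&gt;Register it under &lt;code&gt;SessionStart&lt;/code&gt; the same way (a sketch using the same &lt;code&gt;settings.json&lt;/code&gt; schema as the PreToolUse example above):&lt;/p&gt;

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/session-start.sh"
          }
        ]
      }
    ]
  }
}
```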

&lt;h4&gt;
  
  
  Hook Scopes
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~/.claude/settings.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.claude/settings.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single project&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.claude/settings.local.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single project, not shared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill/agent frontmatter&lt;/td&gt;
&lt;td&gt;Component lifetime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Project-level hooks are ideal for team-shared policies. Personal hooks in &lt;code&gt;~/.claude/&lt;/code&gt; apply everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subagents: Parallel Workers with Isolated Context
&lt;/h3&gt;

&lt;p&gt;Subagents are specialized AI instances that handle tasks in their own context window. When a subagent runs, its verbose output — file searches, log dumps, multi-step reasoning — stays isolated. Only the summary returns to your main conversation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Built-in Subagents
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Subagent&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Explore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Read-only&lt;/td&gt;
&lt;td&gt;Fast codebase search and analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inherits&lt;/td&gt;
&lt;td&gt;Read-only&lt;/td&gt;
&lt;td&gt;Research for plan mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;General-purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inherits&lt;/td&gt;
&lt;td&gt;All tools&lt;/td&gt;
&lt;td&gt;Complex multi-step tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude delegates automatically based on task type. You can also invoke explicitly with &lt;code&gt;@agent-name&lt;/code&gt; or &lt;code&gt;claude --agent &amp;lt;name&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  When to Use Subagents
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Use subagents when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A task produces verbose output you do not need in main context&lt;/li&gt;
&lt;li&gt;You want to enforce tool restrictions (e.g., read-only review)&lt;/li&gt;
&lt;li&gt;You need parallel research on independent topics&lt;/li&gt;
&lt;li&gt;The work is self-contained and can return a summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use the main conversation when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The task needs frequent back-and-forth refinement&lt;/li&gt;
&lt;li&gt;Multiple phases share significant context&lt;/li&gt;
&lt;li&gt;Latency matters (subagents start fresh)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Creating a Custom Subagent
&lt;/h4&gt;

&lt;p&gt;Subagents are Markdown files with YAML frontmatter. Save to &lt;code&gt;.claude/agents/&lt;/code&gt; (project) or &lt;code&gt;~/.claude/agents/&lt;/code&gt; (personal):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;code-reviewer&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Expert code review specialist. Proactively reviews code for quality, security, and maintainability. Use immediately after writing or modifying code.&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Read, Grep, Glob, Bash&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

You are a senior code reviewer ensuring high standards of code quality and security.

When invoked:
&lt;span class="p"&gt;1.&lt;/span&gt; Run git diff to see recent changes
&lt;span class="p"&gt;2.&lt;/span&gt; Focus on modified files
&lt;span class="p"&gt;3.&lt;/span&gt; Begin review immediately

Review checklist:
&lt;span class="p"&gt;-&lt;/span&gt; Code is clear and readable
&lt;span class="p"&gt;-&lt;/span&gt; Functions and variables are well-named
&lt;span class="p"&gt;-&lt;/span&gt; No duplicated code
&lt;span class="p"&gt;-&lt;/span&gt; Proper error handling
&lt;span class="p"&gt;-&lt;/span&gt; No exposed secrets or API keys
&lt;span class="p"&gt;-&lt;/span&gt; Input validation implemented
&lt;span class="p"&gt;-&lt;/span&gt; Good test coverage
&lt;span class="p"&gt;-&lt;/span&gt; Performance considerations addressed

Provide feedback organized by priority:
&lt;span class="p"&gt;-&lt;/span&gt; Critical issues (must fix)
&lt;span class="p"&gt;-&lt;/span&gt; Warnings (should fix)
&lt;span class="p"&gt;-&lt;/span&gt; Suggestions (consider improving)

Include specific examples of how to fix issues.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Invoke with: &lt;code&gt;Use the code-reviewer agent to review my auth changes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Or guarantee delegation with @-mention: &lt;code&gt;@"code-reviewer (agent)" look at the auth changes&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Subagent Configuration Options
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allowlist of tools the subagent can use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;disallowedTools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Denylist (e.g., &lt;code&gt;Write, Edit&lt;/code&gt; for read-only agents)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;sonnet&lt;/code&gt;, &lt;code&gt;opus&lt;/code&gt;, &lt;code&gt;haiku&lt;/code&gt;, &lt;code&gt;inherit&lt;/code&gt;, or full model ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;permissionMode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;default&lt;/code&gt;, &lt;code&gt;acceptEdits&lt;/code&gt;, &lt;code&gt;auto&lt;/code&gt;, &lt;code&gt;dontAsk&lt;/code&gt;, &lt;code&gt;bypassPermissions&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;skills&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Preload skill content into subagent context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mcpServers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;MCP servers scoped to this subagent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hooks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lifecycle hooks scoped to this subagent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Persistent memory: &lt;code&gt;user&lt;/code&gt;, &lt;code&gt;project&lt;/code&gt;, or &lt;code&gt;local&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;isolation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;worktree&lt;/code&gt; for git branch isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxTurns&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Maximum agentic turns before stopping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Preloading Skills into Subagents
&lt;/h4&gt;

&lt;p&gt;Subagents do not inherit parent skills. Preload explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-developer&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Implement API endpoints following team conventions&lt;/span&gt;
&lt;span class="na"&gt;skills&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;api-conventions&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;error-handling-patterns&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

Implement API endpoints. Follow the conventions and patterns from the preloaded skills.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full skill content is injected at startup, not just made available.&lt;/p&gt;

&lt;h4&gt;
  
  
  Forked Subagents (Experimental)
&lt;/h4&gt;

&lt;p&gt;Forks inherit the full conversation history instead of starting fresh. Use them when a named subagent would need too much background context.&lt;/p&gt;

&lt;p&gt;Enable: &lt;code&gt;CLAUDE_CODE_FORK_SUBAGENT=1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Spawn: &lt;code&gt;/fork draft unit tests for the parser changes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Forks run in the background while you continue working. Results arrive as messages when complete.&lt;/p&gt;

&lt;h4&gt;
  
  
  Parallel Research Pattern
&lt;/h4&gt;

&lt;p&gt;Request: "Research the authentication, database, and API modules in parallel using separate subagents"&lt;/p&gt;

&lt;p&gt;Each subagent explores independently. Claude synthesizes the findings. This works best when research paths do not depend on each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills: Reusable Prompts and Workflows
&lt;/h3&gt;

&lt;p&gt;Skills extend what Claude can do by packaging instructions into invocable commands. Create a skill when you keep pasting the same playbook into chat.&lt;/p&gt;

&lt;h4&gt;
  
  
  Skills vs. CLAUDE.md
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;CLAUDE.md&lt;/th&gt;
&lt;th&gt;Skills&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Loads&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatically on session start&lt;/td&gt;
&lt;td&gt;Only when invoked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Project conventions, permanent context&lt;/td&gt;
&lt;td&gt;Procedures, playbooks, workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Always in context&lt;/td&gt;
&lt;td&gt;Only when used&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Unlike CLAUDE.md content, a skill's body loads only when invoked, so long reference material costs almost nothing until needed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Creating Your First Skill
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills/explain-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;~/.claude/skills/explain-code/SKILL.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;explain-code&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Explains code with visual diagrams and analogies. Use when explaining how code works, teaching about a codebase, or when the user asks "how does this work?"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

When explaining code, always include:
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Start with an analogy**&lt;/span&gt;: Compare the code to something from everyday life
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Draw a diagram**&lt;/span&gt;: Use ASCII art to show the flow, structure, or relationships
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Walk through the code**&lt;/span&gt;: Explain step-by-step what happens
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Highlight a gotcha**&lt;/span&gt;: What's a common mistake or misconception?

Keep explanations conversational. For complex concepts, use multiple analogies.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Invoke automatically: &lt;code&gt;How does this code work?&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Invoke directly: &lt;code&gt;/explain-code src/auth/login.ts&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Skill Frontmatter Reference
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Display name; becomes the &lt;code&gt;/slash-command&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;description&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When Claude should use the skill automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;disable-model-invocation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Set &lt;code&gt;true&lt;/code&gt; to prevent auto-loading (for dangerous ops like deploy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user-invocable&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Set &lt;code&gt;false&lt;/code&gt; to hide from &lt;code&gt;/&lt;/code&gt; menu (background knowledge only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;allowed-tools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tools Claude can use without asking permission when skill is active&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;context&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Set &lt;code&gt;fork&lt;/code&gt; to run in isolated subagent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which subagent type to use with &lt;code&gt;context: fork&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;model&lt;/code&gt; / &lt;code&gt;effort&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Override model or effort level when skill is active&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;paths&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Glob patterns limiting when skill auto-activates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Dynamic Context Injection
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;!`command`&lt;/code&gt; syntax runs shell commands before the skill content is sent to Claude:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pr-summary&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Summarize changes in a pull request&lt;/span&gt;
&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fork&lt;/span&gt;
&lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Explore&lt;/span&gt;
&lt;span class="na"&gt;allowed-tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash(gh *)&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## Pull request context&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; PR diff: !&lt;span class="sb"&gt;`gh pr diff`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; PR comments: !&lt;span class="sb"&gt;`gh pr view --comments`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Changed files: !&lt;span class="sb"&gt;`gh pr diff --name-only`&lt;/span&gt;

&lt;span class="gu"&gt;## Your task&lt;/span&gt;
Summarize this pull request...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commands execute immediately; Claude receives only the output. For multi-line commands, use &lt;code&gt;```!&lt;/code&gt; fenced blocks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Skill Directory Structure
&lt;/h4&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-skill/
├── SKILL.md           # Main instructions (required)
├── template.md        # Template for Claude to fill in
├── examples/
│   └── sample.md      # Example output
└── scripts/
    └── validate.sh    # Script Claude can execute
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Reference supporting files from &lt;code&gt;SKILL.md&lt;/code&gt; so Claude knows what they contain and when to load them.&lt;/p&gt;
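&lt;p&gt;For instance, the body of &lt;code&gt;SKILL.md&lt;/code&gt; might point at its companions like this (hypothetical wording):&lt;/p&gt;

```markdown
Use template.md as the skeleton for the final report.
See examples/sample.md for the expected tone and depth.
Run scripts/validate.sh on the result before finishing.
```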

&lt;h4&gt;
  
  
  Where Skills Live
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;Managed settings&lt;/td&gt;
&lt;td&gt;Organization-wide&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Personal&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.claude/skills/&amp;lt;name&amp;gt;/SKILL.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All your projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.claude/skills/&amp;lt;name&amp;gt;/SKILL.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;This project only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugin&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;plugin&amp;gt;/skills/&amp;lt;name&amp;gt;/SKILL.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Where plugin is enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Higher-priority locations win: enterprise &amp;gt; personal &amp;gt; project. Plugin skills use &lt;code&gt;plugin-name:skill-name&lt;/code&gt; namespace.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bundled Skills
&lt;/h4&gt;

&lt;p&gt;Claude Code includes built-in skills available in every session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/simplify&lt;/code&gt; — Simplify complex code or explanations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/debug&lt;/code&gt; — Systematic debugging workflow&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/batch&lt;/code&gt; — Process multiple items efficiently&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/loop&lt;/code&gt; — Iterate on a task until complete&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/claude-api&lt;/code&gt; — Reference for Claude API patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are prompt-based, not hardcoded. They give Claude a detailed playbook and let it orchestrate the work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Combining Hooks, Subagents, and Skills
&lt;/h3&gt;

&lt;p&gt;The three features compose together. Here is a production-ready setup:&lt;/p&gt;

&lt;h4&gt;
  
  
  Example: Secure Code Review Pipeline
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Skill&lt;/strong&gt; (&lt;code&gt;~/.claude/skills/secure-review/SKILL.md&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
name: secure-review
description: Security-focused code review. Use when reviewing authentication, authorization, or data handling code.
context: fork
agent: Explore
disable-model-invocation: true
---

Perform a security-focused code review:

1. Check for hardcoded secrets, API keys, or credentials
2. Verify input validation and sanitization
3. Review authentication and authorization logic
4. Check for SQL injection, XSS, and injection vulnerabilities
5. Verify error handling does not leak sensitive information
6. Check file upload and path traversal protections

Report findings with severity levels and specific file references.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Hook&lt;/strong&gt; (&lt;code&gt;.claude/settings.json&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/run-security-linter.sh"
          }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;
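&lt;p&gt;For illustration, here is what a hook script like that could look like, written in Python. This is a minimal sketch standing in for the &lt;code&gt;run-security-linter.sh&lt;/code&gt; above, not a real linter: the stdin JSON shape (&lt;code&gt;tool_input.file_path&lt;/code&gt;) follows Claude Code's documented hook input, and exiting with code 2 feeds stderr back to Claude as feedback.&lt;/p&gt;

```python
#!/usr/bin/env python3
"""Minimal sketch of a PostToolUse hook (a stand-in for the
run-security-linter.sh referenced above, not a real security linter)."""
import json
import re
import sys

# Naive secret pattern -- a real linter uses far better rules.
SECRET_RE = re.compile(
    r"(api[_-]?key|secret|password)\s*[:=]\s*['\"][^'\"]+['\"]", re.I
)

def scan_lines(lines):
    """Return 1-based line numbers that look like hardcoded secrets."""
    return [n for n, line in enumerate(lines, 1) if SECRET_RE.search(line)]

def main():
    payload = json.load(sys.stdin)  # Claude Code passes hook input as JSON on stdin
    path = payload.get("tool_input", {}).get("file_path", "")
    try:
        with open(path) as f:
            hits = scan_lines(f)
    except OSError:
        return 0  # unreadable path: stay silent rather than block
    if hits:
        # Exit code 2 surfaces stderr back to Claude as feedback.
        print(f"Possible hardcoded secret in {path}, lines {hits}", file=sys.stderr)
        return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```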

&lt;p&gt;&lt;strong&gt;Subagent&lt;/strong&gt; (&lt;code&gt;.claude/agents/security-reviewer.md&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: security-reviewer
description: Security review specialist for auth and data handling code
tools: Read, Grep, Glob, Bash
disallowedTools: Edit, Write
---

You are a security-focused code reviewer. Focus on:

- Authentication and authorization flaws
- Input validation gaps
- Secret leakage
- Injection vulnerabilities

Never modify code. Only report findings.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Usage: After editing auth code, run &lt;code&gt;/secure-review&lt;/code&gt; or ask Claude to have the security-reviewer agent check the changes. The PostToolUse hook runs the security linter on every file edit automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical Patterns
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Pattern 1: Context Preservation
&lt;/h4&gt;

&lt;p&gt;Use subagents for operations producing large outputs:&lt;/p&gt;

&lt;p&gt;"Use a subagent to run the test suite and report only the failing tests with their error messages"&lt;/p&gt;

&lt;p&gt;The full test output stays in the subagent's context. You get only the actionable summary.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pattern 2: Tool Restriction
&lt;/h4&gt;

&lt;p&gt;Limit what subagents can do for safety:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: db-reader
description: Execute read-only database queries
tools: Bash
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "./scripts/validate-readonly-query.sh"
---
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The hook blocks any SQL write operation before it executes.&lt;/p&gt;
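&lt;p&gt;A sketch of what a &lt;code&gt;validate-readonly-query.sh&lt;/code&gt; could check, written here in Python for clarity: a PreToolUse hook receives the pending Bash command as JSON on stdin and exits with code 2 to block the tool call. The keyword denylist is illustrative, not exhaustive.&lt;/p&gt;

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse guard that rejects SQL write operations.
Illustrative only -- a production guard would parse the SQL properly."""
import json
import re
import sys

WRITE_RE = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b", re.I
)

def is_readonly(command):
    """True when the command contains no obvious SQL write keyword."""
    return not WRITE_RE.search(command)

def main():
    payload = json.load(sys.stdin)  # hook input arrives as JSON on stdin
    command = payload.get("tool_input", {}).get("command", "")
    if not is_readonly(command):
        # Exit code 2 blocks the tool call before it executes.
        print(f"Blocked non-read-only query: {command}", file=sys.stderr)
        return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```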

&lt;h4&gt;
  
  
  Pattern 3: Model Routing
&lt;/h4&gt;

&lt;p&gt;Route different tasks to different models for cost optimization:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: quick-classifier
description: Classify incoming requests by type and complexity
model: haiku
---

Classify this request as: simple, complex, or research-heavy.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Haiku is fast and cheap for classification. Route complex tasks to Sonnet or Opus.&lt;/p&gt;
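&lt;p&gt;The routing side is ordinary application code. A hypothetical dispatcher might map the classifier's label to a model tier; the function below is an illustration, not a Claude Code API, and the fallback tier is an assumption.&lt;/p&gt;

```python
def route_model(label: str) -> str:
    """Map a quick-classifier label to a model tier.

    Labels follow the classifier skill above; the default is an assumption.
    """
    routes = {
        "simple": "haiku",          # cheap and fast
        "complex": "sonnet",        # balanced
        "research-heavy": "opus",   # most capable
    }
    return routes.get(label, "sonnet")  # default to the mid tier
```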

&lt;h4&gt;
  
  
  Pattern 4: Persistent Memory
&lt;/h4&gt;

&lt;p&gt;Enable cross-session learning for subagents:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
name: codebase-architect
description: Maintains architectural knowledge of the codebase
memory: project
---

Update your agent memory as you discover codepaths, patterns, library locations, and key architectural decisions.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The subagent accumulates knowledge in &lt;code&gt;.claude/agent-memory/codebase-architect/&lt;/code&gt; across conversations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accessing Claude Code Through OfoxAI
&lt;/h3&gt;

&lt;p&gt;Claude Code works with any Anthropic-compatible API endpoint. OfoxAI provides full protocol support including extended thinking and cache_control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request URL: &lt;code&gt;https://api.ofox.ai/anthropic&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;API Key: Your OfoxAI key from &lt;a href="https://app.ofox.ai" rel="noopener noreferrer"&gt;app.ofox.ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For setup instructions, see the Claude Code configuration guide. For model comparisons, see the Claude Opus 4.7 API review and the best AI model for agents 2026 guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Use When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hooks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deterministic lifecycle scripts&lt;/td&gt;
&lt;td&gt;Security, logging, context injection, blocking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subagents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Isolated AI workers&lt;/td&gt;
&lt;td&gt;Parallel tasks, verbose output isolation, tool restriction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Skills&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reusable prompt templates&lt;/td&gt;
&lt;td&gt;Repeatable workflows, conventions, team knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start with skills — they are the easiest to create and provide immediate value. Add hooks when you need deterministic enforcement. Use subagents when parallel work or context isolation matters.&lt;/p&gt;

&lt;p&gt;The teams getting the most from Claude Code treat it as a programmable platform, not just a chat interface. Hooks, subagents, and skills are the tools that make that transition possible.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://ofox.ai/blog/claude-code-hooks-subagents-skills-complete-guide-2026/" rel="noopener noreferrer"&gt;ofox.ai/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>developertools</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Setting Up Your First Azure Storage Account</title>
      <dc:creator>Emmanuel Banjo</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:44:49 +0000</pubDate>
      <link>https://forem.com/emmanuel_banjo_df6d8074c7/setting-up-your-first-azure-storage-account-3j64</link>
      <guid>https://forem.com/emmanuel_banjo_df6d8074c7/setting-up-your-first-azure-storage-account-3j64</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Starting with Azure can feel overwhelming. There are so many options, settings, and configurations that it's hard to know where to begin. But here's the good news: setting up a storage account doesn't have to be complicated.&lt;br&gt;
I recently went through the process of creating my first Azure Storage Account for a learning project, and I want to share what I learned. This guide will walk you through each step in plain language.&lt;br&gt;
We'll create a storage account that's perfect for learning and testing: one that's secure, won't rack up unexpected charges, and follows good practices from the start. No prior Azure experience needed!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Creating a Resource Group
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What's a Resource Group?&lt;/strong&gt;&lt;br&gt;
Think of a resource group like a folder on your computer. Just like you'd put all your vacation photos in one folder, you put all your related Azure resources in one resource group. It makes everything easier to organize and manage.&lt;br&gt;
The cool part? When you delete the resource group later, everything inside gets deleted too. No hunting down individual items. Perfect for learning projects!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's Create One quickly&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the Azure Portal and type &lt;strong&gt;Resource groups&lt;/strong&gt; in the search bar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjprloz0za61jpzp4061k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjprloz0za61jpzp4061k.png" alt="An image showing where to locate the search button in Azure portal" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the &lt;strong&gt;+ Create&lt;/strong&gt; button&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5evw7matw5k0zrurkxh6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5evw7matw5k0zrurkxh6.png" alt="An image showing where to locate the create button in Azure portal" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Give it a simple name like my-storage-project or learning-storage&lt;/li&gt;
&lt;li&gt;Pick a region (like East US or West Europe); just choose one close to you&lt;/li&gt;
&lt;li&gt;Click Review and create&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfuyp93w0bzuea4mdycp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfuyp93w0bzuea4mdycp.png" alt="Review and create" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;then Create&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63u8d4jxjswdiqtnd10k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63u8d4jxjswdiqtnd10k.png" alt="Create" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating Your Storage Account&lt;/strong&gt;&lt;br&gt;
Now let's create the actual storage account.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for Storage accounts in the Azure Portal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Femn4fx257jpx5ke3fiyf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Femn4fx257jpx5ke3fiyf.png" alt="storage account" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click + Create&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fq70jthnuzj1b9mzni8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fq70jthnuzj1b9mzni8.png" alt="create" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the resource group you just created&lt;/li&gt;
&lt;li&gt;Give your storage account a name (it needs to be unique across all of Azure, so try something like mystoragelearn123)&lt;/li&gt;
&lt;li&gt;Keep Performance set to Standard (this is the cheaper option and perfect for learning)&lt;/li&gt;
&lt;li&gt;Click Review + Create&lt;/li&gt;
&lt;/ul&gt;
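&lt;p&gt;Before you click Create, it can help to sanity-check the name locally: Azure storage account names must be 3-24 characters long and use only lowercase letters and digits (global uniqueness can only be verified by Azure itself). A quick Python check:&lt;/p&gt;

```python
import re

def valid_storage_account_name(name: str) -> bool:
    """Check Azure storage account naming rules:
    3-24 characters, lowercase letters and digits only.
    (Uniqueness across Azure is still checked server-side.)"""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None
```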

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpud1qdo1ousahk177ewu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpud1qdo1ousahk177ewu.png" alt="Review + Create" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Then Create&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssnam9dyc8dviok8un2s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssnam9dyc8dviok8un2s.png" alt="Create" width="800" height="821"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wait a minute for it to deploy, then click Go to resource&lt;/p&gt;

&lt;p&gt;Done! You now have your own storage account in the cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Choosing How Your Data is Stored
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What's Redundancy?&lt;/strong&gt;&lt;br&gt;
When you store data in Azure, it automatically makes copies in case something goes wrong. The question is: how many copies do you need?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LRS (Locally Redundant Storage)&lt;/strong&gt;: Makes 3 copies in one location - cheapest option&lt;br&gt;
&lt;strong&gt;ZRS, GRS, GZRS&lt;/strong&gt;: More copies in more places - costs more money&lt;/p&gt;

&lt;p&gt;For learning and testing, LRS is perfect. You save money and still have backup copies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting it to LRS&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In your storage account, find Data management on the left menu, then click Redundancy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsfo77x8ywnlyc3lzmpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsfo77x8ywnlyc3lzmpn.png" alt="Storage account" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change the dropdown to Locally-redundant storage (LRS)&lt;/li&gt;
&lt;li&gt;Click Save&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dzfn1f9ke4ejk15rusk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dzfn1f9ke4ejk15rusk.png" alt="save" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You just cut your storage costs significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Making It Secure
&lt;/h3&gt;

&lt;p&gt;Now let's make sure your storage is secure. Don't worry, it's just a few simple settings.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use HTTPS (Keep Your Data Safe in Transit)&lt;br&gt;
You want all your data to travel securely over the internet, like using HTTPS on a website.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;strong&gt;Settings → Configuration&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Make sure Secure transfer required says &lt;strong&gt;Enabled&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqg08gmjzpcttwntdzony.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqg08gmjzpcttwntdzony.png" alt="secure transfer" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it. Now all your data travels encrypted.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use Modern Security (TLS 1.2)&lt;br&gt;
This is like saying "only let in people with new security badges, not old ones that can be faked."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still in &lt;strong&gt;Settings → Configuration&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Check that &lt;strong&gt;Minimum TLS version&lt;/strong&gt; is set to &lt;strong&gt;Version 1.2&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19yko63na5h0dth0sbzn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19yko63na5h0dth0sbzn.png" alt="TLS version" width="800" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Control Who Can Access It&lt;br&gt;
If you're not using the storage right now, you can turn off access:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In &lt;strong&gt;Settings → Configuration&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Find &lt;strong&gt;Allow storage account key access&lt;/strong&gt; and set it to &lt;strong&gt;Disabled&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Save&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblkda1e0fnru1a9x3n5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblkda1e0fnru1a9x3n5p.png" alt="Disable access" width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can always turn this back on when you need it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Allow Network Access
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;strong&gt;Security + networking → Networking&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Make sure Public network access is set to Enabled from all networks&lt;/li&gt;
&lt;li&gt;Click Save&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcc1xym0a71ut45k3o15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcc1xym0a71ut45k3o15.png" alt="security" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuho14m2195y58pjj7wed.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuho14m2195y58pjj7wed.png" alt="public access" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations! You just set up your first Azure Storage Account. I know it might have seemed like a lot of steps, but you did it!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Did this guide help you? Drop a comment and let me know how your setup went! And if you got stuck anywhere, ask away; we're all learning together.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Helpful Links
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/azure/storage/common/storage-account-overview" rel="noopener noreferrer"&gt;What is Azure Storage?&lt;/a&gt; - Official Microsoft docs&lt;br&gt;
&lt;a href="https://azure.microsoft.com/free/" rel="noopener noreferrer"&gt;Azure Free Account&lt;/a&gt; - Get free credits to practice with&lt;/p&gt;

</description>
      <category>azure</category>
      <category>cloudcomputing</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
    <item>
      <title>I built a tool to stop metadata loss: IPMD (Image Pixel Metadata)</title>
      <dc:creator>kelechi </dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:43:59 +0000</pubDate>
      <link>https://forem.com/officialkelechi001/i-built-a-tool-to-stop-metadata-loss-ipmd-image-pixel-metadata-1ccd</link>
      <guid>https://forem.com/officialkelechi001/i-built-a-tool-to-stop-metadata-loss-ipmd-image-pixel-metadata-1ccd</guid>
      <description>&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; When you send photos through your device to another devices, the original metadata (date, time, etc..) is usually overwritten. &lt;a href="https://github.com/777Tu/ipmd-core/blob/main/README.md" rel="noopener noreferrer"&gt;Read more.&lt;/a&gt;&lt;br&gt;
&lt;a href="//github.com/777Tu/ipmd-core"&gt;Repo on GitHub.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>opensource</category>
      <category>steganography</category>
      <category>database</category>
    </item>
    <item>
      <title>Finding the Gold: An AI Framework for Highlight Detection</title>
      <dc:creator>Ken Deng</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:40:51 +0000</pubDate>
      <link>https://forem.com/ken_deng_ai/finding-the-gold-an-ai-framework-for-highlight-detection-4mbc</link>
      <guid>https://forem.com/ken_deng_ai/finding-the-gold-an-ai-framework-for-highlight-detection-4mbc</guid>
      <description>&lt;p&gt;Staring down hours of raw footage, the hunt for those perfect, engaging moments can feel overwhelming. It's tedious, time-consuming, and creatively draining. What if your first rough cut could be assembled for you, pinpointing the clips most likely to resonate?&lt;/p&gt;

&lt;p&gt;The key is moving beyond single-signal detection. Isolating sections where multiple AI signals &lt;em&gt;cross-reference&lt;/em&gt; is the professional's principle for high-confidence highlights. A single audio spike might be a false positive—a door slam or cough. A visual cue alone might not capture context. But when you layer signals, you find gold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: The Automated First Pass (The Broad Net)&lt;/strong&gt;&lt;br&gt;
Use a tool like &lt;strong&gt;Descript&lt;/strong&gt; to generate a transcript and initial analysis. It can flag sections where the speaker's pace increases by over 20%, indicating passion or comedic timing, and detect extreme facial expressions like surprise or joy, scoring them for intensity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: The Transcript-Based Deep Dive (The Precision Hook)&lt;/strong&gt;&lt;br&gt;
Here, you cross-reference. Search your transcript for linguistic hooks—sentences ending with "?!" or phrases like "wait until you see..."—that often coincide with sentiment peaks (the highest or lowest emotional scores). Did the AI highlight a visual action &lt;em&gt;and&lt;/em&gt; a laughter spike? That's your high-confidence highlight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; Editing a 2-hour podcast, your AI flags a guest's quickening speech. The transcript shows them saying, "The key is..." while the sentiment graph spikes positively. Syncing these markers creates a powerful, multi-layered highlight candidate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementing This Workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Run Multi-Modal Analysis:&lt;/strong&gt; Process your footage through tools that provide transcript, sentiment, pace, and visual expression data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Cross-Reference Signals:&lt;/strong&gt; Manually review sections where at least two strong indicators (e.g., pace + sentiment, or phrase + visual) overlap. Immediately delete false positives like technical glitches.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Sync &amp;amp; Story Check:&lt;/strong&gt; Export these timestamped selections as markers to your NLE. Watch them consecutively. Do they create a compelling micro-story or a jarring jump?&lt;/li&gt;
&lt;/ol&gt;
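&lt;p&gt;Step 2's cross-referencing can be sketched in a few lines. Assume each analysis tool exports a list of timestamps (in seconds) per signal; the window size and threshold below are illustrative defaults, not established values.&lt;/p&gt;

```python
def high_confidence_windows(signal_events, min_signals=2, window=5.0):
    """Return start times where at least `min_signals` distinct signals
    fire within `window` seconds of each other.

    signal_events: dict mapping signal name -> list of timestamps (seconds).
    """
    events = sorted(
        (t, name) for name, times in signal_events.items() for t in times
    )
    hits = []
    for t, _ in events:
        # Which distinct signals fire inside [t, t + window]?
        names = {n for t2, n in events if t <= t2 <= t + window}
        if len(names) >= min_signals:
            hits.append(t)
    return hits
```

&lt;p&gt;Feeding the returned timestamps into your NLE as markers gives you the consecutive watch-through described in step 3.&lt;/p&gt;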

&lt;p&gt;By adopting a cross-referenced, multi-layered AI approach, you transform from a manual scavenger into a strategic director. You leverage AI to handle broad pattern recognition, freeing you to focus on the creative synthesis that makes an edit truly great.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>for</category>
      <category>video</category>
    </item>
    <item>
      <title>Debugging AI Agents in Production: ADK+Gemini Cloud Assist | Google Cloud NEXT '26</title>
      <dc:creator>hiruthicSha</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:40:39 +0000</pubDate>
      <link>https://forem.com/hiruthicsha/debugging-ai-agents-in-production-adkgemini-cloud-assist-google-cloud-next-26-8j4</link>
      <guid>https://forem.com/hiruthicsha/debugging-ai-agents-in-production-adkgemini-cloud-assist-google-cloud-next-26-8j4</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://gosip.celebritynews.workers.dev/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Google Cloud NEXT '26 quietly introduced a problem most developers are not ready for.&lt;/p&gt;

&lt;p&gt;Your system no longer fails because of a bug.&lt;br&gt;
It fails because an agent made a reasonable decision that turned out to be wrong.&lt;/p&gt;

&lt;p&gt;That difference sounds subtle.&lt;br&gt;
It isn’t.&lt;/p&gt;

&lt;p&gt;Trust me, this is a real pain to debug. I'm saying this because I worked on both the &lt;a href="https://gemini3.devpost.com/" rel="noopener noreferrer"&gt;Gemini 3 Hackathon&lt;/a&gt; and the &lt;a href="https://geminiliveagentchallenge.devpost.com/" rel="noopener noreferrer"&gt;Gemini Live Agent Challenge&lt;/a&gt;, and I know how easy it is to fall into these traps.&lt;/p&gt;

&lt;p&gt;This article walks through that shift using what Google actually demonstrated on stage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;how the Agent Development Kit (ADK) changes development&lt;/li&gt;
&lt;li&gt;how multi-agent systems behave in production&lt;/li&gt;
&lt;li&gt;and how Gemini Cloud Assist becomes your debugging layer&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Code Writing Code, and Code Acting on it
&lt;/h2&gt;

&lt;p&gt;The keynote doesn't begin with infrastructure or APIs. It starts with something more unsettling.&lt;/p&gt;

&lt;p&gt;Music is generated using AI. Visuals are rendered live.&lt;br&gt;
And those visuals? Generated by code that Gemini writes in real time based on audio input.&lt;/p&gt;

&lt;p&gt;This is the pattern the rest of the keynote follows, but more importantly, it's the pattern we now have to debug.&lt;/p&gt;

&lt;p&gt;You can check the visuals at the start till 02:00. These were created using Veo, Nano Banana, Gemini Flash Live and everything is done using &lt;a href="https://deepmind.google/blog/music-ai-sandbox-now-with-new-features-and-broader-access/" rel="noopener noreferrer"&gt;Music AI Sandbox&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhh2ov9btz2cktkteyt9q.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhh2ov9btz2cktkteyt9q.gif" alt=" " width="8" height="4"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ADK: You're Not Writing Logic Anymore
&lt;/h2&gt;

&lt;p&gt;At the center of everything is the Agent Development Kit (ADK).&lt;/p&gt;

&lt;p&gt;At first glance, it looks like just another framework. But it changes something fundamental: You don't define how things happen anymore.&lt;/p&gt;

&lt;p&gt;You define:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;what the agent is supposed to do&lt;/li&gt;
&lt;li&gt;what tools it has access to&lt;/li&gt;
&lt;li&gt;what knowledge it can use&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And then… you let it decide.&lt;/p&gt;

&lt;p&gt;During the keynote, Richard and Emma build a Marathon Planner Agent. Not a function. Not a service. An agent.&lt;/p&gt;

&lt;p&gt;It is given:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;instructions (plan a marathon route)&lt;/li&gt;
&lt;li&gt;tools (Google Maps via MCP)&lt;/li&gt;
&lt;li&gt;skills (GIS logic, race planning rules)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From there, it figures things out.&lt;/p&gt;

&lt;p&gt;No explicit control flow. No step-by-step orchestration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fck10c2lrrbo1844h0mh2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fck10c2lrrbo1844h0mh2.png" alt="Marathon Simulation" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Subtle but Dangerous Shift
&lt;/h2&gt;

&lt;p&gt;In a normal system, if something goes wrong, you know where to look. In an ADK-based system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent may choose the wrong tool&lt;/li&gt;
&lt;li&gt;or use the right tool incorrectly&lt;/li&gt;
&lt;li&gt;or interpret the prompt differently&lt;/li&gt;
&lt;li&gt;or combine context in unexpected ways&lt;/li&gt;
&lt;li&gt;or a whole new level of problem that we haven't yet figured out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing is strictly "broken". It just… behaves incorrectly.&lt;/p&gt;

&lt;h2&gt;
  
  
  When One Agent Isn't Enough
&lt;/h2&gt;

&lt;p&gt;The demo quickly evolves beyond a single agent. Instead of forcing one agent to do everything, they split responsibilities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;a Planner Agent proposes routes&lt;/li&gt;
&lt;li&gt;an Evaluator Agent scores them&lt;/li&gt;
&lt;li&gt;a Simulator Agent runs the world&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is where things start to look less like software and more like a system of collaborators. These agents don't call APIs directly. They discover each other.&lt;/p&gt;

&lt;p&gt;Google introduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A2A (Agent-to-Agent protocol)&lt;/strong&gt; =&amp;gt; how agents communicate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Registry&lt;/strong&gt; =&amp;gt; how agents find each other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as DNS for agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h5cnofznawvninyiszg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h5cnofznawvninyiszg.png" alt="Multi-Agent Workflow" width="800" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Most Underrated Feature: Agents Build Their Own UI
&lt;/h2&gt;

&lt;p&gt;One of the most interesting moments in the keynote is easy to miss.&lt;/p&gt;

&lt;p&gt;The UI isn't manually built. The agent generates it. Using something called &lt;a href="https://a2ui.org/#specification-versions" rel="noopener noreferrer"&gt;A2UI&lt;/a&gt;, the agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;decides how results should be displayed&lt;/li&gt;
&lt;li&gt;constructs components&lt;/li&gt;
&lt;li&gt;renders them dynamically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This removes an entire layer of development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Engineering Is Where Systems Break
&lt;/h2&gt;

&lt;p&gt;As the system evolves, more data is introduced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;city regulations&lt;/li&gt;
&lt;li&gt;traffic constraints&lt;/li&gt;
&lt;li&gt;historical patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is handled through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sessions (state across interactions)&lt;/li&gt;
&lt;li&gt;memory (long-term knowledge)&lt;/li&gt;
&lt;li&gt;RAG (retrieval from databases)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent starts behaving more intelligently. &lt;/p&gt;

&lt;p&gt;It also becomes far more fragile. At one point, the agent learns: "You can't have a camel on public roads."&lt;/p&gt;

&lt;p&gt;Funny in isolation. Critical when that rule influences route planning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Debugging Stops Being Mechanical
&lt;/h3&gt;

&lt;p&gt;In a traditional system, you would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;check logs&lt;/li&gt;
&lt;li&gt;inspect stack traces&lt;/li&gt;
&lt;li&gt;fix the code&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here, none of that is sufficient. You need to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why did the agent choose this tool?&lt;/li&gt;
&lt;li&gt;why did it carry this context forward?&lt;/li&gt;
&lt;li&gt;why did memory grow uncontrollably?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not debugging code. That's debugging reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini Cloud Assist: The Real Innovation
&lt;/h3&gt;

&lt;p&gt;Google's answer is not better logs. It's an AI system that debugs your AI system. Gemini Cloud Assist acts as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;investigator&lt;/li&gt;
&lt;li&gt;debugger&lt;/li&gt;
&lt;li&gt;infra operator&lt;/li&gt;
&lt;li&gt;code assistant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the failure happens, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;analyzes logs&lt;/li&gt;
&lt;li&gt;inspects traces&lt;/li&gt;
&lt;li&gt;reads your code&lt;/li&gt;
&lt;li&gt;correlates infra issues&lt;/li&gt;
&lt;li&gt;identifies root cause&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then it suggests a fix.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5uw6l47atav8kzd9wi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5uw6l47atav8kzd9wi.png" alt="Gemini Cloud Assist" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Broke?
&lt;/h2&gt;

&lt;p&gt;The root cause in the demo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context grew too large&lt;/li&gt;
&lt;li&gt;exceeded Gemini's token limit&lt;/li&gt;
&lt;li&gt;event compaction wasn't frequent enough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix wasn't a rewrite. It was a behavioral adjustment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compress context more frequently&lt;/li&gt;
&lt;li&gt;reduce memory footprint per step&lt;/li&gt;
&lt;/ul&gt;
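&lt;p&gt;The compaction fix can be sketched in a few lines. This is an illustrative stand-in, not ADK's actual compaction mechanism: it keeps the most recent events verbatim and folds everything older into a single summary placeholder.&lt;/p&gt;

```python
# Illustrative context compaction (not ADK's real mechanism): keep the most
# recent events verbatim and fold everything older into one summary line,
# so the running context stays under the model's token budget.

def compact(events, keep_last=5):
    """Return a compacted copy of the event list."""
    if keep_last >= len(events):
        return list(events)
    older, recent = events[:-keep_last], events[-keep_last:]
    summary = f"[summary of {len(older)} earlier events]"
    return [summary] + recent

history = [f"event {i}" for i in range(12)]
print(compact(history))  # one summary line followed by the last 5 events
```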

&lt;h2&gt;
  
  
  Everything is fine.
&lt;/h2&gt;

&lt;p&gt;Now, if you think I'm gonna leave you hanging after all this intro...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8lhxsw0f4zfi0idzol.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8lhxsw0f4zfi0idzol.gif" alt="You are wrong doe..." width="400" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So far we've seen what it can do; now it's time to use it
&lt;/h2&gt;

&lt;p&gt;So far, everything we discussed lives in the keynote.&lt;/p&gt;

&lt;p&gt;Cool demos. Fancy systems. "Wow, agents!"&lt;/p&gt;

&lt;p&gt;But none of that matters unless we can actually build something that behaves like that.&lt;/p&gt;

&lt;p&gt;So instead of jumping straight into "multi-agent, cloud-native, distributed magic"… we start small. Controlled. Understandable.&lt;/p&gt;

&lt;p&gt;We build a system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an agent makes a decision&lt;/li&gt;
&lt;li&gt;that decision actually affects something real&lt;/li&gt;
&lt;li&gt;and we can see the impact visually&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Define the World
&lt;/h2&gt;

&lt;p&gt;Before bringing Gemini into the picture, I need a system that can react to decisions.&lt;/p&gt;

&lt;p&gt;So I'll build a simple simulation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a route (sequence of coordinates)&lt;/li&gt;
&lt;li&gt;runners moving along that route&lt;/li&gt;
&lt;li&gt;a visualization of their positions over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this stage, everything is deterministic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22nc5cksdvo561jr9qgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22nc5cksdvo561jr9qgg.png" alt="Route" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then convert this into a dense path:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuar67c7wtnty8a0nfmdx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuar67c7wtnty8a0nfmdx.png" alt="Build dense path" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And simulate runners:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34zb5wa3ib8srfhq2pee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34zb5wa3ib8srfhq2pee.png" alt="Simulator" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each runner:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;moves at a slightly different speed&lt;/li&gt;
&lt;li&gt;has small randomness&lt;/li&gt;
&lt;li&gt;doesn’t perfectly overlap with others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives us something that already looks like a race.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlzeud5flgsgz37gh44r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlzeud5flgsgz37gh44r.png" alt="Straight race" width="534" height="546"&gt;&lt;/a&gt;&lt;/p&gt;
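&lt;p&gt;For readers who want runnable text instead of screenshots, here is a condensed, seeded sketch of the same three pieces: waypoints, a densified path, and runners with slightly different noisy speeds. Names and numbers are illustrative, not the exact demo code.&lt;/p&gt;

```python
# Condensed, seeded sketch of Step 1: waypoints, a densified path, and runners
# moving at slightly different noisy speeds. Illustrative numbers only.
import random

ROUTE = [(0, 0), (5, 0), (5, 5)]  # the route: a sequence of coordinates

def dense_path(route, steps_per_segment=10):
    """Linearly interpolate between consecutive waypoints."""
    path = []
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        for i in range(steps_per_segment):
            t = i / steps_per_segment
            path.append((x1 + (x2 - x1) * t, y1 + (y2 - y1) * t))
    path.append(route[-1])
    return path

def simulate(path, n_runners=3, ticks=15, seed=42):
    """Advance each runner along the path; speeds differ and carry small noise."""
    rng = random.Random(seed)
    speeds = [1.0 + 0.2 * rng.random() for _ in range(n_runners)]
    positions = [0.0] * n_runners
    for _ in range(ticks):
        positions = [
            min(p + s + rng.uniform(-0.1, 0.1), len(path) - 1)
            for p, s in zip(positions, speeds)
        ]
    return [path[int(p)] for p in positions]

print(simulate(dense_path(ROUTE)))
```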

&lt;h2&gt;
  
  
  Step 2: Bring in Gemini
&lt;/h2&gt;

&lt;p&gt;Now comes the important part. We don’t ask Gemini to generate coordinates.&lt;br&gt;
That’s a trap.&lt;/p&gt;

&lt;p&gt;Instead, we constrain it. We define a few route templates:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknalrpk64lcguhx8otux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknalrpk64lcguhx8otux.png" alt="Route templates" width="800" height="982"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now Gemini’s job is simple: &lt;strong&gt;Pick the type of route.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: The Planner Agent
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcapa3v5rwyyz9ysk76l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcapa3v5rwyyz9ysk76l.png" alt="Planner agent" width="800" height="912"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice what we did here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;limited output space&lt;/li&gt;
&lt;li&gt;avoided parsing nightmares&lt;/li&gt;
&lt;li&gt;kept the system predictable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly how you should use LLMs in systems.&lt;/p&gt;
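&lt;p&gt;In code, the pattern looks something like this. The Gemini call is stubbed out as a plain callable, and the template names are illustrative; the point is the enum-sized output space plus validation before use.&lt;/p&gt;

```python
# The planner pattern described above: the model's whole output space is a
# small enum of route templates, validated before use. The Gemini call is
# stubbed as a plain callable; template names are illustrative.

ROUTE_TEMPLATES = {"straight", "curved", "loop"}

def plan_route(request: str, ask_model) -> str:
    """ask_model: any callable taking a prompt and returning raw model text."""
    instruction = (
        "Pick exactly one route type for this request. "
        f"Reply with a single word from {sorted(ROUTE_TEMPLATES)}.\n\n{request}"
    )
    answer = ask_model(instruction).strip().lower()
    # Validate rather than trust: if the model goes off-script, fall back.
    return answer if answer in ROUTE_TEMPLATES else "straight"

print(plan_route("scenic ride", lambda prompt: " Curved \n"))     # curved
print(plan_route("anything", lambda prompt: "a lovely meander"))  # straight (fallback)
```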

&lt;h2&gt;
  
  
  Step 4: Connect Decision =&amp;gt; Behavior
&lt;/h2&gt;

&lt;p&gt;Now wire everything together:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feigx65ss783x10osw0m6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feigx65ss783x10osw0m6.png" alt="Running everything" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;
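&lt;p&gt;The wiring itself can be as small as a dictionary lookup: the agent's one-word decision selects a template, and the template becomes the simulator's input. A self-contained sketch with made-up waypoints:&lt;/p&gt;

```python
# Self-contained sketch of the wiring: the one-word decision selects a route
# template, which then drives the simulation. Waypoints are made up.

TEMPLATES = {
    "straight": [(0, 0), (10, 0)],
    "curved": [(0, 0), (5, 3), (10, 0)],
}

def route_for(decision: str):
    # Unknown decisions fall back to a safe default instead of crashing.
    return TEMPLATES.get(decision, TEMPLATES["straight"])

print(route_for("curved"))  # [(0, 0), (5, 3), (10, 0)]
print(route_for("???"))     # [(0, 0), (10, 0)]
```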

&lt;h2&gt;
  
  
  What You’re Actually Seeing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freh2zimwkgu4djxt8k2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Freh2zimwkgu4djxt8k2q.png" alt="Simulation output" width="543" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It represents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;position =&amp;gt; where runners are&lt;/li&gt;
&lt;li&gt;color =&amp;gt; how far they’ve progressed&lt;/li&gt;
&lt;li&gt;shape =&amp;gt; the route chosen by Gemini&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Change the prompt, and the route changes. Change the route, and the entire distribution changes.&lt;/p&gt;

&lt;p&gt;Curved path selection:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7imriw9679tpa51qmigs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7imriw9679tpa51qmigs.png" alt="Curved path" width="543" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: When It Broke (and Nothing Looked Broken)
&lt;/h2&gt;

&lt;p&gt;At some point, the system started behaving… oddly.&lt;/p&gt;

&lt;p&gt;Gemini consistently chose curved routes, even when the prompt clearly favored straight ones.&lt;/p&gt;

&lt;p&gt;Nothing failed.&lt;/p&gt;

&lt;p&gt;No exceptions.&lt;br&gt;
No crashes.&lt;br&gt;
No warnings.&lt;/p&gt;

&lt;p&gt;The simulation ran perfectly. But the output distribution was wrong.&lt;/p&gt;

&lt;p&gt;At first, it looked like randomness. Then it looked like bias. Eventually, it became clear: the model was over-weighting certain keywords in the prompt and mapping them incorrectly to route templates.&lt;/p&gt;

&lt;p&gt;The problem wasn’t in the simulation.&lt;br&gt;
It wasn’t in the data.&lt;br&gt;
It was in how the agent interpreted intent.&lt;/p&gt;

&lt;p&gt;Debugging this felt very different from normal debugging:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;there was no single place to look&lt;/li&gt;
&lt;li&gt;no clear cause-and-effect chain&lt;/li&gt;
&lt;li&gt;only a pattern that emerged over multiple runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix wasn’t a code change.&lt;/p&gt;

&lt;p&gt;It was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tightening the prompt
&lt;/li&gt;
&lt;li&gt;reducing ambiguity
&lt;/li&gt;
&lt;li&gt;making output constraints stricter
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system didn’t become “correct”.&lt;br&gt;
It became less wrong.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That’s the mindset shift: in non-deterministic systems, correctness isn’t a state.&lt;br&gt;
It’s a range you try to keep within acceptable bounds.&lt;/p&gt;
&lt;/blockquote&gt;
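&lt;p&gt;One practical consequence: since the failure only showed up as a skewed distribution across runs, the regression check has to be statistical too. A sketch, with a stub standing in for the Gemini-backed planner:&lt;/p&gt;

```python
# Because the failure was only visible as a skewed distribution over many
# runs, the check has to be statistical: sample the planner repeatedly and
# assert the choice frequencies stay within bounds. The planner is stubbed.
from collections import Counter

def choice_distribution(planner, prompt, runs=100):
    return Counter(planner(prompt) for _ in range(runs))

dist = choice_distribution(lambda prompt: "straight", "prefer a straight route")
share = dist["straight"] / sum(dist.values())
print(share)  # 1.0 for the stub; for a real planner you'd assert share stays above a threshold
```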

&lt;h1&gt;
  
  
  Why This Matters
&lt;/h1&gt;

&lt;p&gt;At this point, Gemini is not "doing everything". It’s doing something more important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It decides the conditions under which the system runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the shift.&lt;/p&gt;

&lt;p&gt;We’ve moved from static code controlling behavior to AI influencing system dynamics.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Just Did
&lt;/h2&gt;

&lt;p&gt;You didn't debug code.&lt;/p&gt;

&lt;p&gt;You debugged behavior.&lt;/p&gt;

&lt;p&gt;You constrained decision space.&lt;br&gt;
You shaped how the agent interprets intent.&lt;br&gt;
You reduced how wrong the system can be.&lt;/p&gt;

&lt;p&gt;That’s a fundamentally different skill. &lt;em&gt;Because in these systems, correctness is not guaranteed&lt;/em&gt;. &lt;strong&gt;It is negotiated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This isn’t meant to match the keynote. It’s a minimal example showing a bigger idea: shifting from writing fixed logic to building systems that decide how to behave at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;Google didn't just launch tools. It revealed a shift:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Software is no longer deterministic execution.&lt;br&gt;
It is probabilistic decision-making.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;debugging is harder&lt;/li&gt;
&lt;li&gt;observability is critical&lt;/li&gt;
&lt;li&gt;architecture matters more than ever&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The hardest bug in the future isn't:&lt;br&gt;
"Why did this fail?"&lt;br&gt;
It’s:&lt;br&gt;
"Why did the system think this was correct?"&lt;br&gt;
Because we didn’t just make software more powerful.&lt;br&gt;
We made it capable of being wrong in far more complex ways.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Waiting for the day a hotfix pops up: “Fix the AI pipeline” 😂. Thankfully, we're on Google's stack, so at least I'll have the right tools when it happens.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Your Vibe-Coded App Is a Security Disaster Waiting to Happen</title>
      <dc:creator>Jagadishwar reddy</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:37:58 +0000</pubDate>
      <link>https://forem.com/jagadishwar_reddy_e84eff1/why-your-vibe-coded-app-is-a-security-disaster-waiting-to-happen-3gc4</link>
      <guid>https://forem.com/jagadishwar_reddy_e84eff1/why-your-vibe-coded-app-is-a-security-disaster-waiting-to-happen-3gc4</guid>
      <description>&lt;p&gt;Every week, thousands of apps get shipped using Lovable, Bolt, &lt;br&gt;
Cursor, and v0. Fast, beautiful, functional.&lt;br&gt;
And almost all of them have serious security vulnerabilities.&lt;br&gt;
I know because I built a tool to scan them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;AI coding tools are incredible at building features. They're &lt;br&gt;
terrible at security.&lt;/p&gt;

&lt;p&gt;When you prompt "build me a user authentication system," the AI &lt;br&gt;
does it. But it probably also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores passwords without proper hashing&lt;/li&gt;
&lt;li&gt;Exposes your API keys in client-side code
&lt;/li&gt;
&lt;li&gt;Skips input validation on every form&lt;/li&gt;
&lt;li&gt;Leaves SQL injection vulnerabilities wide open&lt;/li&gt;
&lt;li&gt;Sets up broken access control so any user can access any data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You ship it. It works. Users sign up. Everything looks fine.&lt;/p&gt;

&lt;p&gt;Until it isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Found Scanning Real Vibe-Coded Apps
&lt;/h2&gt;

&lt;p&gt;After scanning dozens of apps built with AI tools, the most &lt;br&gt;
common vulnerabilities were:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Hardcoded API keys&lt;/strong&gt; — Gemini, OpenAI, Stripe keys sitting &lt;br&gt;
right in the frontend code. Anyone can open DevTools and steal them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Missing authentication checks&lt;/strong&gt; — Routes that should be &lt;br&gt;
protected are completely open. Change the URL, access anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Broken input validation&lt;/strong&gt; — Forms that accept anything, &lt;br&gt;
including malicious scripts and SQL commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Exposed Supabase configs&lt;/strong&gt; — Row Level Security disabled or &lt;br&gt;
misconfigured, giving anyone full database access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. No rate limiting&lt;/strong&gt; — APIs that can be hammered infinitely, &lt;br&gt;
racking up your bill or crashing your app.&lt;/p&gt;

&lt;p&gt;These aren't advanced attacks. A script kiddie can find and &lt;br&gt;
exploit these in minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Tools Miss This
&lt;/h2&gt;

&lt;p&gt;It's not the AI's fault. It's the nature of prompting.&lt;/p&gt;

&lt;p&gt;When you say "add a payment form," the AI focuses on making the &lt;br&gt;
payment form work. Security is a second-order concern that &lt;br&gt;
requires explicit prompting — and most people don't know what &lt;br&gt;
to ask.&lt;/p&gt;

&lt;p&gt;The AI is optimizing for "does this work in the demo?" not &lt;br&gt;
"is this safe in production?"&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Should Do Before Shipping
&lt;/h2&gt;

&lt;p&gt;At minimum, before any vibe-coded app goes live:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit your environment variables&lt;/strong&gt; — nothing sensitive in 
frontend code, ever&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check every API route&lt;/strong&gt; — does it verify the user is 
logged in?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable RLS on Supabase&lt;/strong&gt; — and actually test it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate all inputs&lt;/strong&gt; — server-side, not just client-side&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add rate limiting&lt;/strong&gt; — on auth endpoints especially&lt;/li&gt;
&lt;/ol&gt;
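&lt;p&gt;As an example of item 5, a fixed-window limiter is only a few lines. This in-memory Python sketch is illustrative only; a production app should use Redis-backed counters or the platform's rate-limit middleware, but the shape of the check is the same.&lt;/p&gt;

```python
# Minimal fixed-window rate limiter (illustrative, in-memory only). A real
# deployment would use Redis or platform middleware, but the shape of the
# check is the same: count hits per client per time window.
import time

class RateLimiter:
    def __init__(self, limit=5, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.hits = {}  # client_id -> (window_start, count)

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        start, count = self.hits.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: reset the counter
        if count >= self.limit:
            return False  # over the limit inside this window
        self.hits[client_id] = (start, count + 1)
        return True

rl = RateLimiter(limit=3, window_seconds=60)
print([rl.allow("1.2.3.4", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```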

&lt;p&gt;Or... let a scanner do it automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  I Built CodeSafe for This
&lt;/h2&gt;

&lt;p&gt;CodeSafe is a multi-agent security scanner built specifically &lt;br&gt;
for vibe-coded apps. You upload your code, and 6 specialized &lt;br&gt;
AI agents scan it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication &amp;amp; authorization flaws&lt;/li&gt;
&lt;li&gt;Exposed secrets and API keys
&lt;/li&gt;
&lt;li&gt;Injection vulnerabilities&lt;/li&gt;
&lt;li&gt;Broken access control&lt;/li&gt;
&lt;li&gt;Security misconfigurations&lt;/li&gt;
&lt;li&gt;Dependency vulnerabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The killer feature: for every vulnerability found, you get a &lt;br&gt;
&lt;strong&gt;"Copy Fix Prompt"&lt;/strong&gt; — paste it directly into Cursor, Lovable, &lt;br&gt;
or whatever AI tool you used to build it, and it fixes the issue.&lt;/p&gt;

&lt;p&gt;No security expertise needed. Just upload, and get the fix.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Try it free at codesafe.co.in&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Ship Fast. Ship Safe.
&lt;/h2&gt;

&lt;p&gt;Vibe-coding isn't going away. It's only getting faster. &lt;/p&gt;

&lt;p&gt;The builders who win long-term are the ones who ship fast AND &lt;br&gt;
ship securely. Don't let a preventable vulnerability kill the &lt;br&gt;
product you spent weeks building.&lt;/p&gt;

&lt;p&gt;Scan before you ship.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built this after getting frustrated watching great indie &lt;br&gt;
products get compromised. Happy to answer questions about &lt;br&gt;
vibe-coding security in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Building a CMS Translation Pipeline: Developer's Guide to i18n Architecture</title>
      <dc:creator>Diogo Heleno</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:35:14 +0000</pubDate>
      <link>https://forem.com/diogoheleno/building-a-cms-translation-pipeline-developers-guide-to-i18n-architecture-3hl3</link>
      <guid>https://forem.com/diogoheleno/building-a-cms-translation-pipeline-developers-guide-to-i18n-architecture-3hl3</guid>
      <description>&lt;h1&gt;
  
  
  Building a CMS Translation Pipeline: Developer's Guide to i18n Architecture
&lt;/h1&gt;

&lt;p&gt;While project managers focus on workflow coordination, developers face the technical challenge of building systems that handle multilingual content efficiently. A well-architected translation pipeline reduces manual work, prevents data loss, and scales with your content volume.&lt;/p&gt;

&lt;p&gt;This guide covers the technical implementation side: API integrations, automated workflows, and database design patterns that make CMS localization manageable for development teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database Schema Considerations for Multilingual Content
&lt;/h2&gt;

&lt;p&gt;Your database design determines how smoothly translations flow through your system. Two main patterns dominate CMS internationalization:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate tables per language&lt;/strong&gt; (WordPress WPML approach):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posts_en&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posts_es&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="c1"&gt;-- references posts_en.id&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Single table with language columns&lt;/strong&gt; (more common in headless CMS):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;language_code&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;translation_group_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The single-table approach scales better with multiple languages and simplifies queries, but requires careful indexing on &lt;code&gt;language_code&lt;/code&gt; and &lt;code&gt;translation_group_id&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automated Export/Import Workflows
&lt;/h2&gt;

&lt;p&gt;Manual file exports create bottlenecks. Most translation management systems (TMS) offer APIs that integrate directly with your CMS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Contentful + Phrase Integration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Export content for translation&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contentful&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;contentful-management&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;phrase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;phrase-api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;exportForTranslation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entryId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;contentfulClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entryId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Extract translatable fields&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;translatable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-US&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-US&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;metaDescription&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metaDescription&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-US&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="c1"&gt;// Create translation job in Phrase&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;phraseClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createJob&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Entry &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entryId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; - &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;sourceLocale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;targetLocales&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;translatable&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Strapi Custom Plugin
&lt;/h3&gt;

&lt;p&gt;Strapi's plugin system lets you build translation workflows directly into the admin interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// strapi-plugin-translations/server/controllers/translation.js&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;exportContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetLocale&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;strapi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entityService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findOne&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
      &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;populate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Generate XLIFF format&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;xliff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateXLIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Send to translation service&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;translationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xliff&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
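&lt;p&gt;The &lt;code&gt;generateXLIFF&lt;/code&gt; helper above is referenced but never defined. Here is a minimal sketch of an XLIFF 1.2 exporter for string fields (real exporters also handle inline markup, plurals, and nested components):&lt;br&gt;
&lt;/p&gt;

```javascript
// Minimal sketch of the generateXLIFF helper referenced above.
// Exports only top-level string fields; the trans-unit ids let the
// TMS round-trip each segment back to the right field on import.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function generateXLIFF(entity, targetLocale, sourceLocale = 'en') {
  const units = Object.entries(entity)
    .filter(([, value]) => typeof value === 'string')
    .map(
      ([key, value]) =>
        `    <trans-unit id="${escapeXml(key)}">\n` +
        `      <source>${escapeXml(value)}</source>\n` +
        `    </trans-unit>`
    )
    .join('\n');

  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<xliff version="1.2">\n` +
    `  <file source-language="${sourceLocale}" ` +
    `target-language="${targetLocale}" datatype="plaintext">\n` +
    `    <body>\n${units}\n    </body>\n  </file>\n</xliff>`
  );
}
```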



&lt;h2&gt;
  
  
  Handling Complex Content Structures
&lt;/h2&gt;

&lt;p&gt;Modern CMS platforms use nested objects, arrays, and references that don't translate cleanly to flat key-value pairs.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Field Translation
&lt;/h3&gt;

&lt;p&gt;A common approach flattens nested structures into dot-notation keys, sends the flat strings out for translation, and rebuilds the structure on import:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Original content&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;hero&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Welcome to our platform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;subtitle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Build amazing applications&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;cta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Get Started&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/signup&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Fast&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Lightning quick&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Secure&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Bank-grade security&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Flatten for translation&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;flattenForTranslation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;flattened&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prefix&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;flattened&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;newKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;flattened&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;flattenForTranslation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;newKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;flattened&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;flattenForTranslation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newKey&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;flattened&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
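&lt;p&gt;On import, the flat keys need to be folded back into the original shape. A sketch of the inverse (assuming the keys came from the flattener above, with numeric segments marking array indices):&lt;br&gt;
&lt;/p&gt;

```javascript
// Inverse of flattenForTranslation: rebuild the nested structure from
// dot-notation keys once translated strings come back from the TMS.
function unflattenTranslations(flat) {
  const result = {};

  Object.keys(flat).forEach(key => {
    const parts = key.split('.');
    let node = result;

    parts.forEach((part, i) => {
      if (i === parts.length - 1) {
        node[part] = flat[key];
        return;
      }
      // A numeric next segment means the child container is an array.
      if (node[part] === undefined) {
        node[part] = /^\d+$/.test(parts[i + 1]) ? [] : {};
      }
      node = node[part];
    });
  });

  return result;
}
```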



&lt;h2&gt;
  
  
  API Design for Multilingual Content
&lt;/h2&gt;

&lt;p&gt;Your API structure affects how frontend applications consume translated content. Consider language-aware endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Language-specific routes&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/:lang/posts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getPosts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/:lang/posts/:slug&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getPost&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Or header-based&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/posts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lang&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;accept-language&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getPostsByLanguage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// GraphQL with locale argument&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;typeDefs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
  type Query {
    posts(locale: String = "en"): [Post]
    post(slug: String!, locale: String = "en"): Post
  }

  type Post {
    id: ID!
    title: String!
    content: String!
    slug: String!
    locale: String!
  }
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
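&lt;p&gt;Note that the header-based route above matches on the raw &lt;code&gt;Accept-Language&lt;/code&gt; value, which in practice is a weighted list like &lt;code&gt;fr-CA,fr;q=0.9,en;q=0.8&lt;/code&gt;. A minimal negotiation sketch (in production, Express's &lt;code&gt;req.acceptsLanguages()&lt;/code&gt; or a dedicated parser library is preferable):&lt;br&gt;
&lt;/p&gt;

```javascript
// Pick the best supported locale from an Accept-Language header.
// Sorts candidates by q-weight, then tries an exact tag match
// followed by the base language ("fr-CA" -> "fr").
function negotiateLanguage(header, supported, fallback = 'en') {
  const candidates = (header || '')
    .split(',')
    .map(part => {
      const [tag, q] = part.trim().split(';q=');
      return { tag: tag.toLowerCase(), q: q ? parseFloat(q) : 1 };
    })
    .sort((a, b) => b.q - a.q);

  for (const { tag } of candidates) {
    if (supported.includes(tag)) return tag;
    const base = tag.split('-')[0];
    if (supported.includes(base)) return base;
  }
  return fallback;
}
```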



&lt;h2&gt;
  
  
  Translation Memory Integration
&lt;/h2&gt;

&lt;p&gt;A translation memory (TM) cuts costs by reusing previously approved translations. Most TMS platforms provide APIs to query it for matches before sending new work out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;checkTranslationMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sourceText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sourceLang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetLang&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;TM_API_URL&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/matches`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;TM_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;source_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sourceText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;source_language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sourceLang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target_language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;targetLang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;min_match_percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;target_text&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
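&lt;p&gt;The function above simply returns the first match, but TM APIs typically score each hit; picking the best match above a threshold is safer. A sketch, assuming a hypothetical &lt;code&gt;{ match_percentage, target_text }&lt;/code&gt; match shape that mirrors the request fields (adjust to your TMS's actual response):&lt;br&gt;
&lt;/p&gt;

```javascript
// Select the highest-scoring translation-memory hit above a threshold.
// The match shape ({ match_percentage, target_text }) is an assumption
// mirroring the request payload above, not a specific vendor's API.
function bestTmMatch(matches, minPercentage = 85) {
  const usable = matches
    .filter(m => m.match_percentage >= minPercentage)
    .sort((a, b) => b.match_percentage - a.match_percentage);

  return usable.length > 0 ? usable[0].target_text : null;
}
```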



&lt;h2&gt;
  
  
  Webhook-Driven Updates
&lt;/h2&gt;

&lt;p&gt;Set up webhooks to automatically import completed translations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Express webhook handler&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/translation-complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;translations&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Validate webhook signature&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;validateSignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Import translations back to CMS&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;importTranslations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;translations&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Trigger cache invalidation&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;invalidateCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/*`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Translation import failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Import failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Considerations
&lt;/h2&gt;

&lt;p&gt;Multilingual sites can quickly become slow without proper optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database indexing&lt;/strong&gt;: Index on &lt;code&gt;language_code&lt;/code&gt; and &lt;code&gt;translation_group_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDN configuration&lt;/strong&gt;: Serve language-specific content from edge locations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy loading&lt;/strong&gt;: Only load the active language's content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching strategy&lt;/strong&gt;: Cache per language and invalidate selectively
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Redis caching by language&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`posts:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPosts&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
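&lt;p&gt;The read path above caches per language; the matching selective invalidation on publish can be sketched like this (an in-memory &lt;code&gt;Map&lt;/code&gt; stands in for the Redis client here; against real Redis, iterate with &lt;code&gt;SCAN MATCH&lt;/code&gt; and &lt;code&gt;DEL&lt;/code&gt; rather than &lt;code&gt;KEYS&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

```javascript
// Drop only the cache entries for one language, leaving other locales
// warm. A Map stands in for the cache client to keep the sketch
// self-contained; the posts:<lang>:<page> key scheme matches the
// read path above.
function invalidateLanguage(cache, language) {
  const prefix = `posts:${language}:`;
  let removed = 0;

  for (const key of [...cache.keys()]) {
    if (key.startsWith(prefix)) {
      cache.delete(key);
      removed++;
    }
  }
  return removed;
}
```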



&lt;h2&gt;
  
  
  Testing Multilingual Features
&lt;/h2&gt;

&lt;p&gt;Automated testing becomes crucial with multiple languages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Jest test for translation endpoints&lt;/span&gt;
&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Multilingual API&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;returns content in requested language&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/posts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Accept-Language&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;es&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;es&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toContain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// English word&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;falls back to default language&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/posts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Accept-Language&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unsupported-lang&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;The technical foundation described here supports the project management practices outlined in &lt;a href="https://www.m21global.com/en/blog/website-localisation-cms-practical-guide/" rel="noopener noreferrer"&gt;M21Global's CMS localization guide&lt;/a&gt;. Focus on building automated workflows early. Manual processes don't scale, and technical debt in internationalization systems is expensive to fix later.&lt;/p&gt;

&lt;p&gt;Start with a solid database schema, add API endpoints that handle language parameters cleanly, and integrate with translation management platforms through webhooks rather than file uploads. Your future self (and your project managers) will thank you.&lt;/p&gt;

</description>
      <category>i18n</category>
      <category>webdev</category>
      <category>cms</category>
      <category>api</category>
    </item>
    <item>
      <title>🤖 Learn Harness Engineering by Building a Mini Openclaw 🦞</title>
      <dc:creator>Truong Phung</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:34:29 +0000</pubDate>
      <link>https://forem.com/truongpx396/learn-harness-engineering-by-building-a-mini-openclaw-bdm</link>
      <guid>https://forem.com/truongpx396/learn-harness-engineering-by-building-a-mini-openclaw-bdm</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🔀 Origin &amp;amp; Modifications&lt;/li&gt;
&lt;li&gt;🤔 What is this?&lt;/li&gt;
&lt;li&gt;🏗️ Architecture&lt;/li&gt;
&lt;li&gt;🔗 Section Dependencies&lt;/li&gt;
&lt;li&gt;⚡ Quick Start&lt;/li&gt;
&lt;li&gt;🗺️ Learning Path&lt;/li&gt;
&lt;li&gt;📋 Section Details&lt;/li&gt;
&lt;li&gt;📁 Repository Structure&lt;/li&gt;
&lt;li&gt;📦 Prerequisites&lt;/li&gt;
&lt;li&gt;🧩 Dependencies&lt;/li&gt;
&lt;li&gt;🔗 Related Projects&lt;/li&gt;
&lt;li&gt;👥 About&lt;/li&gt;
&lt;li&gt;📄 License&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;Git repo: &lt;a href="https://github.com/truongpx396/learn-harness-engineering-by-building-mini-openclaw" rel="noopener noreferrer"&gt;truongpx396/learn-harness-engineering-by-building-mini-openclaw&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  🔀 Origin &amp;amp; Modifications
&lt;/h2&gt;

&lt;p&gt;This repository is a fork of &lt;a href="https://github.com/shareAI-lab/claw0" rel="noopener noreferrer"&gt;shareAI-lab/claw0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Changes made in this fork:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔄 &lt;strong&gt;SDK migration&lt;/strong&gt;: Migrated from the Anthropic SDK to the &lt;a href="https://github.com/openai/openai-python" rel="noopener noreferrer"&gt;OpenAI SDK&lt;/a&gt;, making all sections compatible with any OpenAI-compatible endpoint.&lt;/li&gt;
&lt;li&gt;🖥️ &lt;strong&gt;Local model support&lt;/strong&gt;: Added setup guides for running fully offline with &lt;a href="https://lmstudio.ai" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt;, &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, and &lt;a href="https://www.nomic.ai/gpt4all" rel="noopener noreferrer"&gt;GPT4All&lt;/a&gt; — no cloud API required.&lt;/li&gt;
&lt;li&gt;⚙️ &lt;strong&gt;&lt;code&gt;.env&lt;/code&gt;-based configuration&lt;/strong&gt;: Introduced &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt; and &lt;code&gt;MODEL_ID&lt;/code&gt; environment variables so you can point any section at a different provider or local server without touching the code.&lt;/li&gt;
&lt;/ul&gt;
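&lt;p&gt;In stdlib-only terms, the env-driven configuration above boils down to something like this (a sketch — the repo itself uses python-dotenv plus the OpenAI SDK, and the function name here is illustrative):&lt;/p&gt;

```python
import os

# Stdlib-only sketch of the fork's env-driven configuration.
# The variable names mirror the ones documented above; everything
# else (function name, defaults) is illustrative.
def load_model_config(env=os.environ):
    return {
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "api_key": env.get("OPENAI_API_KEY", ""),
        "model_id": env.get("MODEL_ID", ""),
    }

# Point a section at a local server without touching any code:
cfg = load_model_config({"OPENAI_BASE_URL": "http://localhost:11434/v1",
                         "OPENAI_API_KEY": "ollama",
                         "MODEL_ID": "qwen2.5:7b"})
print(cfg["base_url"])  # http://localhost:11434/v1
```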

&lt;p&gt;All credit for the original curriculum, architecture, and teaching approach goes to &lt;a href="https://github.com/shareAI-lab" rel="noopener noreferrer"&gt;shareAI-lab&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;🚀 &lt;strong&gt;From Zero to One: Build an AI Agent Gateway&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;10 progressive sections -- every section is a single, runnable Python file.&lt;br&gt;
Code and docs are co-located.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🤔 What is this?
&lt;/h2&gt;

&lt;p&gt;Most agent tutorials stop at "call an API once." This repository picks up where those tutorials stop: it starts from a bare while loop and takes you all the way to a production-grade gateway.&lt;/p&gt;

&lt;p&gt;Build a minimal AI agent gateway from scratch, section by section. 10 sections, 10 core concepts, ~7,000 lines of Python. Each section introduces exactly one new idea while keeping all prior code intact. After all 10, you can read OpenClaw's production codebase with confidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;s01: Agent Loop           &lt;span class="nt"&gt;--&lt;/span&gt; The foundation: &lt;span class="k"&gt;while&lt;/span&gt; + finish_reason
s02: Tool Use             &lt;span class="nt"&gt;--&lt;/span&gt; Let the model call tools: dispatch table
s03: Sessions &amp;amp; Context   &lt;span class="nt"&gt;--&lt;/span&gt; Persist conversations, handle overflow
s04: Channels             &lt;span class="nt"&gt;--&lt;/span&gt; Telegram + Feishu: real channel pipelines
s05: Gateway &amp;amp; Routing    &lt;span class="nt"&gt;--&lt;/span&gt; 5-tier binding, session isolation
s06: Intelligence         &lt;span class="nt"&gt;--&lt;/span&gt; Soul, memory, skills, prompt assembly
s07: Heartbeat &amp;amp; Cron     &lt;span class="nt"&gt;--&lt;/span&gt; Proactive agent + scheduled tasks
s08: Delivery             &lt;span class="nt"&gt;--&lt;/span&gt; Reliable message queue with backoff
s09: Resilience           &lt;span class="nt"&gt;--&lt;/span&gt; 3-layer retry onion + auth profile rotation
s10: Concurrency          &lt;span class="nt"&gt;--&lt;/span&gt; Named lanes serialize the chaos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
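&lt;p&gt;The s01 foundation can be sketched in a few lines. The &lt;code&gt;call_model&lt;/code&gt; and &lt;code&gt;run_tool&lt;/code&gt; callables below are stand-ins for the real OpenAI round trip and tool dispatch, so the loop itself is all that's shown:&lt;/p&gt;

```python
# Minimal agent-loop sketch (s01/s02 idea): keep calling the model
# until finish_reason stops being "tool_calls". call_model and run_tool
# are hypothetical stand-ins for the OpenAI SDK call and the dispatch table.
def agent_loop(call_model, run_tool, messages):
    while True:
        reply = call_model(messages)            # one chat-completions round trip
        messages.append(reply)
        if reply["finish_reason"] != "tool_calls":
            return reply["content"]             # model is done: final answer
        for call in reply["tool_calls"]:        # model asked for tools: run them,
            messages.append({"role": "tool",    # feed results back, loop again
                             "tool_call_id": call["id"],
                             "content": run_tool(call)})

# Stubbed model: requests one tool call, then answers.
replies = [{"finish_reason": "tool_calls", "content": None,
            "tool_calls": [{"id": "1", "name": "time"}]},
           {"finish_reason": "stop", "content": "done"}]
out = agent_loop(lambda msgs: replies.pop(0), lambda call: "12:00", [])
print(out)  # done
```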



&lt;h2&gt;
  
  
  🏗️ Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--- agent layers ---+
|                                                     |
|  s10: Concurrency  (named lanes, generation track)  |
|  s09: Resilience   (auth rotation, overflow compact)|
|  s08: Delivery     (write-ahead queue, backoff)     |
|  s07: Heartbeat    (lane lock, cron scheduler)      |
|  s06: Intelligence (8-layer prompt, hybrid memory)  |
|  s05: Gateway      (WebSocket, 5-tier routing)      |
|  s04: Channels     (Telegram pipeline, Feishu hook) |
|  s03: Sessions     (JSONL persistence, 3-stage retry)|
|  s02: Tools        (dispatch table, 4 tools)        |
|  s01: Agent Loop   (while True + finish_reason)     |
|                                                     |
+-----------------------------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🔗 Section Dependencies
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;s01 &lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s02 &lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s03 &lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s04 &lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s05
                 |               |
                 v               v
                s06 &lt;span class="nt"&gt;----------&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s07 &lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s08
                 |               |
                 v               v
                s09 &lt;span class="nt"&gt;----------&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;s01-s02: Foundation (no dependencies)&lt;/li&gt;
&lt;li&gt;s03: Builds on s02 (adds persistence to the tool loop)&lt;/li&gt;
&lt;li&gt;s04: Builds on s03 (channels produce InboundMessages for sessions)&lt;/li&gt;
&lt;li&gt;s05: Builds on s04 (routes channel messages to agents)&lt;/li&gt;
&lt;li&gt;s06: Builds on s03 (uses sessions for context, adds prompt layers)&lt;/li&gt;
&lt;li&gt;s07: Builds on s06 (heartbeat uses soul/memory for prompt)&lt;/li&gt;
&lt;li&gt;s08: Builds on s07 (heartbeat output flows through delivery queue)&lt;/li&gt;
&lt;li&gt;s09: Builds on s03+s06 (reuses ContextGuard for overflow, model config)&lt;/li&gt;
&lt;li&gt;s10: Builds on s07 (replaces single Lock with named lane system)&lt;/li&gt;
&lt;/ul&gt;
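&lt;p&gt;The persistence that s03 adds ("append on write, replay on read") is simple enough to sketch with the standard library — one JSON object per line, appended as messages arrive and replayed at startup (function names here are illustrative, not the repo's own):&lt;/p&gt;

```python
import json
import os
import tempfile

# Sketch of the s03 JSONL session idea: append each message as one
# JSON line, replay the whole file to rebuild the conversation.
def append_message(path, message):
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")

def replay_session(path):
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
append_message(path, {"role": "user", "content": "hi"})
append_message(path, {"role": "assistant", "content": "hello"})
print(len(replay_session(path)))  # 2
```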

&lt;h2&gt;
  
  
  ⚡ Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone and enter&lt;/span&gt;
git clone https://github.com/truongpx396/learn-harness-engineering-by-building-mini-openclaw &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;learn-harness-engineering-by-building-mini-openclaw

&lt;span class="c"&gt;# 2. Install dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# 3. Configure&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Edit .env: set OPENAI_API_KEY, MODEL_ID, and OPENAI_BASE_URL&lt;/span&gt;

&lt;span class="c"&gt;# 4. Run any section (pick your language)&lt;/span&gt;
python sessions/en/s01_agent_loop.py    &lt;span class="c"&gt;# English&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🗺️ Learning Path
&lt;/h2&gt;

&lt;p&gt;Each section adds exactly one new concept. All prior code stays intact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: FOUNDATION     Phase 2: CONNECTIVITY     Phase 3: BRAIN        Phase 4: AUTONOMY       Phase 5: PRODUCTION
+----------------+      +-------------------+     +-----------------+   +-----------------+   +-----------------+
| s01: Loop      |      | s03: Sessions     |     | s06: Intelligence|  | s07: Heartbeat  |   | s09: Resilience |
| s02: Tools     | ---&amp;gt; | s04: Channels     | --&amp;gt; |   soul, memory, | -&amp;gt;|   &amp;amp; Cron        |--&amp;gt;|   &amp;amp; Concurrency |
|                |      | s05: Gateway      |     |   skills, prompt |  | s08: Delivery   |   | s10: Lanes      |
+----------------+      +-------------------+     +-----------------+   +-----------------+   +-----------------+
 while + dispatch        persist + route            personality + recall  proactive + reliable  retry + serialize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  📋 Section Details
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;Core Concept&lt;/th&gt;
&lt;th&gt;Lines&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;Agent Loop&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;while True&lt;/code&gt; + &lt;code&gt;finish_reason&lt;/code&gt; -- that's an agent&lt;/td&gt;
&lt;td&gt;~175&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;Tool Use&lt;/td&gt;
&lt;td&gt;Tools = schema dict + handler map. Model picks a name, you look it up&lt;/td&gt;
&lt;td&gt;~445&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;Sessions&lt;/td&gt;
&lt;td&gt;JSONL: append on write, replay on read. Too big? Summarize old parts&lt;/td&gt;
&lt;td&gt;~890&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td&gt;Every platform differs, but they all produce the same &lt;code&gt;InboundMessage&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;~780&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05&lt;/td&gt;
&lt;td&gt;Gateway&lt;/td&gt;
&lt;td&gt;Binding table maps (channel, peer) to agent. Most specific wins&lt;/td&gt;
&lt;td&gt;~625&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;06&lt;/td&gt;
&lt;td&gt;Intelligence&lt;/td&gt;
&lt;td&gt;System prompt = files on disk. Swap files, change personality&lt;/td&gt;
&lt;td&gt;~750&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;07&lt;/td&gt;
&lt;td&gt;Heartbeat &amp;amp; Cron&lt;/td&gt;
&lt;td&gt;Timer thread: "should I run?" + queue work alongside user messages&lt;/td&gt;
&lt;td&gt;~660&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;08&lt;/td&gt;
&lt;td&gt;Delivery&lt;/td&gt;
&lt;td&gt;Write to disk first, then send. Crashes can't lose messages&lt;/td&gt;
&lt;td&gt;~870&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;09&lt;/td&gt;
&lt;td&gt;Resilience&lt;/td&gt;
&lt;td&gt;3-layer retry onion: auth rotation, overflow compaction, tool-use loop&lt;/td&gt;
&lt;td&gt;~1130&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;Named lanes with FIFO queues, generation tracking, Future-based results&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  📁 Repository Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;learn-harness-engineering-by-building-mini-openclaw/
  README.md              English README
  .env.example           Configuration template
  requirements.txt       Python dependencies
  sessions/              All teaching sessions (code + docs)
    en/                  English
      s01_agent_loop.py  s01_agent_loop.md
      s02_tool_use.py    s02_tool_use.md
      ...                (10 .py + 10 .md)
  workspace/             Shared workspace samples
    SOUL.md  IDENTITY.md  TOOLS.md  USER.md
    HEARTBEAT.md  BOOTSTRAP.md  AGENTS.md  MEMORY.md
    CRON.json
    skills/example-skill/SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  📦 Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11+&lt;/li&gt;
&lt;li&gt;An OpenAI-compatible API key (e.g. GitHub Models, Azure OpenAI, or any provider)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  💻 Running Locally (no cloud API required)
&lt;/h3&gt;

&lt;p&gt;All agents speak the OpenAI chat-completions protocol, so any local server that exposes a compatible endpoint works out of the box. No GPU is required: all three options below support CPU-only inference.&lt;/p&gt;




&lt;h4&gt;
  
  
  Option A — 🖥️ LM Studio
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://lmstudio.ai" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt; provides a GUI for downloading and serving models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install &amp;amp; load a model&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download and install &lt;a href="https://lmstudio.ai" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Discover&lt;/strong&gt; tab, search for a small instruction-tuned model.
Good CPU-friendly choices: &lt;code&gt;Qwen2.5-7B-Instruct&lt;/code&gt;, &lt;code&gt;Mistral-7B-Instruct&lt;/code&gt;, &lt;code&gt;Phi-3-mini&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Download&lt;/strong&gt; next to your chosen model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Start the local server&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the &lt;strong&gt;Developer&lt;/strong&gt; tab (&lt;code&gt;&amp;lt;/&amp;gt;&lt;/code&gt; icon in the left sidebar).&lt;/li&gt;
&lt;li&gt;Select your model from the dropdown and click &lt;strong&gt;Start Server&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;LM Studio listens at &lt;code&gt;http://localhost:1234/v1&lt;/code&gt;. Copy the model identifier shown (e.g. &lt;code&gt;lmstudio-community/Qwen2.5-7B-Instruct-GGUF&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Configure &lt;code&gt;.env&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;lm-studio        &lt;span class="c"&gt;# any non-empty string works&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:1234/v1
&lt;span class="nv"&gt;MODEL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;lmstudio-community/Qwen2.5-7B-Instruct-GGUF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;
  
  
  Option B — 🦙 Ollama
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; is a lightweight CLI that manages and serves models with a single command.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install Ollama&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Windows: download the installer from https://ollama.com/download&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Pull a model &amp;amp; start the server&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5:7b          &lt;span class="c"&gt;# or: mistral, phi3, llama3.2, gemma2:2b …&lt;/span&gt;
ollama serve                    &lt;span class="c"&gt;# starts at http://localhost:11434&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;If you ran &lt;code&gt;ollama pull&lt;/code&gt; without &lt;code&gt;ollama serve&lt;/code&gt;, the server is already running in the background — no extra step needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;3. Configure &lt;code&gt;.env&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama           &lt;span class="c"&gt;# any non-empty string works&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434/v1
&lt;span class="nv"&gt;MODEL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5:7b            &lt;span class="c"&gt;# must match the name you pulled&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;
  
  
  Option C — 🌐 GPT4All
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.nomic.ai/gpt4all" rel="noopener noreferrer"&gt;GPT4All&lt;/a&gt; offers a desktop app with a built-in API server mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install GPT4All&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download and install the desktop app from &lt;a href="https://www.nomic.ai/gpt4all" rel="noopener noreferrer"&gt;nomic.ai/gpt4all&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Download a model &amp;amp; enable the API server&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;strong&gt;Models&lt;/strong&gt; → browse and download a model (e.g. &lt;code&gt;Mistral 7B Instruct&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Open &lt;strong&gt;Settings → API Server&lt;/strong&gt;, toggle &lt;strong&gt;Enable API Server&lt;/strong&gt; on.&lt;/li&gt;
&lt;li&gt;The server starts at &lt;code&gt;http://localhost:4891/v1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Configure &lt;code&gt;.env&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gpt4all          &lt;span class="c"&gt;# any non-empty string works&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:4891/v1
&lt;span class="nv"&gt;MODEL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Mistral 7B Instruct   &lt;span class="c"&gt;# must match the model name shown in the app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;4. Run (same for all options)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python sessions/en/s01_agent_loop.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tips for CPU inference&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Under 8 GB RAM:&lt;/strong&gt; use 1.5B–3B models — e.g. &lt;code&gt;Qwen2.5-1.5B-Instruct&lt;/code&gt;, &lt;code&gt;Llama-3.2-1B-Instruct&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 GB–16 GB RAM:&lt;/strong&gt; use 4-bit quantized 7B–8B models — e.g. &lt;code&gt;Llama-3.1-8B-Instruct (Q4)&lt;/code&gt;, &lt;code&gt;Mistral-7B-Instruct (Q4)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16 GB+ RAM:&lt;/strong&gt; standard 7B–13B models work well without extra quantization.&lt;/li&gt;
&lt;li&gt;Keep context length at 4096 or lower in your server settings to reduce RAM pressure.&lt;/li&gt;
&lt;li&gt;The agents already cap &lt;code&gt;max_tokens&lt;/code&gt; at 8096, so small models won't be overwhelmed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
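&lt;p&gt;The RAM guidance above follows a common rule of thumb — weights-only memory is roughly parameter count times bits per weight. A quick back-of-the-envelope helper (a rough estimate that ignores KV cache and runtime overhead, not something from the repo):&lt;/p&gt;

```python
# Rough rule of thumb: weight memory in GB is params x bits / 8.
# Ignores KV cache, activations, and runtime overhead, so treat the
# result as a floor, not a precise requirement.
def approx_weight_ram_gb(params_billions, bits_per_weight):
    return params_billions * bits_per_weight / 8

print(round(approx_weight_ram_gb(7, 4), 1))   # 3.5 -- a 4-bit 7B model
print(round(approx_weight_ram_gb(7, 16), 1))  # 14.0 -- the same model at fp16
```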

&lt;h2&gt;
  
  
  🧩 Dependencies
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;openai&amp;gt;=1.0.0&lt;/span&gt;
&lt;span class="err"&gt;python-dotenv&amp;gt;=1.0.0&lt;/span&gt;
&lt;span class="err"&gt;websockets&amp;gt;=12.0&lt;/span&gt;
&lt;span class="err"&gt;croniter&amp;gt;=2.0.0&lt;/span&gt;
&lt;span class="err"&gt;python-telegram-bot&amp;gt;=21.0&lt;/span&gt;
&lt;span class="err"&gt;httpx&amp;gt;=0.27.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🔗 Related Projects
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/shareAI-lab/learn-claude-code" rel="noopener noreferrer"&gt;learn-claude-code&lt;/a&gt;&lt;/strong&gt; -- A companion teaching repo that builds an agent &lt;strong&gt;framework&lt;/strong&gt; (nano Claude Code) from scratch in 12 progressive sessions. Where learn-harness-engineering-by-building-a-mini-openclaw focuses on gateway routing, channels, and proactive behavior, learn-claude-code dives deep into the agent's internal design: structured planning (TodoManager + nag), context compression (3-layer compact), file-based task persistence with dependency graphs, team coordination (JSONL mailboxes, shutdown/plan-approval FSM), autonomous self-organization, and git worktree isolation for parallel execution. If you want to understand how a production-grade unit agent works inside, start there.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  👥 About
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Ffe8b852b-97da-4061-a467-9694906b5edf" class="article-body-image-wrapper"&gt;&lt;img width="1280" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Ffe8b852b-97da-4061-a467-9694906b5edf" height="1280"&gt;&lt;/a&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;Scan with WeChat to follow us,&lt;br&gt;&lt;br&gt;
or follow us on X: &lt;a href="https://x.com/baicai003" rel="noopener noreferrer"&gt;shareAI-Lab&lt;/a&gt;  &lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;If you found this helpful, let me know by leaving a 👍 or a comment! And if you think this post could help someone, feel free to share it. Thank you very much! 😃&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>Teaching Small Language Models to Remember: Giving LLMs a Notebook with Differentiable Neural Computers</title>
      <dc:creator>Asish Kumar Dalal</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:31:37 +0000</pubDate>
      <link>https://forem.com/asishdalal/teaching-small-language-models-to-remember-giving-llms-a-notebook-with-differentiable-neural-42dp</link>
      <guid>https://forem.com/asishdalal/teaching-small-language-models-to-remember-giving-llms-a-notebook-with-differentiable-neural-42dp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Large models memorize the world in their weights. Small models need a notepad."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem: Small Models Forget Facts
&lt;/h2&gt;

&lt;p&gt;Large Language Models (LLMs) like GPT-4 are remarkably good at recalling facts — "Delhi is the capital of India," "Einstein developed the theory of relativity" — because they have billions of parameters acting as a massive, compressed knowledge store. The model bakes facts into weights during pre-training, and retrieval is implicit in the forward pass.&lt;/p&gt;

&lt;p&gt;But what happens when you shrink the model?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small Language Models (SLMs)&lt;/strong&gt; — the kind you can actually run on a laptop or edge device — have far fewer parameters. There simply isn't enough capacity to reliably encode factual associations. They can handle grammar, style, and short-range reasoning reasonably well, but ask them a factual question and they hallucinate, hedge, or go blank.&lt;/p&gt;

&lt;p&gt;The parametric memory paradigm breaks down at small scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Insight
&lt;/h3&gt;

&lt;p&gt;Humans don't store all their knowledge in their neurons alone. We use &lt;strong&gt;external memory&lt;/strong&gt; — notebooks, calendars, books, sticky notes. We offload facts to the environment and look them up when needed. The neural machinery handles &lt;em&gt;reasoning&lt;/em&gt;; the notepad handles &lt;em&gt;retrieval&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What if we gave a small language model an explicit, learnable notepad?&lt;/p&gt;

&lt;p&gt;That's precisely what a &lt;strong&gt;Differentiable Neural Computer (DNC)&lt;/strong&gt; does.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a DNC?
&lt;/h2&gt;

&lt;p&gt;A Differentiable Neural Computer, introduced by DeepMind in 2016, augments a neural network controller with an &lt;strong&gt;external memory matrix&lt;/strong&gt; — a structured, differentiable store that the network can read from and write to via learned attention mechanisms.&lt;/p&gt;

&lt;p&gt;Think of it as RAM for a neural network.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory Matrix  M  ∈  ℝ^(N × W)
                   │
          N = number of memory slots (rows)
          W = width of each slot (columns)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The controller (in our case, a small GPT-2) interacts with this memory through &lt;strong&gt;soft, differentiable read and write heads&lt;/strong&gt; — so the whole system is end-to-end trainable with backpropagation.&lt;/p&gt;

&lt;p&gt;Unlike a hash map or database, the DNC doesn't look up memory by exact key. It uses &lt;strong&gt;content-based addressing&lt;/strong&gt; — cosine similarity between a query key and stored vectors — blended with &lt;strong&gt;usage-based allocation&lt;/strong&gt; to decide where to write new information.&lt;/p&gt;
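&lt;p&gt;Content-based addressing is easy to sketch: the read weights are a softmax over the cosine similarity between a query key and every memory slot, sharpened by a learned strength parameter (often written β). A stdlib-only illustration, with a fixed β instead of a learned one:&lt;/p&gt;

```python
import math

# DNC content-based addressing sketch: read weights = softmax over
# beta-scaled cosine similarities between a query key and each slot.
# beta is learned in the real model; here it is fixed for illustration.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def read_weights(memory, key, beta=5.0):
    scores = [beta * cosine(slot, key) for slot in memory]
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]    # N=3 slots, W=2
w = read_weights(memory, [1.0, 0.0])
print(w.index(max(w)))  # 0 -- the slot most similar to the query key
```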




&lt;h2&gt;
  
  
  Architecture: GPT-2 + DNC Memory
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8ah395ydu3tv7xuvprt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8ah395ydu3tv7xuvprt.png" alt=" " width="800" height="572"&gt;&lt;/a&gt;&lt;br&gt;
The full model layers two components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    ┌─────────────────────────────┐
                    │       GPT-2 Backbone        │
                    │  (Masked Self-Attention +   │
                    │   Feed-Forward Layers)      │
                    └──────────────┬──────────────┘
                                   │  hidden state h_t  (B, D)
                                   ▼
                    ┌─────────────────────────────┐
                    │        DNC Memory           │
                    │  ┌─────────────────────┐    │
                    │  │  M ∈ ℝ^(N × W)      │    │  ← external RAM
                    │  └─────────────────────┘    │
                    │   write → read → update     │
                    └──────────────┬──────────────┘
                                   │  read_vec  (B, R*W)
                                   ▼
                         read_proj → h_t + read_vec
                                   │
                                   ▼
                              LM Head → logits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At each time step &lt;code&gt;t&lt;/code&gt;, the GPT-2 hidden state &lt;code&gt;h_t&lt;/code&gt; is used to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write&lt;/strong&gt; new information into memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read&lt;/strong&gt; relevant information back out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fuse&lt;/strong&gt; the read vector with &lt;code&gt;h_t&lt;/code&gt; before projecting to vocabulary logits&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The memory persists across time steps within a sequence, making it a form of &lt;strong&gt;working memory&lt;/strong&gt; — information written at step 3 can be retrieved at step 47.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Memory Module: Read &amp;amp; Write Mechanics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Memory State
&lt;/h3&gt;

&lt;p&gt;The memory at any step is a matrix &lt;code&gt;M ∈ ℝ^(B × N × W)&lt;/code&gt; — a batch of &lt;code&gt;N&lt;/code&gt; slots, each a &lt;code&gt;W&lt;/code&gt;-dimensional vector. A usage vector &lt;code&gt;u ∈ ℝ^(B × N)&lt;/code&gt; tracks how much each slot has been written to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Projections from Hidden State
&lt;/h3&gt;

&lt;p&gt;Given the controller hidden state &lt;code&gt;h_t ∈ ℝ^(B × D)&lt;/code&gt;, the memory module computes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Projection&lt;/th&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_key&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(B, W)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;Where&lt;/em&gt; to write (content addressing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_vec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(B, W)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;What&lt;/em&gt; to write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;erase_vec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(B, W)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;What to erase before writing (sigmoid-gated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_gate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(B, 1)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;How much&lt;/em&gt; to write (0 = skip, 1 = full write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read_keys&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(B, R, W)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Where to read from (&lt;code&gt;R&lt;/code&gt; read heads)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Write Weighting
&lt;/h3&gt;

&lt;p&gt;The write address &lt;code&gt;w_write ∈ ℝ^(B × N)&lt;/code&gt; is a soft attention distribution over slots:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w_content = softmax( cosine(write_key, M) × τ )
w_alloc   = softmax( (1 − u) × τ )

w_write   = 0.5 × w_content + 0.5 × w_alloc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content addressing&lt;/strong&gt; (&lt;code&gt;w_content&lt;/code&gt;): write near slots whose content resembles the current write key — useful for &lt;em&gt;updating&lt;/em&gt; existing facts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Allocation&lt;/strong&gt; (&lt;code&gt;w_alloc&lt;/code&gt;): prefer &lt;em&gt;less-used&lt;/em&gt; slots — useful for storing &lt;em&gt;new&lt;/em&gt; facts without overwriting old ones.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;τ&lt;/code&gt; is a learned temperature parameter that sharpens or softens the distribution.&lt;/p&gt;
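&lt;p&gt;On toy numbers, the blend looks like this (similarity and usage values are made up; the real computation is batched over &lt;code&gt;(B, N)&lt;/code&gt; tensors):&lt;/p&gt;

```python
import math

def softmax(xs, temp=1.0):
    m = max(xs)
    exps = [math.exp((x - m) * temp) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

tau = 2.0                      # learned temperature (initialized to 2.0 in the module below)
sim   = [0.9, 0.1, 0.2, 0.0]   # cosine(write_key, M[i]) per slot — toy values
usage = [0.8, 0.0, 0.5, 0.1]   # how "full" each slot already is

w_content = softmax(sim, temp=tau)
w_alloc   = softmax([1.0 - u for u in usage], temp=tau)

# 50/50 blend: update similar slots *and* prefer empty ones
w_write = [0.5 * c + 0.5 * a for c, a in zip(w_content, w_alloc)]

assert w_content.index(max(w_content)) == 0  # most similar slot
assert w_alloc.index(max(w_alloc)) == 1      # least used slot
assert abs(sum(w_write) - 1.0) < 1e-9        # still a valid distribution
```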

&lt;h3&gt;
  
  
  Write Operation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;M_new = M × (1 − w_write ⊗ erase_vec) + w_write ⊗ write_vec

M_out = M + write_gate × (M_new − M)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;write_gate&lt;/code&gt; is the key knob:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;write_gate ≈ 0  →  memory unchanged  (model relies on parametric knowledge)
write_gate ≈ 1  →  full write        (model externalizes knowledge)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gate is learned entirely from data. The model discovers &lt;em&gt;when&lt;/em&gt; it's worth writing.&lt;/p&gt;
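&lt;p&gt;A scalar sketch of the two equations above makes the gate's role explicit (single slot element; the real operation is vectorized over &lt;code&gt;N × W&lt;/code&gt;):&lt;/p&gt;

```python
# w is this slot's share of w_write; erase/add stand in for the W-dim vectors
def write_slot(m, w, erase, add, gate):
    m_new = m * (1.0 - w * erase) + w * add  # erase old content, add new
    return m + gate * (m_new - m)            # gate interpolates old vs. new

assert write_slot(0.5, w=1.0, erase=1.0, add=2.0, gate=0.0) == 0.5   # skip: unchanged
assert write_slot(0.5, w=1.0, erase=1.0, add=2.0, gate=1.0) == 2.0   # full write
assert write_slot(0.5, w=1.0, erase=1.0, add=2.0, gate=0.5) == 1.25  # halfway
```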

&lt;h3&gt;
  
  
  Read Operation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;w_read  = softmax( read_keys · M^T × τ )   ∈ ℝ^(B × R × N)
read_vec = w_read · M                       ∈ ℝ^(B × R × W)
         → reshape to (B, R*W)
         → projected back to (B, D) via read_proj
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;R&lt;/code&gt; read heads allow the model to simultaneously query &lt;code&gt;R&lt;/code&gt; different "topics" from memory.&lt;/p&gt;
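&lt;p&gt;For a single read head, the read is just an attention-weighted average of the slots; a toy version with made-up weights:&lt;/p&gt;

```python
# One read head over N=3 toy slots of width W=2: read_vec = w_read · M
M = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]
w_read = [0.7, 0.2, 0.1]  # softmaxed key-similarity scores (made up)

read_vec = [sum(w_read[i] * M[i][j] for i in range(len(M)))
            for j in range(len(M[0]))]
print(read_vec)  # ≈ [0.75, 0.25]
```

&lt;p&gt;With &lt;code&gt;R&lt;/code&gt; heads you get &lt;code&gt;R&lt;/code&gt; such vectors, concatenated to &lt;code&gt;(B, R*W)&lt;/code&gt; and projected back to the controller width by &lt;code&gt;read_proj&lt;/code&gt;.&lt;/p&gt;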

&lt;h3&gt;
  
  
  State Update
&lt;/h3&gt;

&lt;p&gt;Usage is updated after each write so the allocator tracks which slots are "full":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;usage_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;w_write&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.detach()&lt;/code&gt; prevents gradients from flowing back through the usage signal — it's a bookkeeping variable, not a learned one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Write Gate: Knowing When to Remember
&lt;/h2&gt;

&lt;p&gt;The write gate is the most interpretable component of the whole system. After training, you can run &lt;code&gt;inspect_writes()&lt;/code&gt; and visualize per-token gate activations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token                  Gate   bar
────────────────────────────────────────────────
Albert                 0.821  ████████████████████████
Einstein               0.904  ███████████████████████████
was                    0.112  ███
born                   0.287  ████████
in                     0.094  ██
1879                   0.756  ██████████████████████
in                     0.071  ██
Ulm                    0.683  ████████████████████
He                     0.143  ████
developed              0.201  ██████
the                    0.058  █
theory                 0.388  ███████████
of                     0.062  █
relativity             0.712  █████████████████████
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The model learns to write on content-bearing tokens&lt;/strong&gt; (proper nouns, dates, key concepts) and skip function words. Nobody taught it this — it emerged from the loss functions.&lt;/p&gt;
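&lt;p&gt;A minimal stand-in for this kind of report is just a formatted bar per token (the tokens and gate values here are illustrative, not real model output):&lt;/p&gt;

```python
# Render per-token write-gate activations as text bars
def render_gates(tokens, gates, width=30):
    rows = []
    for tok, g in zip(tokens, gates):
        bar = "█" * round(g * width)
        rows.append(f"{tok:<12}{g:>6.3f}  {bar}")
    return "\n".join(rows)

print(render_gates(["Einstein", "was", "1879"], [0.904, 0.112, 0.756]))
```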




&lt;h2&gt;
  
  
  Loss Functions
&lt;/h2&gt;

&lt;p&gt;Training uses three losses summed together:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Language Modelling Loss (Cross-Entropy)
&lt;/h3&gt;

&lt;p&gt;The standard next-token prediction loss:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L_lm = CrossEntropy(logits[:, :-1], input_ids[:, 1:])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the primary loss. The model must still predict the next token correctly.&lt;/p&gt;
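&lt;p&gt;The one-token shift is easy to get wrong; a stdlib-only toy with a vocabulary of 4 and made-up logits shows the alignment:&lt;/p&gt;

```python
import math

def cross_entropy(logits_row, target):
    # -log softmax(logits_row)[target], computed stably via log-sum-exp
    m = max(logits_row)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits_row))
    return log_z - logits_row[target]

# One toy sequence: logits at position t score the token at position t+1
input_ids = [2, 1, 3, 0]
logits = [[0.1, 3.0, 0.2, 0.0],   # position 0 → target is input_ids[1] = 1
          [0.0, 0.1, 0.2, 3.0],   # position 1 → target is input_ids[2] = 3
          [3.0, 0.1, 0.2, 0.0]]   # position 2 → target is input_ids[3] = 0

L_lm = sum(cross_entropy(logits[t], input_ids[t + 1])
           for t in range(len(input_ids) - 1)) / (len(input_ids) - 1)
```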

&lt;h3&gt;
  
  
  2. Routing Loss
&lt;/h3&gt;

&lt;p&gt;This loss asks: &lt;em&gt;when the write gate is high, does memory actually change the prediction?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the model writes to memory but the output distribution looks identical to the no-memory baseline, that write was pointless. The routing loss penalises this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;kl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nf"&gt;softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_no_mem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nf"&gt;softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_mem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;L_routing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;kl&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The KL divergence between the memory model and a frozen no-memory baseline is computed per token. Multiplied by the gate and negated, this loss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rewards&lt;/strong&gt; high gates when memory &lt;em&gt;changes&lt;/em&gt; the prediction (high KL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Punishes&lt;/strong&gt; high gates when memory &lt;em&gt;doesn't matter&lt;/em&gt; (low KL → wasted write)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;.detach()&lt;/code&gt; on the KL ensures gradients only flow through the gate, not the no-memory logits.&lt;/p&gt;
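&lt;p&gt;The sign logic can be checked on toy distributions (pure Python, no autograd, so the &lt;code&gt;.detach()&lt;/code&gt; subtlety is elided):&lt;/p&gt;

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    # KL(p || q) over two discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_no_mem = softmax([2.0, 0.5, 0.1])  # frozen baseline per-token distribution
p_same   = softmax([2.0, 0.5, 0.1])  # memory changed nothing
p_diff   = softmax([0.1, 2.5, 0.1])  # memory moved the prediction

gate = 0.9
assert -(gate * kl(p_no_mem, p_same)) == 0.0  # pointless write: no reward
assert -(gate * kl(p_no_mem, p_diff)) < -0.5  # useful write: loss decreases
```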

&lt;h3&gt;
  
  
  3. Entropy Loss (Write Sparsity)
&lt;/h3&gt;

&lt;p&gt;A diffuse write weighting — spreading activation uniformly across all &lt;code&gt;N&lt;/code&gt; slots — is wasteful. It's like writing one word across every page of your notebook instead of a single page.&lt;/p&gt;

&lt;p&gt;The entropy loss encourages sharp, decisive writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;H&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w_writes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w_writes&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;L_entropy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;H&lt;/span&gt;   &lt;span class="c1"&gt;# minimized during training
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Low entropy → sparse write attention → the model commits to specific slots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Total Loss
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;L_lm&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;λ_r&lt;/span&gt; &lt;span class="err"&gt;·&lt;/span&gt; &lt;span class="n"&gt;L_routing&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;λ_e&lt;/span&gt; &lt;span class="err"&gt;·&lt;/span&gt; &lt;span class="n"&gt;L_entropy&lt;/span&gt;

&lt;span class="c1"&gt;# defaults: λ_r = 0.1,  λ_e = 0.05
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The auxiliary losses are kept small relative to &lt;code&gt;L_lm&lt;/code&gt; so language modelling remains the primary objective. The routing and entropy terms act as &lt;strong&gt;structural regularizers&lt;/strong&gt; that shape &lt;em&gt;how&lt;/em&gt; the memory is used, not just whether the model gets tokens right.&lt;/p&gt;




&lt;h2&gt;
  
  
  Code Walkthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DNCMemory Module
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DNCMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_slots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_reads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;controller_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mem_slots&lt;/span&gt;   &lt;span class="c1"&gt;# number of memory rows
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mem_width&lt;/span&gt;   &lt;span class="c1"&gt;# width of each row
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_reads&lt;/span&gt;   &lt;span class="c1"&gt;# number of read heads
&lt;/span&gt;
        &lt;span class="c1"&gt;# All projections from controller hidden state
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write_key_proj&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;controller_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write_vec_proj&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;controller_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;erase_vec_proj&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;controller_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write_gate_proj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;controller_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_key_proj&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;controller_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_width&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;num_reads&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;temp&lt;/span&gt;            &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# learned sharpness
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DNCLLM Forward Pass
&lt;/h3&gt;

&lt;p&gt;The key loop — stepping through time and interleaving memory reads/writes with transformer hidden states:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Run all tokens through GPT-2 in parallel (causal masking handles ordering)
&lt;/span&gt;    &lt;span class="n"&gt;hidden_states&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;last_hidden_state&lt;/span&gt;  &lt;span class="c1"&gt;# (B, T, D)
&lt;/span&gt;
    &lt;span class="n"&gt;all_logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_gates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_ww&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;h_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hidden_states&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt;                     &lt;span class="c1"&gt;# (B, D) — current hidden state
&lt;/span&gt;
        &lt;span class="c1"&gt;# Memory interaction for this timestep
&lt;/span&gt;        &lt;span class="n"&gt;read_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_gate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w_write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Fuse read vector back into hidden state
&lt;/span&gt;        &lt;span class="n"&gt;h_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h_t&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_proj&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# residual addition
&lt;/span&gt;
        &lt;span class="n"&gt;all_logits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lm_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h_out&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;           &lt;span class="c1"&gt;# project to vocab
&lt;/span&gt;        &lt;span class="n"&gt;all_gates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;write_gate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;all_ww&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w_write&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;logits&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# (B, T, V)
&lt;/span&gt;    &lt;span class="n"&gt;write_gates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_gates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# (B, T, 1)
&lt;/span&gt;    &lt;span class="n"&gt;w_writes&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_ww&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# (B, T, N)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_gates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w_writes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the sequential loop?&lt;/strong&gt; Memory has a causal dependency — &lt;code&gt;memory[t]&lt;/code&gt; depends on what was written at steps &lt;code&gt;0..t-1&lt;/code&gt;. This can't be parallelized like self-attention. It's the main compute overhead of DNC over a pure transformer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Initialization
&lt;/h3&gt;

&lt;p&gt;Memory is initialized to zeros at the start of each sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mem_slots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mem_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mem_slots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means memory is &lt;strong&gt;per-sequence&lt;/strong&gt;, not persistent across batch items or between training steps. It acts as within-sequence working memory, not a cross-sequence knowledge base.&lt;/p&gt;




&lt;h2&gt;
  
  
  Metrics to Watch
&lt;/h2&gt;

&lt;p&gt;During training, several metrics beyond loss reveal whether the memory system is working correctly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What It Tells You&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;avg_gate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mean write gate activation. Should settle in the 0.2–0.7 range; too high = writing everything, too low = never writing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gate_std&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gate polarization. High std means the model discriminates — writes on some tokens, skips others&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_rate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fraction of timesteps with gate &amp;gt; 0.7. Tracks how aggressively the model uses memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_sparsity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How concentrated the write weighting is. High sparsity = sharp slot selection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mem_kl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;KL divergence between memory and no-memory predictions. Non-zero means memory is changing outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A healthy DNC should show &lt;strong&gt;high gate_std&lt;/strong&gt; (selective writing) and &lt;strong&gt;high write_sparsity&lt;/strong&gt; (concentrated writes), with &lt;strong&gt;non-trivial mem_kl&lt;/strong&gt; (memory actually matters).&lt;/p&gt;
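&lt;p&gt;Given a list of per-token gate values, the gate-side metrics reduce to a few lines (toy values; the 0.7 threshold follows the table above):&lt;/p&gt;

```python
import statistics

# Toy per-token write gates for one sequence
gates = [0.82, 0.90, 0.11, 0.29, 0.09, 0.76, 0.07, 0.68]

avg_gate   = sum(gates) / len(gates)                  # mean activation
gate_std   = statistics.pstdev(gates)                 # polarization
write_rate = sum(g > 0.7 for g in gates) / len(gates) # fraction of hard writes

print(f"avg_gate={avg_gate:.3f}  gate_std={gate_std:.3f}  write_rate={write_rate:.3f}")
```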




&lt;h2&gt;
  
  
  Practical Configuration
&lt;/h2&gt;

&lt;p&gt;The config used in the experiments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# GPT-2 backbone
&lt;/span&gt;    &lt;span class="n"&gt;hidden_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;
    &lt;span class="n"&gt;num_layers&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
    &lt;span class="n"&gt;num_heads&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
    &lt;span class="n"&gt;seq_len&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;

    &lt;span class="c1"&gt;# DNC memory
&lt;/span&gt;    &lt;span class="n"&gt;mem_slots&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;     &lt;span class="c1"&gt;# N: number of memory slots
&lt;/span&gt;    &lt;span class="n"&gt;mem_width&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;    &lt;span class="c1"&gt;# W: width of each slot
&lt;/span&gt;    &lt;span class="n"&gt;num_reads&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;      &lt;span class="c1"&gt;# R: number of read heads
&lt;/span&gt;
    &lt;span class="c1"&gt;# Loss weights
&lt;/span&gt;    &lt;span class="n"&gt;lambda_routing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
    &lt;span class="n"&gt;lambda_entropy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

    &lt;span class="c1"&gt;# Training
&lt;/span&gt;    &lt;span class="n"&gt;batch_size&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="n"&gt;lr&lt;/span&gt;          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;3e-4&lt;/span&gt;
    &lt;span class="n"&gt;grad_clip&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Memory footprint&lt;/strong&gt;: The external memory adds &lt;code&gt;N × W = 64 × 128 = 8,192&lt;/code&gt; floats per batch item — negligible compared to the model weights themselves. The overhead is in the sequential forward loop, not storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameter count&lt;/strong&gt;: DNC adds roughly &lt;code&gt;5 × (D × W)&lt;/code&gt; parameters from the five projection matrices. At &lt;code&gt;D=768, W=128&lt;/code&gt; that's ~490K parameters — about 0.5% overhead on a 6-layer GPT-2.&lt;/p&gt;
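&lt;p&gt;The arithmetic is easy to check. A quick sketch (which five projections these are is an assumption for illustration; only the count matters here):&lt;/p&gt;

```python
# Parameter overhead of the DNC interface: five D x W projection matrices,
# matching the "5 x (D x W)" estimate in the text.
d_model = 768    # D: transformer hidden size
mem_width = 128  # W: width of each memory slot

extra_params = 5 * d_model * mem_width
print(extra_params)  # 491520, i.e. roughly 490K
```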




&lt;h2&gt;
  
  
  Limitations and What's Next
&lt;/h2&gt;

&lt;p&gt;This architecture is a proof of concept. Several known limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sequential bottleneck&lt;/strong&gt;: The time-step loop cannot be parallelized. For long sequences, this significantly slows training relative to the pure-transformer baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No cross-sequence persistence&lt;/strong&gt;: Memory resets between sequences. A truly useful factual memory would persist across the lifetime of the model — closer to a retrieval-augmented generation (RAG) system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gradient flow through time&lt;/strong&gt;: Backpropagating through &lt;code&gt;T&lt;/code&gt; sequential memory steps can cause vanishing/exploding gradients for long sequences. Gradient clipping (&lt;code&gt;grad_clip = 1.0&lt;/code&gt;) helps but doesn't solve it.&lt;/p&gt;
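&lt;p&gt;For reference, global-norm clipping can be sketched in a few lines of plain Python (a simplified stand-in for PyTorch's &lt;code&gt;clip_grad_norm_&lt;/code&gt;; the nested per-parameter lists are illustrative):&lt;/p&gt;

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Rescale all gradients when their joint L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for gs in grads for g in gs))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in gs] for gs in grads]
    return grads, total_norm

# A gradient with global norm 5.0 is scaled down to norm 1.0.
clipped, norm = clip_grad_norm([[3.0, 4.0]], max_norm=1.0)
print(norm, clipped)  # norm 5.0; clipped to approximately [[0.6, 0.8]]
```

Note this only bounds the magnitude of each update step; it does nothing about gradients that have already vanished through the T sequential memory steps, which is why clipping "helps but doesn't solve it."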

&lt;p&gt;&lt;strong&gt;Potential extensions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt;: Keep a global memory matrix that accumulates knowledge across a training corpus and is frozen at inference time (like a learned knowledge base)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse attention writes&lt;/strong&gt;: Replace soft write weighting with a top-k hard selection to reduce memory write diffusion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer-wise memory&lt;/strong&gt;: Attach a memory module to each transformer layer, not just the final hidden state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-augmented RAG&lt;/strong&gt;: Use DNC writes as an online summary buffer, and retrieve from it alongside a static vector DB&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;GPT-2 Baseline&lt;/th&gt;
&lt;th&gt;GPT-2 + DNC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Factual recall&lt;/td&gt;
&lt;td&gt;Parametric only&lt;/td&gt;
&lt;td&gt;Parametric + external memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory type&lt;/td&gt;
&lt;td&gt;Weights (static)&lt;/td&gt;
&lt;td&gt;N×W matrix (dynamic, per-sequence)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write mechanism&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Content + allocation addressing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selective writing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (learned write gate)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra parameters&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~490K (~0.5%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training overhead&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Sequential loop over T steps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The DNC doesn't replace the transformer's parametric knowledge — it &lt;em&gt;supplements&lt;/em&gt; it. The model learns when to trust its weights and when to externalise a fact to the notepad. On a small model operating in a domain with many precise facts, that notepad can make all the difference.&lt;/p&gt;

&lt;p&gt;The write gate is the centrepiece of the design. When it fires on "Einstein" and "1879" and stays quiet on "was" and "the", you know the model has learned something non-trivial: &lt;strong&gt;not all tokens are worth remembering&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;GitHub code: &lt;a href="https://github.com/AsishKumarDalal/memoryllm" rel="noopener noreferrer"&gt;https://github.com/AsishKumarDalal/memoryllm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Implementation: PyTorch. Dataset: WikiText-2. Backbone: GPT-2 (6 layers, 768 hidden, 8 heads). DNC config: N=64, W=128, R=4 read heads. Loss: L_lm + 0.1·L_routing + 0.05·L_entropy.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>IPC Pipe vs Unix Socket for a Resident Daemon in Tauri — What I Learned</title>
      <dc:creator>hiyoyo</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:28:46 +0000</pubDate>
      <link>https://forem.com/hiyoyok/ipc-pipe-vs-unix-socket-for-a-resident-daemon-in-tauri-what-i-learned-fa6</link>
      <guid>https://forem.com/hiyoyok/ipc-pipe-vs-unix-socket-for-a-resident-daemon-in-tauri-what-i-learned-fa6</guid>
      <description>&lt;p&gt;&lt;em&gt;All tests run on an 8-year-old MacBook Air.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I built Ghost Engine — a resident Swift daemon that handles PDF rendering — I had to decide how Rust talks to it.&lt;/p&gt;

&lt;p&gt;Two options: stdin/stdout IPC pipe, or a Unix domain socket.&lt;/p&gt;

&lt;p&gt;I tried both. Here's what actually happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 1: stdin/stdout pipe
&lt;/h2&gt;

&lt;p&gt;Simple. Spawn the process with &lt;code&gt;Stdio::piped()&lt;/code&gt;, write commands to stdin, read responses from stdout.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ghost-engine-daemon"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.stdin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Stdio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;piped&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;.stdout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Stdio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;piped&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;.spawn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Send command&lt;/span&gt;
&lt;span class="nd"&gt;writeln!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="py"&gt;.stdin&lt;/span&gt;&lt;span class="nf"&gt;.as_mut&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s"&gt;"render:page:3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Read response&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;String&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nn"&gt;BufReader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="py"&gt;.stdout&lt;/span&gt;&lt;span class="nf"&gt;.as_mut&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;.read_line&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Zero setup. No port conflicts. No socket file cleanup.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Strictly request-response. One command at a time per pipe pair. No multiplexing.&lt;/p&gt;


&lt;h2&gt;
  
  
  Option 2: Unix domain socket
&lt;/h2&gt;

&lt;p&gt;More flexible. The daemon listens on a socket file, Rust connects as a client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;os&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;unix&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;UnixStream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;UnixStream&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/tmp/ghost-engine.sock"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="nf"&gt;.write_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;b"render:page:3&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;String&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nn"&gt;BufReader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.read_line&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Multiple concurrent connections. Full duplex. Easier to multiplex requests.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Need to manage socket file lifecycle. Cleanup on crash requires care.&lt;/p&gt;
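&lt;p&gt;The usual pattern for the socket-file lifecycle (sketched here in Python for brevity; the daemon in this post is Swift, so treat this as the pattern, not the actual implementation): unlink any stale file left by a crash before binding, and register best-effort cleanup for normal exits.&lt;/p&gt;

```python
import atexit
import os
import socket

def serve(path):
    # A previous crash can leave a stale socket file that makes bind() fail,
    # so remove it first. FileNotFoundError just means a clean start.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    # Best-effort cleanup on normal exit; after a crash, the unlink
    # above recovers on the next start anyway.
    atexit.register(lambda: os.path.exists(path) and os.unlink(path))
    return server
```

&lt;p&gt;The same two-step approach (unlink-then-bind plus exit cleanup) carries over directly to Rust's &lt;code&gt;UnixListener&lt;/code&gt; on the daemon side.&lt;/p&gt;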




&lt;h2&gt;
  
  
  What I chose and why
&lt;/h2&gt;

&lt;p&gt;For Ghost Engine: &lt;strong&gt;stdin/stdout pipe&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;My use case is sequential rendering requests from a single Rust process. No concurrent clients, no need for multiplexing. The pipe is simpler, has zero setup overhead, and the daemon lifecycle is tied directly to the parent process — no orphan socket files if the app crashes.&lt;/p&gt;

&lt;p&gt;If I needed multiple Tauri windows sending requests simultaneously, I'd switch to Unix socket. For now, pipe is the right fit.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real lesson
&lt;/h2&gt;

&lt;p&gt;Neither is universally better. Match the IPC mechanism to your concurrency model, not to what sounds more sophisticated.&lt;/p&gt;




&lt;p&gt;Hiyoko PDF Vault → &lt;a href="https://hiyokoko.gumroad.com/l/HiyokoPDFVault" rel="noopener noreferrer"&gt;https://hiyokoko.gumroad.com/l/HiyokoPDFVault&lt;/a&gt;&lt;br&gt;
X → &lt;a class="mentioned-user" href="https://gosip.celebritynews.workers.dev/hiyoyok"&gt;@hiyoyok&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>tauri</category>
      <category>performance</category>
      <category>pdf</category>
    </item>
    <item>
      <title>Built a local music player for myself after getting annoyed with VLC — runs in your browser</title>
      <dc:creator>AriaNova613</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:23:32 +0000</pubDate>
      <link>https://forem.com/arianova613/built-a-local-music-player-for-myself-after-getting-annoyed-with-vlc-runs-in-your-browser-24jl</link>
      <guid>https://forem.com/arianova613/built-a-local-music-player-for-myself-after-getting-annoyed-with-vlc-runs-in-your-browser-24jl</guid>
      <description>&lt;p&gt;Been sitting on a folder of MP3s and WMAs for years with no good way to browse them. VLC works but it's ugly and&lt;br&gt;
clunky. Windows Media Player is basically dead. Spotify doesn't play local files the way I want.&lt;/p&gt;

&lt;p&gt;So I built Aria — a local music player that runs as a tiny web server on your machine. Open your browser, your whole&lt;br&gt;
library is there. &lt;/p&gt;

&lt;p&gt;It's free, open source, runs fully local, nothing leaves your machine, no account, no subscription.&lt;br&gt;
Landing page: &lt;a href="https://arianova613.github.io/aria" rel="noopener noreferrer"&gt;https://arianova613.github.io/aria&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/AriaNova613/aria" rel="noopener noreferrer"&gt;https://github.com/AriaNova613/aria&lt;/a&gt;&lt;br&gt;
Would love any feedback. Still early days.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>showdev</category>
      <category>sideprojects</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why Traditional Autopilot Wipe-and-Reload Fails in Large-Scale Entra ID Migrations</title>
      <dc:creator>Opsole</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:22:10 +0000</pubDate>
      <link>https://forem.com/opsolemigrate_it/why-traditional-autopilot-wipe-and-reload-fails-in-large-scale-entra-id-migrations-37ch</link>
      <guid>https://forem.com/opsolemigrate_it/why-traditional-autopilot-wipe-and-reload-fails-in-large-scale-entra-id-migrations-37ch</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8gokolrvafp45kb0v3c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8gokolrvafp45kb0v3c.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;Autopilot is often recommended as the standard approach for moving devices to Microsoft Entra ID.&lt;/p&gt;

&lt;p&gt;For small environments, wipe-and-reload may work well.&lt;/p&gt;

&lt;p&gt;But when organizations need to migrate hundreds or thousands of live user devices, the real challenges begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Wipe-and-Reload
&lt;/h2&gt;

&lt;p&gt;Traditional migration methods usually involve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wiping devices completely&lt;/li&gt;
&lt;li&gt;Reimaging systems&lt;/li&gt;
&lt;li&gt;Reinstalling applications&lt;/li&gt;
&lt;li&gt;Rebuilding user profiles&lt;/li&gt;
&lt;li&gt;Reconfiguring VPN, security tools, and access policies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While technically effective, this creates major operational issues in large-scale environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Fails at Scale
&lt;/h2&gt;

&lt;p&gt;For enterprise migrations, wipe-and-reimage often leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User productivity loss&lt;/li&gt;
&lt;li&gt;High helpdesk ticket volume&lt;/li&gt;
&lt;li&gt;Application rework&lt;/li&gt;
&lt;li&gt;Profile and personalization loss&lt;/li&gt;
&lt;li&gt;Remote user disruption&lt;/li&gt;
&lt;li&gt;Compliance and security gaps&lt;/li&gt;
&lt;li&gt;Project delays and rollout risks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When managing 500, 2,000, or even 10,000+ devices, these problems multiply quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Better Migration Approach
&lt;/h2&gt;

&lt;p&gt;Modern Entra ID migrations should focus on preserving the existing user environment instead of rebuilding everything from scratch.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeping user profiles intact&lt;/li&gt;
&lt;li&gt;Preserving applications and settings&lt;/li&gt;
&lt;li&gt;Maintaining seamless user access&lt;/li&gt;
&lt;li&gt;Reducing downtime significantly&lt;/li&gt;
&lt;li&gt;Lowering support overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach improves adoption and makes migration far more practical for enterprise teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Autopilot works well—until you need to migrate thousands of active devices without disrupting business operations.&lt;/p&gt;

&lt;p&gt;A successful Entra ID migration is not just about moving devices.&lt;/p&gt;

&lt;p&gt;It is about keeping users productive from day one.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>intune</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
