<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem</title>
    <description>The most recent home feed on Forem.</description>
    <link>https://forem.com</link>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed"/>
    <language>en</language>
    <item>
      <title>I Tried to Create GPT With Pure Math and No Training — Here's Where It Broke | Shivnath Tathe</title>
      <dc:creator>Shivnath Tathe</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:01:29 +0000</pubDate>
      <link>https://forem.com/shivnathtathe/i-tried-to-create-gpt-with-pure-math-and-no-training-heres-where-it-broke-shivnath-tathe-23a6</link>
      <guid>https://forem.com/shivnathtathe/i-tried-to-create-gpt-with-pure-math-and-no-training-heres-where-it-broke-shivnath-tathe-23a6</guid>
      <description>&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;What if we skipped training entirely?&lt;/p&gt;

&lt;p&gt;Every language model — GPT, LLaMA, BERT — learns by optimising a loss function over millions of gradient steps. But the underlying data is just text: words appearing near other words. Co-occurrence. Counting.&lt;/p&gt;

&lt;p&gt;So I asked: &lt;strong&gt;how far can pure mathematics take us toward text generation, without a single training step?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built the whole thing from scratch in Python with NumPy. No PyTorch, no TensorFlow, no &lt;code&gt;model.train()&lt;/code&gt;. Just matrices, statistics, and formulas.&lt;/p&gt;

&lt;p&gt;Here's what happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup: nanoVectorDB
&lt;/h2&gt;

&lt;p&gt;I started with &lt;a href="https://github.com/shivnathtathe/nanoVectorDB" rel="noopener noreferrer"&gt;nanoVectorDB&lt;/a&gt; — a vector database I'd built from scratch using only NumPy. The original goal was embeddings and similarity search. But then I thought: if I can build word vectors without training, can I also &lt;em&gt;generate&lt;/em&gt; text without training?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The corpus:&lt;/strong&gt; WikiText-103 — 80 million tokens of Wikipedia articles, with a 10,000-word vocabulary covering 89% of all tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The math pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Co-occurrence matrix&lt;/strong&gt; — For each word pair, count how often they appear within a window of 5 words. Forward-heavy weighting (0.7 forward, 0.3 backward) because "the king" tells you more about what follows than what came before.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PPMI (Positive Pointwise Mutual Information)&lt;/strong&gt; — Raw counts are dominated by common words. PPMI asks: "does this word pair appear together MORE than chance would predict?" It's the formula: &lt;code&gt;PMI(x,y) = log(P(x,y) / P(x)P(y))&lt;/code&gt;, clamped to zero for negative values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SVD (Singular Value Decomposition)&lt;/strong&gt; — Compress the sparse 10,000×10,000 PPMI matrix into dense 64-dimensional word embeddings. Each word becomes a vector of 64 numbers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bigram grammar matrix&lt;/strong&gt; — Separately, count every word-to-word transition. &lt;code&gt;P(next | last_word)&lt;/code&gt;. A 10,000×10,000 matrix of raw transition probabilities.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No training. Just counting and matrix factorisation.&lt;/p&gt;
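&lt;p&gt;The four steps above can be sketched in plain NumPy. This is a minimal illustration, not the actual pipeline: it uses a symmetric window (the forward-heavy 0.7/0.3 weighting is omitted), dense matrices, and full SVD, none of which would scale to WikiText-103.&lt;/p&gt;

```python
import numpy as np

def ppmi_svd_embeddings(tokens, vocab, window=5, dim=4):
    """Count co-occurrences, apply PPMI, factorise with SVD."""
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    counts = np.zeros((n, n))
    # 1. Co-occurrence counts within a symmetric window
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and w in idx and tokens[j] in idx:
                counts[idx[w], idx[tokens[j]]] += 1.0
    # 2. PPMI: log(P(x,y) / (P(x) P(y))), negatives clamped to zero
    total = counts.sum()
    px = counts.sum(axis=1, keepdims=True) / total
    py = counts.sum(axis=0, keepdims=True) / total
    pxy = counts / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(pxy / (px * py))
    ppmi = np.maximum(np.nan_to_num(pmi, neginf=0.0), 0.0)
    # 3. SVD: rows of U * sqrt(S), truncated to dim, are the embeddings
    u, s, _ = np.linalg.svd(ppmi)
    return u[:, :dim] * np.sqrt(s[:dim])
```

&lt;p&gt;For a real corpus you would use a sparse matrix and truncated SVD (&lt;code&gt;scipy.sparse.linalg.svds&lt;/code&gt;) instead of dense full SVD.&lt;/p&gt;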




&lt;h2&gt;
  
  
  What Pure Math Gets Right: Meaning
&lt;/h2&gt;

&lt;p&gt;The embeddings were shockingly good.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word neighbours:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;king  → heir, regent, throne, prince, emperor, pope
queen → princess, duchess, sophia, isabella, catherine
music → indie, pop, hop, jazz, rap, songs, dance
river → lake, creek, valley, upstream, canyon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Analogies (on real data, 80M tokens):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;king:man :: queen:? → woman ✓
man:woman :: boy:?  → girl ✓
france:paris :: japan:? → tokyo ✓  
king:queen :: prince:? → princess ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5 out of 8 exact matches at rank 1. 7 out of 8 in the top 5.&lt;/p&gt;

&lt;p&gt;This isn't a toy result. The SVD embeddings understand that king-queen has the same relationship as prince-princess. They understand that France-Paris maps to Japan-Tokyo. All from counting word co-occurrences in Wikipedia, factorising the matrix, and computing cosine similarities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This validates Levy &amp;amp; Goldberg (2014)&lt;/strong&gt; — Word2Vec's skip-gram training is implicitly factorising a (shifted) PMI co-occurrence matrix. We just did the factorisation explicitly.&lt;/p&gt;
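&lt;p&gt;The analogy test itself is nothing more than vector arithmetic plus cosine similarity. A minimal sketch (the &lt;code&gt;analogy&lt;/code&gt; helper and its toy inputs are illustrative, not the post's actual code):&lt;/p&gt;

```python
import numpy as np

def analogy(emb, idx, a, b, c, topk=3):
    """Solve a:b :: c:? as the nearest cosine neighbour of b - a + c."""
    vecs = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    target = vecs[idx[b]] - vecs[idx[a]] + vecs[idx[c]]
    target = target / np.linalg.norm(target)
    sims = vecs @ target
    # Exclude the three query words themselves
    for w in (a, b, c):
        sims[idx[w]] = -np.inf
    order = np.argsort(sims)[::-1]
    words = list(idx)
    return [words[i] for i in order[:topk]]
```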




&lt;h2&gt;
  
  
  Where It Breaks: Generation
&lt;/h2&gt;

&lt;p&gt;Meaning was solved. Now I tried to generate text. Seed the system with "the king and queen" and predict the next word, then the next, and so on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: Semantic only (cosine similarity to context)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the king and queen → isabella sophia catherine isabella sophia 
catherine isabella sophia catherine...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pure synonym loop. The most similar word to "queen" is "isabella". The most similar word to "isabella" is "sophia". Then back to "catherine". Forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: Bigram grammar only&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grammar knew that "queen" is often followed by "anne", and "anne" is followed by "elizabeth". But it produced generic Wikipedia filler with no topic awareness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3: Two-stage (the breakthrough)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the key idea:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Semantic filter:&lt;/strong&gt; Find the 20 words most similar to the current context (cosine similarity of SVD embeddings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grammar rerank:&lt;/strong&gt; Among those 20, score each by bigram probability — how often does it actually follow the last word in real text?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combine:&lt;/strong&gt; &lt;code&gt;final = 0.7 × grammar + 0.3 × semantic&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No repeat:&lt;/strong&gt; Block every word that's been used before&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Semantic proposes. Grammar disposes. No-repeat forces forward motion.&lt;/p&gt;
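&lt;p&gt;One step of the two-stage system can be sketched as follows (names are illustrative; &lt;code&gt;emb&lt;/code&gt; stands for the SVD embedding matrix and &lt;code&gt;bigram&lt;/code&gt; for the row-normalised transition matrix):&lt;/p&gt;

```python
import numpy as np

def two_stage_step(context_vec, last_id, emb, bigram, used, k=20,
                   w_grammar=0.7, w_semantic=0.3):
    """One generation step: semantic filter, grammar rerank, no-repeat."""
    vecs = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sem = vecs @ (context_vec / np.linalg.norm(context_vec))
    # 1. Semantic filter: the k words most similar to the context
    pool = np.argsort(sem)[-k:]
    # 2. Grammar rerank: how often does each candidate follow the last word?
    gram = bigram[last_id, pool]
    # 3. Combine: final = 0.7 x grammar + 0.3 x semantic
    score = w_grammar * gram + w_semantic * sem[pool]
    # 4. No repeat: take the best-scoring candidate not used before
    for rank in np.argsort(score)[::-1]:
        cand = int(pool[rank])
        if cand not in used:
            return cand
    return int(pool[np.argmax(score)])
```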

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the war → guerrilla forces fighting troops advancing germans 
italians retreated retreat battle captured ottoman army soldiers 
surrendered marched garrison surrender siege reinforcements 
assault force deployed units corps cavalry division th infantry 
regiment rd battalion nd brigade headquarters unit commanding 
anzac divisional artillery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a &lt;strong&gt;coherent military narrative&lt;/strong&gt; — from guerrilla warfare through retreat, surrender, siege, to specific military units and hierarchy. Every transition makes bigram sense. The semantic filter keeps it on-topic. No-repeat pushes it forward.&lt;/p&gt;

&lt;p&gt;More outputs from the same system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the school was built → constructed building construction block 
tower walls towers arches columns carved wooden stone wall arch 
roof tiles marble floors panels decorated brick exterior 
decoration decorative sculptures paintings depicting figures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Architecture → materials → decoration → art. A visual journey through a building.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;she won the award → winning medal awarded prize award recipient 
honorary academy graduate school student faculty students 
enrolled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Awards → academia → enrollment. A career trajectory.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 15 Versions That Followed
&lt;/h2&gt;

&lt;p&gt;The two-stage system generated impressive topic walks but not sentences. So I spent the next 15 versions trying to fix it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.2 — Union pools.&lt;/strong&gt; Instead of only semantic candidates, I combined semantic top-20 + grammar top-20 into a pool of ~40 candidates. Grammar words like "was", "of", "the" could now compete. &lt;strong&gt;Result:&lt;/strong&gt; grammar words dominated after a few steps. Every seed converged to &lt;code&gt;"...of his own right to be used as well known..."&lt;/code&gt; — the same generic Wikipedia filler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.3 — Dual memory.&lt;/strong&gt; Semantic context tracked only content words (skipping grammar picks). Grammar context used the full sentence. &lt;strong&gt;Result:&lt;/strong&gt; semantic stayed on topic but grammar picks were random glue words. Content and structure weren't coordinated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.4 — Forced alternation (SEM/GRAM/SEM/GRAM).&lt;/strong&gt; Forced the system to alternate between semantic and grammar picks. &lt;strong&gt;Result:&lt;/strong&gt; the most readable output yet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the war → victorious IN surrender OF surrendered TO seized BY 
besieged AND captured ON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grammar words (of, in, by, to, and) appeared as glue between content words. Almost readable — but the grammar words weren't chosen &lt;em&gt;for&lt;/em&gt; the content words. "Of" appeared because it has a high bigram score after almost anything, not because the sentence needed it there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.5 — Trigram grammar.&lt;/strong&gt; Built a trigram dictionary from the corpus (5.9 million unique contexts). Trigrams captured real phrases that bigrams couldn't:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dining → hall
shopping → centre  
tourist → attraction
nobel → peace
honorary → degree
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are genuine multi-word expressions. The bigram only saw "dining → room" or "dining → area". The trigram saw "dining hall" as a unit. But trigram sparsity meant frequent fallback to bigram.&lt;/p&gt;
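&lt;p&gt;The trigram lookup with bigram backoff can be sketched as a pair of dictionaries (names and counts are illustrative):&lt;/p&gt;

```python
def next_word_ngram(w1, w2, trigram, bigram):
    """Prefer the trigram prediction for the two-word context; back off
    to the bigram when that context was never seen in the corpus."""
    preds = trigram.get((w1, w2))
    if preds is None:
        preds = bigram.get(w2, {})
    if not preds:
        return None
    # Pick the highest-count continuation
    return max(preds, key=preds.get)
```

&lt;p&gt;With 5.9 million unique contexts over a 10,000-word vocabulary, most two-word contexts at generation time miss the table, which is exactly the sparsity problem described above.&lt;/p&gt;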

&lt;p&gt;&lt;strong&gt;v3.7 — 4-gram.&lt;/strong&gt; Even sparser, rarely fired, fell back to trigram → bigram. Marginal improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.8 — Fuzzy n-grams.&lt;/strong&gt; The most creative attempt. Instead of exact trigram lookup, find &lt;em&gt;similar&lt;/em&gt; contexts via embedding cosine similarity. "emperor empress" could borrow predictions from "king queen" because their embeddings are close. &lt;strong&gt;Result:&lt;/strong&gt; the fuzzy matching was too loose — it matched contexts that sounded similar but had completely different meanings. Pulled in noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.9 — Union pools + fuzzy trigram.&lt;/strong&gt; Combined everything. Same gravity-well problem — converged to generic filler after ~8 steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v4.0 — Alpha sweep.&lt;/strong&gt; Tested grammar weights from 0.3 to 0.9 across 10 seeds. Different seeds needed different alpha values. No single alpha worked universally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v4.1 — MMR soft diversity.&lt;/strong&gt; Instead of hard-blocking used words, computed max cosine similarity to all previously used word embeddings as a penalty. &lt;code&gt;final = relevance - λ × redundancy&lt;/code&gt;. λ=0.4 forced exploration of adjacent semantic regions. "the war" at λ=0.4 traced history across civilizations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;guerrilla forces fighting retreat battle army troops captured 
turkish soldiers surrendered italians germans retreated 
outnumbered defenders withdrew exhausted armies marched siege 
ottoman turks byzantine empire conquered egypt syria lebanon 
palestine israel occupation vietnam cambodia independence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From guerrilla warfare → Ottoman Empire → Byzantine Empire → Egypt/Syria → Israel/Palestine → Vietnam/Cambodia → independence. A walk through centuries of military history, forced by diversity to keep exploring.&lt;/p&gt;
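&lt;p&gt;The v4.1 scoring follows directly from the formula &lt;code&gt;final = relevance - λ × redundancy&lt;/code&gt;, where redundancy is the maximum cosine similarity to any already-used word (a sketch with illustrative names):&lt;/p&gt;

```python
import numpy as np

def mmr_score(relevance, cand_vecs, used_vecs, lam=0.4):
    """MMR-style soft diversity: penalise each candidate by its maximum
    cosine similarity to the embeddings of previously used words."""
    cand = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    if len(used_vecs) == 0:
        return relevance
    used = used_vecs / np.linalg.norm(used_vecs, axis=1, keepdims=True)
    redundancy = (cand @ used.T).max(axis=1)
    return relevance - lam * redundancy
```

&lt;p&gt;Unlike the hard no-repeat block, a used word's &lt;em&gt;neighbours&lt;/em&gt; are also penalised, which is what pushes the walk into adjacent semantic regions.&lt;/p&gt;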




&lt;h2&gt;
  
  
  The Scorecard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Toy (213 words)&lt;/th&gt;
&lt;th&gt;100k tokens&lt;/th&gt;
&lt;th&gt;80M tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Similarity separation&lt;/td&gt;
&lt;td&gt;0.93&lt;/td&gt;
&lt;td&gt;0.21&lt;/td&gt;
&lt;td&gt;0.87&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analogies @5&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;33%&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NTP (token accuracy)&lt;/td&gt;
&lt;td&gt;41%&lt;/td&gt;
&lt;td&gt;2.7%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation&lt;/td&gt;
&lt;td&gt;Semantic chains&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Topic walks, no sentences&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;PPMI + SVD solves meaning. Bigrams solve local transitions. Together they generate coherent topic walks. But they cannot generate grammatical sentences.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why It Can't Generate Sentences
&lt;/h2&gt;

&lt;p&gt;After 15 versions, the diagnosis is clear. Every fix solved one problem and created another:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What we tried&lt;/th&gt;
&lt;th&gt;What it fixed&lt;/th&gt;
&lt;th&gt;What it broke&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SVD semantic only&lt;/td&gt;
&lt;td&gt;Meaning&lt;/td&gt;
&lt;td&gt;Loops, no grammar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Bigram grammar&lt;/td&gt;
&lt;td&gt;Basic transitions&lt;/td&gt;
&lt;td&gt;Generic glue chains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ No repeat&lt;/td&gt;
&lt;td&gt;No loops&lt;/td&gt;
&lt;td&gt;Exhausts topic words&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Dual pool&lt;/td&gt;
&lt;td&gt;Grammar words appear&lt;/td&gt;
&lt;td&gt;Grammar dominates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Dual memory&lt;/td&gt;
&lt;td&gt;Topic stays alive&lt;/td&gt;
&lt;td&gt;Grammar picks random glue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Alternation&lt;/td&gt;
&lt;td&gt;Content+glue pattern&lt;/td&gt;
&lt;td&gt;No coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Trigram&lt;/td&gt;
&lt;td&gt;Real phrases&lt;/td&gt;
&lt;td&gt;Sparsity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Fuzzy n-gram&lt;/td&gt;
&lt;td&gt;Generalisation&lt;/td&gt;
&lt;td&gt;Too loose, noise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ MMR diversity&lt;/td&gt;
&lt;td&gt;Explores new regions&lt;/td&gt;
&lt;td&gt;Still no sentences&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The missing piece is always the same: position-dependent context tracking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After "the king ruled the", our system needs to know "we need a noun here — specifically an object of 'ruled'." But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Semantic scoring only knows "what word is RELATED to the recent context" — it doesn't know about syntactic roles.&lt;/li&gt;
&lt;li&gt;Grammar scoring only knows "what word commonly FOLLOWS the last word" — &lt;code&gt;P(next | kingdom)&lt;/code&gt; doesn't know we're in the object position of "ruled".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A transformer solves this with attention over the full sequence. At position 5, it can look back at position 2 ("ruled") and learn that "ruled the ___" needs a noun object. Our system can only look at the last 1-4 words, and it can't learn positional patterns because there's no learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static embeddings give every word one fixed vector regardless of context.&lt;/strong&gt; "King" after "the" (needs a verb next) has the same vector as "king" after "became" (needs a determiner). Dynamic, context-dependent representations require attention — and attention requires training.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Proves
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Levy &amp;amp; Goldberg (2014)&lt;/strong&gt; proved that Word2Vec implicitly factorises a PMI matrix. &lt;strong&gt;Zhao et al. (2025)&lt;/strong&gt; proved that next-token prediction training converges to SVD factors of co-occurrence structure.&lt;/p&gt;

&lt;p&gt;Our experiments confirm both from the other direction: we built the SVD factorisation explicitly and got embeddings that rival Word2Vec quality. But we also proved WHERE that equivalence breaks down — at generation.&lt;/p&gt;

&lt;p&gt;Transformers aren't doing something fundamentally different from SVD for &lt;em&gt;meaning&lt;/em&gt;. But they add the crucial missing piece: &lt;strong&gt;positional, context-dependent reweighting&lt;/strong&gt; of those factors at every step.&lt;/p&gt;

&lt;p&gt;The map of meaning can be built with pure math. The navigator through that map requires learning.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; NumPy, SciPy (sparse SVD)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU acceleration:&lt;/strong&gt; CuPy (&lt;code&gt;cupyx.scatter_add&lt;/code&gt; for co-occurrence matrix building on CUDA)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data:&lt;/strong&gt; WikiText-103 via HuggingFace datasets (80M tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware:&lt;/strong&gt; Kaggle T4 GPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training:&lt;/strong&gt; Zero. None. Not a single gradient step.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I'd Build Next
&lt;/h2&gt;

&lt;p&gt;This isn't a dead end — it's a foundation. The experiments point to several directions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval instead of generation.&lt;/strong&gt; The embeddings are excellent for finding relevant content. Instead of generating word-by-word, use the SVD vectors to RETRIEVE real sentences from the corpus that match the semantic context. That's what vector databases are actually for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid systems.&lt;/strong&gt; Use the pure-math embeddings as a pre-computed semantic layer, then a small trained model (even a simple RNN) just for the sequential state tracking. The heavy lifting of meaning is already done.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Educational tool.&lt;/strong&gt; This entire pipeline is transparent — every number is interpretable. No black boxes. Perfect for teaching how language models work from first principles.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
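&lt;p&gt;Direction 1 is essentially nearest-neighbour search over precomputed sentence vectors. A sketch (assuming sentence vectors have already been built, e.g. by averaging word embeddings):&lt;/p&gt;

```python
import numpy as np

def retrieve_sentences(query_vec, sent_vecs, sentences, topk=3):
    """Retrieve corpus sentences nearest to the semantic context
    instead of generating word by word."""
    q = query_vec / np.linalg.norm(query_vec)
    m = sent_vecs / np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    sims = m @ q
    order = np.argsort(sims)[::-1]
    return [sentences[i] for i in order[:topk]]
```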




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;All you need is NumPy, SciPy, and WikiText-103. Build a co-occurrence matrix, apply PPMI, run SVD, add a bigram grammar matrix. Two matrices. Two stages. No training. Just math.&lt;/p&gt;

&lt;p&gt;And now you know exactly where the math stops and the learning begins.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This research was conducted as an independent exploration. Thanks to Levy &amp;amp; Goldberg (2014, "Neural Word Embedding as Implicit Matrix Factorization") for the theoretical foundation and to Zhao et al. (2025) for extending the connection to next-token prediction.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openai</category>
      <category>gpt3</category>
    </item>
    <item>
      <title>I built 3 MCP servers so I can ask Claude about my DevOps stack</title>
      <dc:creator>Jedsadakorn Suma</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:01:04 +0000</pubDate>
      <link>https://forem.com/peachjed/i-built-3-mcp-servers-so-i-can-ask-claude-about-my-devops-stack-4c08</link>
      <guid>https://forem.com/peachjed/i-built-3-mcp-servers-so-i-can-ask-claude-about-my-devops-stack-4c08</guid>
      <description>&lt;p&gt;Every time something looked off in production, I'd switch between 4 tabs:&lt;br&gt;
  Prometheus → check metrics, kubectl → check pods, Grafana → check dashboards, terminal → check logs.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;MCP DevOps Pack&lt;/strong&gt; — 3 MCP servers that let Claude Desktop talk to your infra directly.&lt;/p&gt;

&lt;h2&gt;What's included&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@peachjed/mcp-prometheus&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PromQL queries, firing alerts, rule inspection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@peachjed/mcp-kubernetes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List pods, get logs, describe resources, watch events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@peachjed/mcp-grafana&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search dashboards, list datasources, check alert states&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Install&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
  npm install -g @peachjed/mcp-prometheus @peachjed/mcp-kubernetes @peachjed/mcp-grafana

  Configure Claude Desktop

  Add to your claude_desktop_config.json:

  {
    "mcpServers": {
      "prometheus": {
        "command": "mcp-prometheus",
        "env": { "PROMETHEUS_URL": "http://localhost:9090" }
      },
      "kubernetes": {
        "command": "mcp-kubernetes"
      },
      "grafana": {
        "command": "mcp-grafana",
        "env": {
          "GRAFANA_URL": "http://localhost:3000",
          "GRAFANA_TOKEN": "your-token"
        }
      }
    }
  }

  What you can ask Claude

  - "What's the current CPU usage across all nodes?"
  - "Show me the last 50 lines from pod api-server-xyz in production"
  - "Are there any firing alerts right now?"
  - "List all dashboards in the Infrastructure folder"

  How it works

  Each server is a small TypeScript process that runs locally via stdio. Claude Desktop spawns it automatically when
  needed. The Kubernetes server uses your existing ~/.kube/config — no extra auth setup.

  Stack

  - TypeScript + @modelcontextprotocol/sdk
  - @kubernetes/client-node for the k8s server
  - Prometheus and Grafana via their HTTP APIs

  Source

  GitHub: https://github.com/Jedsadakorn-Suma/mcp-devops-pack

  npm: @peachjed/mcp-prometheus, @peachjed/mcp-kubernetes, @peachjed/mcp-grafana

  Feedback welcome — especially if you use a different observability stack.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
      <category>claude</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Azure ML Feature Store with Terraform: Managed Feature Materialization for Training and Inference 🗃️</title>
      <dc:creator>Suhas Mallesh</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/suhas_mallesh/azure-ml-feature-store-with-terraform-managed-feature-materialization-for-training-and-inference-o38</link>
      <guid>https://forem.com/suhas_mallesh/azure-ml-feature-store-with-terraform-managed-feature-materialization-for-training-and-inference-o38</guid>
      <description>&lt;p&gt;Azure ML Feature Store is a specialized workspace that manages feature engineering, offline materialization to storage, and online serving with Redis. Terraform provisions the infrastructure, SDK defines feature sets. Here's how to build it.&lt;/p&gt;

&lt;p&gt;In the previous posts, we set up the ML workspace and deployed endpoints. Now we need consistent features feeding those endpoints. Training uses historical features from batch sources. Inference needs the latest values in real time. When these diverge, your model's accuracy degrades silently.&lt;/p&gt;

&lt;p&gt;Azure ML Feature Store is implemented as a special type of Azure ML workspace (&lt;code&gt;kind = "FeatureStore"&lt;/code&gt;). It manages feature transformation pipelines, materializes features to offline storage (ADLS/Blob) and an online store (Redis), and provides point-in-time feature retrieval for training. Terraform provisions the infrastructure; the SDK defines entities, feature sets, and materialization schedules. 🎯&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ Feature Store Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feature Store&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specialized ML workspace with &lt;code&gt;kind = "FeatureStore"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Entity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Logical key (e.g., customer_id, account_id) shared across feature sets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feature Set&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Collection of features with transformation code and source definition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Offline Store&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ADLS/Blob storage for materialized historical features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Online Store&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Redis cache for low-latency inference lookups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Materialization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spark jobs that compute and sync features on a schedule&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key concept: feature sets include transformation code. Raw data goes in, computed features come out. The same transformation runs for both offline materialization (training) and online materialization (inference), eliminating training-serving skew.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔧 Terraform: Provision Feature Store Infrastructure
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Feature Store Workspace
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# feature_store/workspace.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"azurerm_machine_learning_workspace"&lt;/span&gt; &lt;span class="s2"&gt;"feature_store"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.environment}-feature-store"&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;
  &lt;span class="nx"&gt;resource_group_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;application_insights_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_application_insights&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;key_vault_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_key_vault&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;storage_account_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_storage_account&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;kind&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"FeatureStore"&lt;/span&gt;

  &lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"SystemAssigned"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;kind = "FeatureStore"&lt;/code&gt;&lt;/strong&gt; is the critical setting. This creates a workspace optimized for feature management rather than general ML development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Offline Materialization Store
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# feature_store/offline_store.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"azurerm_storage_account"&lt;/span&gt; &lt;span class="s2"&gt;"offline_store"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.environment}fsoffline${random_string.suffix.result}"&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;
  &lt;span class="nx"&gt;resource_group_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;account_tier&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Standard"&lt;/span&gt;
  &lt;span class="nx"&gt;account_replication_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;storage_replication&lt;/span&gt;
  &lt;span class="nx"&gt;is_hns_enabled&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# ADLS Gen2&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"azurerm_storage_container"&lt;/span&gt; &lt;span class="s2"&gt;"features"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"features"&lt;/span&gt;
  &lt;span class="nx"&gt;storage_account_id&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_storage_account&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;offline_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;container_access_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"private"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;is_hns_enabled = true&lt;/code&gt;&lt;/strong&gt; enables ADLS Gen2 hierarchical namespace, which is required for efficient feature materialization with Parquet files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Online Store (Redis Cache)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# feature_store/online_store.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"azurerm_redis_cache"&lt;/span&gt; &lt;span class="s2"&gt;"online_store"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enable_online_store&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.environment}-fs-redis"&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;
  &lt;span class="nx"&gt;resource_group_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;capacity&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;redis_capacity&lt;/span&gt;
  &lt;span class="nx"&gt;family&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;redis_family&lt;/span&gt;
  &lt;span class="nx"&gt;sku_name&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;redis_sku&lt;/span&gt;
  &lt;span class="nx"&gt;minimum_tls_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1.2"&lt;/span&gt;

  &lt;span class="nx"&gt;redis_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;maxmemory_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"allkeys-lru"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The online store is optional. Enable it when you need low-latency feature lookups during inference. Skip it in dev if you only need offline features for training.&lt;/p&gt;
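&lt;p&gt;To see why Redis fits here, it helps to picture the access pattern: online serving is a single key lookup per entity at inference time. The sketch below simulates that with a plain dict standing in for Redis. The &lt;code&gt;featureset:version:entityID&lt;/code&gt; key layout is purely illustrative (the managed feature store controls its own schema); the O(1) get-by-entity-key pattern is the point.&lt;/p&gt;

```python
# Illustrative only: a dict standing in for the Redis online store.
# The "featureset:version:entityID" key layout is an assumption for this
# sketch, not the service's actual schema.
online_store = {}

def write_features(featureset, version, entity_id, features):
    """Materialization writes the latest feature values under the entity key."""
    online_store[f"{featureset}:{version}:{entity_id}"] = features

def read_features(featureset, version, entity_id):
    """Inference does a single O(1) lookup -- this is what Redis buys you."""
    return online_store.get(f"{featureset}:{version}:{entity_id}")

write_features("transactions", "1", "acct-42",
               {"transaction_count_7d": 12, "avg_transaction_amount_7d": 57.3})
print(read_features("transactions", "1", "acct-42"))
```

&lt;p&gt;The &lt;code&gt;allkeys-lru&lt;/code&gt; policy configured above complements this pattern: when Redis fills up, the least recently read entities are evicted first, and a later materialization run simply rewrites them.&lt;/p&gt;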

&lt;h3&gt;
  
  
  Compute for Materialization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# feature_store/compute.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"azurerm_machine_learning_compute_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"materialization"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.environment}-materialization"&lt;/span&gt;
  &lt;span class="nx"&gt;machine_learning_workspace_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_machine_learning_workspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;feature_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;
  &lt;span class="nx"&gt;vm_size&lt;/span&gt;                       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;materialization_vm_size&lt;/span&gt;
  &lt;span class="nx"&gt;vm_priority&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"LowPriority"&lt;/span&gt;

  &lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"SystemAssigned"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;scale_settings&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;min_node_count&lt;/span&gt;                       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;max_node_count&lt;/span&gt;                       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;materialization_max_nodes&lt;/span&gt;
    &lt;span class="nx"&gt;scale_down_nodes_after_idle_duration&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"PT5M"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Materialization jobs run as Spark pipelines on this compute cluster. &lt;code&gt;min_node_count = 0&lt;/code&gt; means you pay nothing when no materialization is running.&lt;/p&gt;

&lt;h2&gt;
  
  
  🐍 Define Entities and Feature Sets (SDK)
&lt;/h2&gt;

&lt;p&gt;Terraform provisions infrastructure. The SDK defines the feature engineering logic:&lt;/p&gt;

&lt;h3&gt;
  
  
  Create an Entity
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.ai.ml&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MLClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.ai.ml.entities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FeatureStoreEntity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DataColumn&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.identity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DefaultAzureCredential&lt;/span&gt;

&lt;span class="n"&gt;fs_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MLClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;DefaultAzureCredential&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;subscription_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resource_group_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;workspace_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-feature-store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;account_entity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FeatureStoreEntity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;index_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;DataColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accountID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Account entity for transaction features&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_store_entities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;begin_create_or_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;account_entity&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Entities define shared join keys. Multiple feature sets can reference the same entity, ensuring consistent joins.&lt;/p&gt;
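&lt;p&gt;What "consistent joins" buys you in practice: every feature set attached to the &lt;code&gt;account&lt;/code&gt; entity is keyed by &lt;code&gt;accountID&lt;/code&gt;, so combining feature sets reduces to a plain key join. A minimal pandas sketch (the feature columns besides &lt;code&gt;accountID&lt;/code&gt; are hypothetical):&lt;/p&gt;

```python
import pandas as pd

# Two feature sets that both reference the shared "account" entity, so both
# carry the same accountID index column. Feature names here are hypothetical.
tx_features = pd.DataFrame(
    {"accountID": ["a1", "a2"], "transaction_count_7d": [12, 3]}
)
profile_features = pd.DataFrame(
    {"accountID": ["a1", "a2"], "account_age_days": [400, 35]}
)

# Because the entity pins the join key, assembling training data is a key
# join -- no per-team guessing about which column to join on.
training = tx_features.merge(profile_features, on="accountID", how="left")
print(training)
```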

&lt;h3&gt;
  
  
  Define Feature Set with Transformation Code
&lt;/h3&gt;

&lt;p&gt;Feature set specification (YAML):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# featuresets/transactions/spec/FeaturesetSpec.yaml&lt;/span&gt;
&lt;span class="na"&gt;$schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://azuremlschemas.azureedge.net/latest/featureSetSpec.schema.json&lt;/span&gt;

&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parquet&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;abfss://data@storage.dfs.core.windows.net/transactions/&lt;/span&gt;
  &lt;span class="na"&gt;timestamp_column&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;timestamp&lt;/span&gt;

&lt;span class="na"&gt;feature_transformation_code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./transformation_code&lt;/span&gt;
  &lt;span class="na"&gt;transformer_class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;transaction_transform.TransactionFeatureTransformer&lt;/span&gt;

&lt;span class="na"&gt;features&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;transaction_count_7d&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;integer&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;avg_transaction_amount_7d&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;float&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;total_spend_3d&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;float&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;max_transaction_amount&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;float&lt;/span&gt;

&lt;span class="na"&gt;index_columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;accountID&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Transformation code (Spark):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# transformation_code/transaction_transform.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql.window&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Window&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TransactionFeatureTransformer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;window_7d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;partitionBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accountID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;rangeBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;window_3d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;partitionBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accountID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;rangeBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;raw_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accountID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;over&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window_7d&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction_count_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;over&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window_7d&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_transaction_amount_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;over&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window_3d&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_spend_3d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;over&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window_7d&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_transaction_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
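&lt;p&gt;Before registering, it's worth sanity-checking the window logic on a few rows. Below is a small pandas replica of the same aggregations for a single account (assuming pandas is available locally). One caveat: pandas time windows are left-open while Spark's &lt;code&gt;rangeBetween&lt;/code&gt; includes both endpoints, so pick test rows that don't sit exactly on a window boundary.&lt;/p&gt;

```python
import pandas as pd

# Tiny single-account fixture; dates chosen to avoid exact window boundaries,
# since pandas rolling("7d") is left-open while Spark rangeBetween(-7*86400, 0)
# includes both endpoints.
tx = pd.DataFrame({
    "accountID": ["a1", "a1", "a1"],
    "timestamp": pd.to_datetime(["2026-01-01", "2026-01-05", "2026-01-10"]),
    "amount": [10.0, 30.0, 100.0],
}).sort_values("timestamp").set_index("timestamp")

# Same four aggregations as the Spark transformer, over time-based windows.
feat = pd.DataFrame(index=tx.index)
feat["transaction_count_7d"] = tx["amount"].rolling("7d").count().astype(int)
feat["avg_transaction_amount_7d"] = tx["amount"].rolling("7d").mean()
feat["total_spend_3d"] = tx["amount"].rolling("3d").sum()
feat["max_transaction_amount"] = tx["amount"].rolling("7d").max()

print(feat)
```

&lt;p&gt;For multiple accounts you would wrap this in a &lt;code&gt;groupby("accountID")&lt;/code&gt;, mirroring Spark's &lt;code&gt;partitionBy&lt;/code&gt;.&lt;/p&gt;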



&lt;h3&gt;
  
  
  Register and Materialize
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.ai.ml.entities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FeatureSet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FeatureSetSpecification&lt;/span&gt;

&lt;span class="n"&gt;transaction_fset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FeatureSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transactions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7-day and 3-day rolling transaction aggregations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;azureml:account:1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;specification&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;FeatureSetSpecification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./featuresets/transactions/spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nonPII&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_sets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;begin_create_or_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transaction_fset&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configure Materialization Schedule
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.ai.ml.entities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;MaterializationSettings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MaterializationComputeResource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;RecurrenceTrigger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;materialization&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterializationSettings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MaterializationComputeResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Standard_E8s_v3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RecurrenceTrigger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frequency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hour&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;offline_enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;online_enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_sets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transactions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;materialization_settings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;materialization&lt;/span&gt;
&lt;span class="n"&gt;fs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_sets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;begin_create_or_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fset&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  📐 Environment Configuration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# environments/dev.tfvars&lt;/span&gt;
&lt;span class="nx"&gt;environment&lt;/span&gt;              &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
&lt;span class="nx"&gt;enable_online_store&lt;/span&gt;      &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;        &lt;span class="c1"&gt;# No Redis in dev&lt;/span&gt;
&lt;span class="nx"&gt;storage_replication&lt;/span&gt;      &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"LRS"&lt;/span&gt;
&lt;span class="nx"&gt;materialization_vm_size&lt;/span&gt;  &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Standard_E4s_v3"&lt;/span&gt;
&lt;span class="nx"&gt;materialization_max_nodes&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="c1"&gt;# environments/prod.tfvars&lt;/span&gt;
&lt;span class="nx"&gt;environment&lt;/span&gt;              &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt;
&lt;span class="nx"&gt;enable_online_store&lt;/span&gt;      &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="nx"&gt;redis_sku&lt;/span&gt;                &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Standard"&lt;/span&gt;
&lt;span class="nx"&gt;redis_capacity&lt;/span&gt;           &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="nx"&gt;redis_family&lt;/span&gt;             &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"C"&lt;/span&gt;
&lt;span class="nx"&gt;storage_replication&lt;/span&gt;      &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"GRS"&lt;/span&gt;
&lt;span class="nx"&gt;materialization_vm_size&lt;/span&gt;  &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Standard_E8s_v3"&lt;/span&gt;
&lt;span class="nx"&gt;materialization_max_nodes&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  ⚠️ Gotchas and Tips
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Feature store is a workspace.&lt;/strong&gt; It's implemented as &lt;code&gt;kind = "FeatureStore"&lt;/code&gt; on &lt;code&gt;azurerm_machine_learning_workspace&lt;/code&gt;. It needs the same dependencies (storage account, Key Vault, Application Insights) as a regular workspace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transformation code runs as Spark.&lt;/strong&gt; Feature transformations execute on the materialization compute cluster using PySpark. Test your transformations locally with a Spark session before registering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entities enforce consistent joins.&lt;/strong&gt; Define entities once (e.g., "account" with key "accountID") and reuse across feature sets. This prevents mismatched join keys between teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Materialization costs.&lt;/strong&gt; Each scheduled run spins up the compute cluster, runs the Spark job, and writes to storage. LowPriority VMs reduce cost. &lt;code&gt;min_node_count = 0&lt;/code&gt; ensures you pay nothing between runs.&lt;/p&gt;
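&lt;p&gt;A back-of-envelope estimate makes the cost model concrete. All durations and rates below are illustrative assumptions, not Azure list prices; check your region's current pricing.&lt;/p&gt;

```python
# All numbers are illustrative assumptions, not Azure list prices.
runs_per_day = 4           # every 6 hours, matching the schedule above
job_minutes = 15           # assumed duration of one materialization Spark job
nodes = 2                  # nodes the cluster scales up to per run
rate_per_node_hour = 0.10  # assumed $/node-hour for a LowPriority E-series VM

# Compute only bills while jobs run; with min_node_count = 0, idle cost is $0.
monthly_cost = runs_per_day * 30 * (job_minutes / 60) * nodes * rate_per_node_hour
print(f"~${monthly_cost:.2f}/month")
```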

&lt;p&gt;&lt;strong&gt;Redis cost for online store.&lt;/strong&gt; Standard Redis starts at ~$40/month. Premium with replication is ~$200/month. Skip online store in dev unless you're testing real-time inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature set versioning.&lt;/strong&gt; Feature sets are versioned. Changing the transformation logic? Create version "2". This maintains backward compatibility for models still using version "1".&lt;/p&gt;

&lt;h2&gt;
  
  
  ⏭️ What's Next
&lt;/h2&gt;

&lt;p&gt;This is &lt;strong&gt;Post 3&lt;/strong&gt; of the &lt;strong&gt;Azure ML Pipelines &amp;amp; MLOps with Terraform&lt;/strong&gt; series.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Post 1:&lt;/strong&gt; &lt;a href="https://gosip.celebritynews.workers.dev/suhas_mallesh/azure-ml-workspace-with-terraform-your-ml-platform-on-azure-44ko"&gt;Azure ML Workspace&lt;/a&gt; 🔬&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post 2:&lt;/strong&gt; &lt;a href="https://gosip.celebritynews.workers.dev/suhas_mallesh/azure-ml-online-endpoints-deploy-your-model-to-production-with-terraform-4730"&gt;Azure ML Online Endpoints&lt;/a&gt; 🚀&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post 3:&lt;/strong&gt; Azure ML Feature Store (you are here) 🗃️&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post 4:&lt;/strong&gt; Azure ML Pipelines + Azure DevOps&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Your features have a home. ADLS for offline training, Redis for online inference, Spark transformations that run the same code for both. No training-serving skew. Versioned feature sets with scheduled materialization, all provisioned with Terraform.&lt;/em&gt; 🗃️&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found this helpful? Follow for the full ML Pipelines &amp;amp; MLOps with Terraform series!&lt;/em&gt; 💬&lt;/p&gt;

</description>
      <category>azure</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why does PHP need asynchrony?</title>
      <dc:creator>Edmond</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/edmonddantes_14/why-does-php-need-asynchrony-23mn</link>
      <guid>https://forem.com/edmonddantes_14/why-does-php-need-asynchrony-23mn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;"The most dangerous phrase in the language is 'We've always done it this way.'" — Grace Hopper&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;PHP is one of the last major languages that still lacks built-in support for concurrent execution at the language level. Python has asyncio, JavaScript is natively built on an event loop, Go has goroutines, and Kotlin has coroutines. PHP remains in the "one request — one process" paradigm, even though most real-world applications spend the majority of their time waiting for I/O (they are I/O-bound).&lt;/p&gt;

&lt;h2&gt;
  
  
  The fragmentation problem
&lt;/h2&gt;

&lt;p&gt;Today, asynchrony in PHP lives in extensions and userland frameworks: Swoole, AMPHP, ReactPHP. Each creates its own ecosystem with incompatible APIs, its own database drivers, HTTP clients and servers.&lt;/p&gt;

&lt;p&gt;This leads to critical problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Code duplication — each extension is forced to rewrite drivers for MySQL, PostgreSQL, Redis and other systems&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Incompatibility — a library written for Swoole doesn't work with AMPHP, and vice versa&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limitations — extensions cannot make standard PHP functions (&lt;code&gt;file_get_contents&lt;/code&gt;, &lt;code&gt;fread&lt;/code&gt;, &lt;code&gt;curl_exec&lt;/code&gt;) non-blocking, because they don't have access to the core&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Barrier to entry — developers need to learn a separate ecosystem instead of using familiar tools&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://true-async.github.io/en/motivation.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftrue-async.github.io%2Fassets%2Flogo-header.png" height="64" class="m-0" width="64"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://true-async.github.io/en/motivation.html" rel="noopener noreferrer" class="c-link"&gt;
            TrueAsync
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            True async/await, coroutines, and non-blocking I/O for PHP
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftrue-async.github.io%2Fassets%2Ffavicon.png" width="64" height="64"&gt;
          true-async.github.io
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>architecture</category>
      <category>backend</category>
      <category>performance</category>
      <category>php</category>
    </item>
    <item>
      <title>2026 Goldman Sachs Coding Interview Real Questions &amp; Solutions</title>
      <dc:creator>programhelp-cs</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:59:41 +0000</pubDate>
      <link>https://forem.com/programhelp-cs/2026-goldman-sachs-coding-interview-real-questions-solutions-4c7i</link>
      <guid>https://forem.com/programhelp-cs/2026-goldman-sachs-coding-interview-real-questions-solutions-4c7i</guid>
      <description>&lt;p&gt;
Hi everyone, I recently completed the 2026 Goldman Sachs Coding Interview. 
The interview mainly focuses on real coding ability, data structure design, and problem-solving under pressure.
This article shares the actual questions I encountered along with detailed explanations and Python solutions.
&lt;/p&gt;

&lt;p&gt;
Goldman Sachs interviews are typically LeetCode Medium level, sometimes involving design problems or business-style scenarios. 
Interviewers pay close attention to communication, edge cases, and code readability.
&lt;/p&gt;

&lt;h2&gt;Problem 1: Transaction Segments&lt;/h2&gt;

&lt;p&gt;&lt;b&gt;Problem Summary:&lt;/b&gt;&lt;br&gt;
Given an array of transaction amounts and an integer k, count how many contiguous subarrays of length exactly k are strictly increasing.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Key Idea:&lt;/b&gt;&lt;br&gt;
Use a sliding window or linear scan to check each subarray of size k and verify strict increasing order.
&lt;/p&gt;
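&lt;p&gt;That linear scan can be sketched as follows (my own illustrative implementation, not an official reference solution): count the adjacent "rises" inside the window so each new window is checked in O(1) instead of re-scanning all k elements.&lt;/p&gt;

```python
def count_increasing_segments(amounts, k):
    """Count contiguous subarrays of length exactly k that are strictly increasing.

    Runs in O(n): track how many adjacent pairs inside the window rise,
    instead of re-checking every window from scratch. Assumes k is a
    positive integer.
    """
    n = len(amounts)
    if k > n:
        return 0
    if k == 1:
        return n  # every single element is trivially increasing
    # rises[i] is 1 when amounts[i + 1] > amounts[i]
    rises = [1 if amounts[i + 1] > amounts[i] else 0 for i in range(n - 1)]
    # A window starting at i is strictly increasing iff its k - 1 rises are all 1.
    window = sum(rises[: k - 1])
    count = 1 if window == k - 1 else 0
    for i in range(1, n - k + 1):
        # Slide the window: drop rises[i - 1], add rises[i + k - 2]
        window += rises[i + k - 2] - rises[i - 1]
        if window == k - 1:
            count += 1
    return count
```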

&lt;h2&gt;Problem 2: Efficient Tasks&lt;/h2&gt;

&lt;p&gt;&lt;b&gt;Problem Summary:&lt;/b&gt;&lt;br&gt;
Assign modules to 3 servers under constraints, and maximize the minimum value among all assignments.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Key Idea:&lt;/b&gt;&lt;br&gt;
This is a classic “maximize the minimum” problem, typically solved using binary search combined with greedy validation or dynamic programming.
&lt;/p&gt;
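&lt;p&gt;To make the pattern concrete, here is a sketch of binary search plus greedy validation on a simplified variant: splitting an array into 3 contiguous segments to maximize the minimum segment sum. The function name and the exact constraints are illustrative, not the original problem statement.&lt;/p&gt;

```python
def max_min_segment_sum(values, parts=3):
    """Binary-search the largest T such that `values` can be split into
    `parts` contiguous segments whose sums are each at least T.

    Assumes non-negative values; an illustrative instance of the
    "maximize the minimum" pattern, not the exact interview task.
    """
    def feasible(target):
        # Greedily close a segment as soon as it reaches `target`;
        # any leftover can be merged into the final segment.
        segments, running = 0, 0
        for v in values:
            running += v
            if running >= target:
                segments += 1
                running = 0
        return segments >= parts

    lo, hi = 0, sum(values)
    while hi > lo:
        mid = (lo + hi + 1) // 2
        if feasible(mid):
            lo = mid   # mid is achievable, try higher
        else:
            hi = mid - 1
    return lo
```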

&lt;h2&gt;Problem 3: Design HashMap&lt;/h2&gt;

&lt;p&gt;&lt;b&gt;Problem Statement:&lt;/b&gt;&lt;br&gt;
Design a HashMap without using built-in hash table libraries.
Implement the following operations:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;MyHashMap() - initialize the data structure&lt;/li&gt;
  &lt;li&gt;put(key, value) - insert or update a key-value pair&lt;/li&gt;
  &lt;li&gt;get(key) - return value or -1 if not found&lt;/li&gt;
  &lt;li&gt;remove(key) - delete key if exists&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Solution Idea&lt;/h3&gt;

&lt;p&gt;
We use &lt;b&gt;chaining&lt;/b&gt; to handle collisions.  
The structure contains a fixed-size bucket array, where each bucket stores key-value pairs.
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Hash function: key % bucket_count&lt;/li&gt;
  &lt;li&gt;Collision handling: list-based chaining&lt;/li&gt;
  &lt;li&gt;Operations: linear search within each bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Python Implementation&lt;/h3&gt;

&lt;pre&gt;
class MyHashMap:

    def __init__(self):
        # A prime bucket count spreads keys more evenly across buckets
        self.bucket_count = 10007
        self.hash_map = [[] for _ in range(self.bucket_count)]

    def _hash(self, key: int) -&amp;gt; int:
        return key % self.bucket_count

    def put(self, key: int, value: int) -&amp;gt; None:
        index = self._hash(key)
        # Update in place if the key already exists in this bucket
        for i, (k, v) in enumerate(self.hash_map[index]):
            if k == key:
                self.hash_map[index][i] = (key, value)
                return
        self.hash_map[index].append((key, value))

    def get(self, key: int) -&amp;gt; int:
        index = self._hash(key)
        for k, v in self.hash_map[index]:
            if k == key:
                return v
        return -1  # sentinel for "not found", as required

    def remove(self, key: int) -&amp;gt; None:
        index = self._hash(key)
        for i, (k, v) in enumerate(self.hash_map[index]):
            if k == key:
                del self.hash_map[index][i]
                return
&lt;/pre&gt;

&lt;h3&gt;Complexity&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Average Time: O(1)&lt;/li&gt;
  &lt;li&gt;Worst Case: O(n)&lt;/li&gt;
  &lt;li&gt;Space: O(n)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Interview Tips&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Focus on explaining edge cases clearly&lt;/li&gt;
  &lt;li&gt;Communicate while coding&lt;/li&gt;
  &lt;li&gt;Expect follow-ups on rehashing and load factor&lt;/li&gt;
  &lt;li&gt;Practice LeetCode Medium problems (Array, DP, Greedy, Design)&lt;/li&gt;
&lt;/ul&gt;
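&lt;p&gt;For the rehashing and load factor follow-up, the standard answer is to track size / bucket_count and double the bucket array once it crosses a threshold. A minimal sketch (illustrative names, separate from the interview solution above):&lt;/p&gt;

```python
class ResizableHashMap:
    """Chaining hash map that doubles its bucket array when the load
    factor exceeds 0.75."""

    def __init__(self, initial_buckets=8):
        self.buckets = [[] for _ in range(initial_buckets)]
        self.size = 0

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update in place
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > 0.75 * len(self.buckets):  # load factor check
            self._rehash()

    def _rehash(self):
        # Double the bucket count and re-distribute every entry
        old = self.buckets
        self.buckets = [[] for _ in range(len(old) * 2)]
        for bucket in old:
            for key, value in bucket:
                self.buckets[self._index(key)].append((key, value))

    def get(self, key, default=-1):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default
```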

&lt;p&gt;Good luck with your Goldman Sachs interview preparation!&lt;/p&gt;



</description>
      <category>algorithms</category>
      <category>career</category>
      <category>interview</category>
      <category>python</category>
    </item>
    <item>
      <title>I built a database engine in pure C – here's what I learned</title>
      <dc:creator>kimhjo</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:58:00 +0000</pubDate>
      <link>https://forem.com/hyeonjo00/i-built-a-database-engine-in-pure-c-heres-what-i-learned-95o</link>
      <guid>https://forem.com/hyeonjo00/i-built-a-database-engine-in-pure-c-heres-what-i-learned-95o</guid>
      <description>&lt;p&gt;I recently built MiniDB Studio, a lightweight database engine in pure C (C11) &lt;br&gt;
as a learning project. Here's what I ended up building and what surprised me along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;B+ Tree indexing on id and age fields&lt;/li&gt;
&lt;li&gt;Hash indexes for fast exact lookups&lt;/li&gt;
&lt;li&gt;WAL-style crash recovery with CSV snapshot replay&lt;/li&gt;
&lt;li&gt;A native desktop UI built with raylib&lt;/li&gt;
&lt;li&gt;A lightweight query optimizer for range scans and ordered traversal&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The hardest part
&lt;/h2&gt;

&lt;p&gt;Getting WAL replay to behave correctly after a crash mid-write was the most painful part. The tricky case is when a write is partially flushed before the crash — you need to detect the incomplete entry and roll back to the last clean checkpoint without corrupting the rest of the log.&lt;/p&gt;
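&lt;p&gt;One common way to detect that incomplete entry is to frame each log record with its length and a checksum, and stop replay at the first record that fails to verify. The sketch below is an illustrative format in Python, not MiniDB's actual log layout:&lt;/p&gt;

```python
import struct
import zlib

def append_entry(log: bytearray, payload: bytes):
    """Append one WAL record: [4-byte length][4-byte crc32][payload]."""
    header = struct.pack(">II", len(payload), zlib.crc32(payload))
    log.extend(header + payload)

def replay(log: bytes):
    """Return complete, verified payloads; stop at the first torn record."""
    offset, entries = 0, []
    while len(log) >= offset + 8:
        length, crc = struct.unpack_from(">II", log, offset)
        payload = log[offset + 8 : offset + 8 + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # partially flushed or corrupted record: roll back here
        entries.append(payload)
        offset += 8 + length
    return entries
```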

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;The project is split into two layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A reusable pure C storage engine library&lt;/li&gt;
&lt;li&gt;A separate UI binary that links against it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keeping them decoupled meant I could test the engine independently without the UI getting in the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;If I built this again I'd add a proper buffer pool manager earlier. Right now reads go straight to disk more often than they should.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/hyeonjo00/c-mini-db-engine" rel="noopener noreferrer"&gt;https://github.com/hyeonjo00/c-mini-db-engine&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Whitepaper: &lt;a href="https://github.com/hyeonjo00/c-mini-db-engine/blob/main/docs/minidb-studio-algorithm-whitepaper-en.pdf" rel="noopener noreferrer"&gt;https://github.com/hyeonjo00/c-mini-db-engine/blob/main/docs/minidb-studio-algorithm-whitepaper-en.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback welcome — especially on the storage engine design.&lt;/p&gt;

</description>
      <category>c</category>
      <category>database</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Made a Free Tool That Roasts Your Website's Health in 20 Seconds</title>
      <dc:creator>Nicky Christensen</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:58:00 +0000</pubDate>
      <link>https://forem.com/nickycdk/i-made-a-free-tool-that-roasts-your-websites-health-in-20-seconds-3na8</link>
      <guid>https://forem.com/nickycdk/i-made-a-free-tool-that-roasts-your-websites-health-in-20-seconds-3na8</guid>
      <description>&lt;p&gt;I built a website health scanner that checks for broken assets, SSL issues, missing security headers, and more. Here's what it finds on most sites&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"How do I know if my site has issues right now?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I built a free scanner that answers that in 20 seconds: &lt;strong&gt;&lt;a href="https://getsitewatch.com/scan?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=scanner-personal" rel="noopener noreferrer"&gt;getsitewatch.com/scan&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Paste any URL. No signup. No email. Just a health report.&lt;/p&gt;

&lt;p&gt;I've been running it on random production sites and the findings are... interesting. Here's what keeps showing up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Broken Assets Are Everywhere
&lt;/h2&gt;

&lt;p&gt;The most common finding. Images returning 404. Scripts that got deleted during a deploy but are still referenced in the HTML. Stylesheets pointing to fonts that moved.&lt;/p&gt;

&lt;p&gt;A missing image is cosmetic. A missing JS file can kill your entire page. If your checkout depends on a script that no longer exists, the form never renders — but the server still returns 200 OK.&lt;/p&gt;

&lt;p&gt;Found this on roughly 3 out of 10 sites scanned. Most owners had no idea.&lt;/p&gt;




&lt;h2&gt;
  
  
  SSL Certificates Nobody Is Watching
&lt;/h2&gt;

&lt;p&gt;Certs about to expire. Incomplete chains where Chrome works fine but Safari throws warnings. Certs that don't match the domain after a migration.&lt;/p&gt;

&lt;p&gt;The stat that blew my mind: &lt;strong&gt;88% of companies&lt;/strong&gt; experienced an outage from an expired cert in the past two years (Keyfactor 2024). Microsoft Teams went down for 3 hours because someone forgot to renew one.&lt;/p&gt;

&lt;p&gt;The scanner flags certs expiring within 30 days and checks chain completeness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mixed Content Hiding in Plain Sight
&lt;/h2&gt;

&lt;p&gt;HTTPS page loading resources over HTTP. Browsers block it quietly, with nothing more than a console warning most visitors will never open. Images just don't render. Fonts fall back to system defaults.&lt;/p&gt;

&lt;p&gt;Super common on WordPress sites post-SSL migration. Hardcoded &lt;code&gt;http://&lt;/code&gt; URLs buried in the database. The site looks fine to the owner because their browser has assets cached. First-time visitors see something different.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security Headers? What Security Headers?
&lt;/h2&gt;

&lt;p&gt;The most universally failed check. Almost every site I scan is missing at least two critical headers.&lt;/p&gt;

&lt;p&gt;Quick reality check — run this against your own site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sI&lt;/span&gt; https://yoursite.com | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-iE&lt;/span&gt; &lt;span class="s2"&gt;"strict-transport|content-security|x-frame|x-content-type|referrer-policy"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that comes back mostly empty, you're missing basic protections against XSS, clickjacking, and MITM attacks. Takes 15 minutes to fix. Most sites never do because nobody checks.&lt;/p&gt;

&lt;p&gt;The scanner tells you exactly which headers are missing and what each one protects against.&lt;/p&gt;
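&lt;p&gt;Once you have the response headers, the same check is easy to script yourself. A minimal sketch (the header list is the common baseline, not necessarily the scanner's exact rule set):&lt;/p&gt;

```python
REQUIRED_HEADERS = {
    "strict-transport-security": "forces HTTPS (mitigates MITM downgrade)",
    "content-security-policy": "restricts script sources (mitigates XSS)",
    "x-frame-options": "blocks framing (mitigates clickjacking)",
    "x-content-type-options": "stops MIME sniffing",
    "referrer-policy": "limits referrer leakage",
}

def missing_security_headers(headers):
    """Return the baseline headers absent from a response, case-insensitively."""
    present = {name.lower() for name in headers}
    return sorted(name for name in REQUIRED_HEADERS if name not in present)
```

&lt;p&gt;Feed it the header mapping from your HTTP client of choice; anything it returns is a header worth adding.&lt;/p&gt;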




&lt;h2&gt;
  
  
  SEO Metadata That's Quietly Broken
&lt;/h2&gt;

&lt;p&gt;Missing Open Graph tags — so your links look bare and unprofessional when shared on LinkedIn, Twitter, or Slack. Malformed JSON-LD that kills your rich snippet potential. Wrong canonical URLs splitting your SEO across duplicate pages.&lt;/p&gt;

&lt;p&gt;None of this crashes your site. All of it makes your site invisible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Missing Sitemaps
&lt;/h2&gt;

&lt;p&gt;Especially common on SPA/Jamstack sites. The build pipeline handles the app, but nobody remembers the sitemap. Or it exists but returns 404 because the path changed.&lt;/p&gt;

&lt;p&gt;Free SEO you're leaving on the table.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;Most sites I've scanned look perfectly fine if you just open them in a browser. They load. They render. Nothing is obviously wrong.&lt;/p&gt;

&lt;p&gt;But the scanner finds something on almost every one. Usually it's a combination: a few missing security headers, an asset that 404s, an SSL cert that expires in 3 weeks.&lt;/p&gt;

&lt;p&gt;The thing is — none of this shows up in a standard uptime check. The server responds. Status 200. Dashboard says green. Meanwhile the site has 4 issues nobody knows about.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The scanner:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetches the page and enumerates every subresource (scripts, stylesheets, images)&lt;/li&gt;
&lt;li&gt;Validates each asset actually returns a successful response&lt;/li&gt;
&lt;li&gt;Checks SSL certificate validity, expiry, and chain completeness&lt;/li&gt;
&lt;li&gt;Inspects response headers for security policies&lt;/li&gt;
&lt;li&gt;Validates structured data, OG tags, and canonical URLs&lt;/li&gt;
&lt;li&gt;Checks for robots.txt and sitemap.xml&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Results come back as a prioritized report — critical issues first, info-level findings last. Plain-language explanations, not jargon dumps.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://getsitewatch.com/scan?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=scanner-personal" rel="noopener noreferrer"&gt;getsitewatch.com/scan&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;20 seconds. Free. No signup.&lt;/p&gt;

&lt;p&gt;Drop your health rating in the comments — curious what people find on their own sites.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The scanner is a one-time checkup. If you want continuous monitoring — so you catch these issues the moment they appear, not weeks later — that's what &lt;a href="https://getsitewatch.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=scanner-personal" rel="noopener noreferrer"&gt;Sitewatch&lt;/a&gt; itself does. Checks from multiple regions, alerts before your users notice.&lt;/p&gt;

&lt;p&gt;But the scan alone is worth doing. You might be surprised.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>website</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Why We Rebuilt Our Internal Tool from Scratch And What I Learned</title>
      <dc:creator>GoodWork Labs</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:54:26 +0000</pubDate>
      <link>https://forem.com/goodworklabs/why-we-rebuilt-our-internal-tool-from-scratch-and-what-i-learned-2ljf</link>
      <guid>https://forem.com/goodworklabs/why-we-rebuilt-our-internal-tool-from-scratch-and-what-i-learned-2ljf</guid>
      <description>&lt;p&gt;At my previous company, we spent three years trying to make Salesforce, Zapier, and a handful of SaaS tools work together as a unified CRM plus operations platform. We had 14 active integrations, two dedicated engineers on "glue work," and a Slack channel called #zapier-is-on-fire.&lt;br&gt;
Eventually, we stopped patching the gaps and built our own internal tool. That experience changed how I think about the build vs buy decision entirely.&lt;br&gt;
This isn't a "custom software is always better" argument. It's an honest breakdown of where off the shelf apps genuinely fail technically and what you're actually signing up for when you choose either path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hidden Cost of "Good Enough"&lt;/strong&gt;&lt;br&gt;
Off-the-shelf apps are often marketed on time to value. You can be up and running in a day. That's real. But what vendors don't talk about is the compounding cost of workaround code.&lt;br&gt;
Every integration point between two SaaS tools is a potential failure surface. Webhooks go missing. API rate limits get hit at the worst times. Schema changes on one platform silently break pipelines in another. According to MuleSoft's 2023 Connectivity Benchmark Report, organizations manage an average of 900+ applications, but fewer than 30% are integrated. That fragmentation has a real engineering cost; it just doesn't show up on the vendor's pricing page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where Off-the-Shelf Apps Break Down Technically&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1. The integration layer becomes your responsibility anyway&lt;/strong&gt;&lt;br&gt;
Most platforms offer APIs, but "has an API" and "integrates well" are very different things. You'll often find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent data models:&lt;/strong&gt; One tool stores customer IDs as integers, another as UUIDs, a third as compound strings like acct_US_00123. Your ETL layer has to handle all of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eventual consistency problems:&lt;/strong&gt; If you're syncing data between a CRM, a billing tool, and a support platform, you'll hit race conditions. A customer updates their email in one place — how long before all three systems agree?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook reliability:&lt;/strong&gt; Most SaaS webhooks have no guaranteed delivery. You need to build your own reconciliation jobs to catch missed events which means you're already writing custom infrastructure.&lt;/li&gt;
&lt;/ul&gt;
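&lt;p&gt;A reconciliation job in that sense can be as simple as periodically diffing the vendor's event list against the IDs your webhook handler recorded. A sketch with illustrative names and field layout:&lt;/p&gt;

```python
def find_missed_events(vendor_events, processed_ids):
    """Return vendor events our webhook handler never saw, oldest first.

    vendor_events: iterable of dicts with "id" and "created_at" keys,
    e.g. fetched from the vendor's list-events API for a recent window.
    processed_ids: event IDs the webhook handler already recorded.
    """
    processed = set(processed_ids)
    missed = [e for e in vendor_events if e["id"] not in processed]
    # Replay oldest first so downstream state converges in order
    return sorted(missed, key=lambda e: e["created_at"])
```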

&lt;p&gt;With a custom app, you own the data model from day one. There's no translation layer. A field is a field.&lt;br&gt;
&lt;strong&gt;2. Scalability is governed by the vendor's architecture, not yours&lt;/strong&gt;&lt;br&gt;
Off-the-shelf tools are built for the median use case. When your usage pattern is anything but median, you'll hit artificial ceilings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API rate limits&lt;/strong&gt; that don't scale linearly with your tier (common in tools like HubSpot, Zendesk, and Airtable)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch job limits&lt;/strong&gt; that force nightly syncs instead of real-time processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage caps&lt;/strong&gt; that turn into surprise upgrade conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Custom apps let you make deliberate scaling decisions. You choose between horizontal scaling and vertical scaling based on your actual read/write patterns. You decide when to introduce caching, CDNs, or queue-based architectures and you're not dependent on a vendor roadmap to get there.&lt;br&gt;
&lt;strong&gt;3. Security posture is largely out of your hands&lt;/strong&gt;&lt;br&gt;
Multi-tenant SaaS tools are lucrative targets precisely because a single breach can expose data from thousands of customers. As an individual customer, you have no visibility into their internal security practices beyond what's in their SOC 2 report.&lt;br&gt;
More concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can't enforce custom &lt;strong&gt;field-level encryption&lt;/strong&gt; if the vendor doesn't support it.&lt;/li&gt;
&lt;li&gt;You often can't restrict &lt;strong&gt;data residency&lt;/strong&gt; (important for GDPR, HIPAA, and other compliance frameworks) unless you're on an enterprise plan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logs&lt;/strong&gt; in many tools are shallow: they tell you that something changed, not always how or from what context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For industries like fintech, healthtech, and legal tech, these aren't nice-to-haves. They're requirements. Custom apps let you build compliance in from the start: role-based access, full audit trails, field-level encryption, and proper data residency controls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Real Decision Framework&lt;/strong&gt;&lt;br&gt;
Before choosing between custom and off-the-shelf, I'd suggest running through these questions:&lt;br&gt;
&lt;strong&gt;1. Is your workflow genuinely standard?&lt;/strong&gt;&lt;br&gt;
If you're doing straightforward sales CRM, HR onboarding, or basic project management — off-the-shelf tools are probably fine. The workflow is standard because most businesses do it the same way.&lt;br&gt;
&lt;strong&gt;2. How many integration points do you need?&lt;/strong&gt;&lt;br&gt;
Under 3–4 integrations, SaaS tools usually compose reasonably well. Beyond that, you're entering "glue code" territory. At some point, the glue is your product, and you should own it.&lt;br&gt;
&lt;strong&gt;3. What's your data sensitivity?&lt;/strong&gt;&lt;br&gt;
If you're handling PII, financial data, or health records, vendor risk assessment becomes a real engineering and legal concern. Custom apps give you direct control over where data lives and who can touch it.&lt;br&gt;
&lt;strong&gt;4. Is your use case on the vendor's roadmap?&lt;/strong&gt;&lt;br&gt;
This one bites hard. If the feature you need is "coming in Q3," you're now dependent on someone else's sprint cycle. Custom development means you ship what you need, when you need it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Custom Development Actually Looks Like&lt;/strong&gt;&lt;br&gt;
People often imagine "custom app" means a massive multi-year project. It doesn't have to be.&lt;br&gt;
A practical starting point is a &lt;strong&gt;strangler fig pattern&lt;/strong&gt;: keep the off-the-shelf tool running, but start building custom modules around the edges where it fails you. Gradually migrate. You avoid a big-bang rewrite while incrementally reclaiming control.&lt;br&gt;
A typical early investment might look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A custom API gateway that normalizes data between your existing tools&lt;/li&gt;
&lt;li&gt;A lightweight internal dashboard built on something like Next.js + Postgres that replaces one heavily-customized SaaS view&lt;/li&gt;
&lt;li&gt;A background job system (e.g., BullMQ, Temporal, or Sidekiq) that handles the reconciliation logic you'd otherwise leave to flaky webhooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this requires throwing away your existing stack on day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ROI Framing I Actually Believe&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://www.goodworklabs.com/services/mobile-apps-development/" rel="noopener noreferrer"&gt;Custom apps &lt;/a&gt;&lt;/strong&gt;cost more upfront. That's true. But the ROI conversation changes when you factor in:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Engineering hours spent on workaround code&lt;/strong&gt; (often invisible in budgets because it's just "eng time")&lt;br&gt;
&lt;strong&gt;- Vendor price increases&lt;/strong&gt; as you scale (SaaS pricing is often seat-based or usage-based, and it compounds)&lt;br&gt;
&lt;strong&gt;- Lost velocity&lt;/strong&gt; when you can't ship features because they depend on a vendor's API constraints&lt;br&gt;
The companies I've seen get the most value from custom development weren't trying to avoid SaaS tools entirely. They were strategic about where they needed control and built custom exactly there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't rewrite everything. Identify the one or two workflows where the off-the-shelf tool creates the most friction and start there.&lt;/li&gt;
&lt;li&gt;Model the full integration cost before you sign a contract. Count the engineering hours required to maintain every API connection.&lt;/li&gt;
&lt;li&gt;If compliance is in scope, involve your security and legal teams in the build-vs-buy decision early; don't let it become a retrofit.&lt;/li&gt;
&lt;li&gt;The strangler fig pattern is your friend for migrations. Incremental is almost always better than big bang.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>customapp</category>
      <category>mobile</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Email Drafter — Multi-Agent Email Writing with Google ADK and Cloud Run</title>
      <dc:creator>Zoe Lin</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:51:30 +0000</pubDate>
      <link>https://forem.com/zoe_lin_0653/email-drafter-multi-agent-email-writing-with-google-adk-and-cloud-run-4p99</link>
      <guid>https://forem.com/zoe_lin_0653/email-drafter-multi-agent-email-writing-with-google-adk-and-cloud-run-4p99</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built Email Drafter, a small web app that turns structured user input into a polished email draft using a multi-agent workflow.&lt;/p&gt;

&lt;p&gt;The app takes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recipient type&lt;/li&gt;
&lt;li&gt;purpose&lt;/li&gt;
&lt;li&gt;tone&lt;/li&gt;
&lt;li&gt;language&lt;/li&gt;
&lt;li&gt;key points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of using one giant prompt, I split the workflow into multiple specialized roles.&lt;/p&gt;

&lt;p&gt;The system works like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one agent plans the email&lt;/li&gt;
&lt;li&gt;one agent reviews the plan&lt;/li&gt;
&lt;li&gt;one agent writes the final draft&lt;/li&gt;
&lt;li&gt;one orchestrator coordinates the workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also deployed the system as separate services on Cloud Run, so the app feels much closer to a real distributed AI workflow instead of just a local prototype.&lt;/p&gt;

&lt;p&gt;I picked email drafting because it is a practical use case where planning, reviewing, and writing feel like naturally separate tasks. That made it a good fit for experimenting with multi-agent orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Run Embed
&lt;/h2&gt;


&lt;div class="ltag__cloud-run"&gt;
  &lt;iframe height="600px" src="https://email-drafter-app-ournsvibyq-uw.a.run.app/"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Your Agents
&lt;/h2&gt;

&lt;p&gt;This project uses three specialized agents plus one orchestrator.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Researcher Agent&lt;br&gt;&lt;br&gt;
Creates a structured email plan from the user input.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Judge Agent&lt;br&gt;&lt;br&gt;
Reviews the plan and checks whether it is complete enough before moving on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Content Builder Agent&lt;br&gt;&lt;br&gt;
Writes the final email draft from the approved plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Orchestrator Agent&lt;br&gt;&lt;br&gt;
Coordinates the workflow between the other agents and returns the final result to the frontend.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The overall flow is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Frontend -&amp;gt; Orchestrator -&amp;gt; Researcher -&amp;gt; Judge -&amp;gt; Content Builder -&amp;gt; Final Email Draft&lt;/code&gt;&lt;/p&gt;
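&lt;p&gt;Stubbing each agent as a plain function, that flow looks roughly like the loop below. The names and the judge's verdict format are hypothetical; the real project wires ADK agents together as separate Cloud Run services:&lt;/p&gt;

```python
def orchestrate(user_input, plan_agent, judge_agent, writer_agent, max_revisions=2):
    """Plan -> judge -> write, re-planning when the judge rejects the plan."""
    plan = plan_agent(user_input)
    for _ in range(max_revisions):
        verdict = judge_agent(plan)
        if verdict["approved"]:
            break
        # Fold the judge's feedback into the input and re-plan
        plan = plan_agent({**user_input, "feedback": verdict["feedback"]})
    return writer_agent(plan)
```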

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;p&gt;This project helped me understand why multi-agent systems can be useful even for a relatively small app.&lt;/p&gt;

&lt;p&gt;A few things stood out while building it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Splitting the workflow made debugging easier. It was much easier to tell whether the issue came from planning, review, or writing than trying to fix one large prompt.&lt;/li&gt;
&lt;li&gt;Deployment was harder than the basic prompt logic. Getting multiple services running locally, wiring them together, and then deploying them to Cloud Run took more effort than writing the first version of the agents.&lt;/li&gt;
&lt;li&gt;Prompt wording had a big impact on output quality. Small instruction changes made a noticeable difference between getting a planning-style response and getting something that looked like a real email.&lt;/li&gt;
&lt;li&gt;Multi-agent design felt more modular and production-oriented. Even for a simple email drafting tool, separating responsibilities made the system easier to reason about and improve.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://youtu.be/b5oLkYx-YJ8" rel="noopener noreferrer"&gt;Video Demo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://email-drafter-app-ournsvibyq-uw.a.run.app/" rel="noopener noreferrer"&gt;Live App&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/zoelinsg/email-drafter-multi-agent-system" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>buildmultiagents</category>
      <category>gemini</category>
      <category>adk</category>
    </item>
    <item>
      <title>Remote jobs in Rust – from a file to NATS in three steps</title>
      <dc:creator>Marco Mengelkoch</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:48:10 +0000</pubDate>
      <link>https://forem.com/marcomq/remote-jobs-in-rust-from-a-file-to-nats-in-three-steps-4316</link>
      <guid>https://forem.com/marcomq/remote-jobs-in-rust-from-a-file-to-nats-in-three-steps-4316</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every application eventually needs to offload work to another process. Parse a file, send an email, trigger a report – tasks that shouldn't block your main service and might even run on a different machine.&lt;/p&gt;

&lt;p&gt;Most solutions require you to commit to their ecosystem from day one – their queue, their worker format, their retry logic. And if you want to swap the transport later, you're rewriting business logic.&lt;/p&gt;

&lt;p&gt;I wanted something simpler: define your jobs once in plain Rust structs, start with a file during development, and switch to a real broker for production – without touching the handler code.&lt;/p&gt;

&lt;p&gt;This is how I built a remote job system in Rust using &lt;a href="https://github.com/marcomq/mq-bridge" rel="noopener noreferrer"&gt;mq-bridge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create Cargo.toml&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's run &lt;code&gt;cargo init&lt;/code&gt; and &lt;code&gt;cargo add mq-bridge serde tokio tracing tracing-subscriber&lt;/code&gt;, then make a few manual adjustments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cargo.toml&lt;/span&gt;
&lt;span class="nn"&gt;[package]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mq-bridge-jobs-example"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;edition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2021"&lt;/span&gt;

&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;mq-bridge&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.2.11"&lt;/span&gt;
&lt;span class="py"&gt;serde&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0.228"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"derive"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"rt-multi-thread"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"macros"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;tracing&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.44"&lt;/span&gt;
&lt;span class="py"&gt;tracing-subscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3.23"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"env-filter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nn"&gt;[[bin]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"worker"&lt;/span&gt;
&lt;span class="py"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src/bin/worker.rs"&lt;/span&gt;

&lt;span class="nn"&gt;[[bin]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"submit"&lt;/span&gt;
&lt;span class="py"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src/bin/submit.rs"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We want two separate binaries: &lt;code&gt;worker&lt;/code&gt;, which waits for tasks, and &lt;code&gt;submit&lt;/code&gt;, which sends a single email job for the worker to pick up.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Step 2: Define your jobs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before we touch any infrastructure, we define what our jobs look like. Just plain Rust structs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/jobs.rs&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;serde&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Deserialize&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nd"&gt;#[derive(Serialize,&lt;/span&gt; &lt;span class="nd"&gt;Deserialize)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;SendEmail&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[derive(Serialize,&lt;/span&gt; &lt;span class="nd"&gt;Deserialize)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;GenerateReport&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In addition, we define strings to identify each struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/jobs.rs&lt;/span&gt;
&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;SendEmail&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;KIND&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"send_email"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;GenerateReport&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;KIND&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"generate_report"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's also add a &lt;code&gt;lib.rs&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/lib.rs&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;mod&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we register handlers for each job type using mq-bridge &lt;code&gt;TypeHandler&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/bin/worker.rs&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;TypeHandler&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;KIND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// We are not actually sending a mail here - just print a log message&lt;/span&gt;
        &lt;span class="nn"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sending email to {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="py"&gt;.to&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_millis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Handled&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Ack&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="nf"&gt;.add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;GenerateReport&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;KIND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GenerateReport&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nn"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Generating report for user {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="py"&gt;.user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Handled&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Ack&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Step 3: Start with a file backend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No Docker. No broker. Just a file on disk for our worker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/bin/worker.rs&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nn"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;EndpointType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;FileConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"jobs.jsonl"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.with_mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;FileConsumerMode&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Consume&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="nn"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;null&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;// No output needed here&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.with_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="nf"&gt;.deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"job_worker"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together with logging and imports, the complete &lt;code&gt;worker.rs&lt;/code&gt; now looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/bin/worker.rs (complete)&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;
    &lt;span class="n"&gt;Handled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nn"&gt;models&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EndpointType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FileConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FileConsumerMode&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nn"&gt;type_handler&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TypeHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge_jobs_example&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;GenerateReport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;tracing_subscriber&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.with_env_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;tracing_subscriber&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;EnvFilter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.init&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;TypeHandler&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;KIND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nn"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sending email to {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="py"&gt;.to&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_millis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Handled&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Ack&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="nf"&gt;.add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;GenerateReport&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;KIND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GenerateReport&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nn"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Generating report for user {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="py"&gt;.user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Handled&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Ack&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;EndpointType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nn"&gt;FileConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"jobs.jsonl"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.with_mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;FileConsumerMode&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Consume&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="nn"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;null&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;// No output needed here&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.with_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="nf"&gt;.deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"job_worker"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nn"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Worker running — press Ctrl-C to exit"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;ctrl_c&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nn"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Shutting down"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To submit a job, &lt;code&gt;submit.rs&lt;/code&gt; simply appends a new line to &lt;code&gt;jobs.jsonl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/bin/submit.rs&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;
    &lt;span class="n"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nn"&gt;models&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EndpointType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FileConfig&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge_jobs_example&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;tracing_subscriber&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.with_env_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;tracing_subscriber&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;EnvFilter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.init&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;EndpointType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;FileConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"jobs.jsonl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;))))&lt;/span&gt;
    &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;publisher&lt;/span&gt;
        &lt;span class="nf"&gt;.send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;msg!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;SendEmail&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user@example.com"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Welcome!"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="nn"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;KIND&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works completely offline. Great for development and testing.&lt;/p&gt;

&lt;p&gt;Now let's test it. Open a first shell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo run &lt;span class="nt"&gt;--bin&lt;/span&gt; worker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The worker is now running and waiting for file modifications. In a second shell, submit a job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo run &lt;span class="nt"&gt;--bin&lt;/span&gt; submit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The worker will receive the task and print:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INFO worker: Sending email to user@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of using the submit binary, you could also append a line to the file directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"message_id":1,"payload":{"subject":"Welcome!","to":"user@example.com"},"metadata":{"kind":"send_mail"}}'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; jobs.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterwards &lt;code&gt;jobs.jsonl&lt;/code&gt; is empty — because &lt;code&gt;FileConsumerMode::Consume { delete: true }&lt;/code&gt; removes consumed lines. With &lt;code&gt;delete: false&lt;/code&gt;, lines would be kept and replayed on the next worker start.&lt;/p&gt;

&lt;p&gt;Alternatively, there is a &lt;code&gt;GroupSubscribe&lt;/code&gt; mode that prevents re-delivery by tracking the current offset in a separate &lt;code&gt;.offset&lt;/code&gt; file, without deleting lines.&lt;/p&gt;
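&lt;p&gt;The offset mechanism itself is simple and can be sketched in plain Rust. This only illustrates the idea; it is not mq-bridge's implementation:&lt;/p&gt;

```rust
// Sketch of offset-based consumption: instead of deleting processed lines,
// remember how many have already been handled (the value kept in the
// .offset file) and skip them on the next worker start.
fn consume_from_offset(log: &str, offset: usize) -> (Vec<&str>, usize) {
    let fresh: Vec<&str> = log.lines().skip(offset).collect();
    let next_offset = offset + fresh.len();
    (fresh, next_offset)
}

fn main() {
    // Two jobs were consumed on a previous run; only job3 is delivered now.
    let (fresh, next) = consume_from_offset("job1\njob2\njob3", 2);
    println!("{:?} next_offset={}", fresh, next); // ["job3"] next_offset=3
}
```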




&lt;p&gt;&lt;strong&gt;Step 4: Switch to JSON config&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The business logic stays in Rust. The infrastructure moves to config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo add serde_json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;src/bin/config.json&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jobs.jsonl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"delete"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"consume"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/bin/worker.rs - load route from config&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;include_str!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"config.json"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="nf"&gt;.with_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="nf"&gt;.deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"job_worker"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/bin/submit.rs - create a publisher from the same config&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;include_str!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"config.json"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="py"&gt;.input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now load the configuration from a file or database. The code is smaller, and you can change the backend without touching your handler code.&lt;/p&gt;

&lt;p&gt;In a later production scenario, you might want a separate publisher configuration: some properties are only available for consumers or publishers, and there is a warning when invalid settings are used. You might also want to configure a specific Kafka group_id or use separate topics for fan-out.&lt;br&gt;
But for this example, using a common NATS configuration works fine.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Step 5: Switch to NATS for production&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To run the worker on a separate machine, you'll want a broker or database. NATS is a great fit: it's lightweight, ships as a single binary with no dependencies, and persists messages via JetStream.&lt;/p&gt;

&lt;p&gt;First, enable the &lt;code&gt;nats&lt;/code&gt; feature in &lt;code&gt;Cargo.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;mq-bridge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.2.11"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"nats"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This only enables the &lt;code&gt;nats&lt;/code&gt; feature. We can simply re-run the previous&lt;br&gt;
example: nothing changes yet, it still uses the file backend; the build just takes longer to compile.&lt;/p&gt;

&lt;p&gt;Start NATS with JetStream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;nats-server &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; nats-server &lt;span class="nt"&gt;-js&lt;/span&gt;

&lt;span class="c"&gt;# or Ubuntu/Debian&lt;/span&gt;
wget https://github.com/nats-io/nats-server/releases/latest/download/nats-server-linux-amd64.deb
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; ./nats-server-linux-amd64.deb &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; nats-server &lt;span class="nt"&gt;-js&lt;/span&gt;

&lt;span class="c"&gt;# or Docker&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 4222:4222 nats:2.12.2 &lt;span class="nt"&gt;-js&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One change to &lt;code&gt;config.json&lt;/code&gt;, no code changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nats"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nats://localhost:4222"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-stream.pipeline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"stream"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-stream"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart worker and submit — both now talk to NATS. The handler code is untouched.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What you get for free&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Switching to NATS unlocks everything mq-bridge builds on top. You can add middlewares in the config, for example retries and a dead-letter queue (DLQ) for failed messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nats"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nats://localhost:4222"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-stream.pipeline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stream"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-stream"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"middlewares"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"retry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"max_attempts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"max_interval_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"initial_interval_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"multiplier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"dlq"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"error.log"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;retry&lt;/code&gt; middleware will retry failed deliveries with exponential backoff. If all attempts are exhausted, the &lt;code&gt;dlq&lt;/code&gt; middleware writes the message to &lt;code&gt;error.log&lt;/code&gt; instead of dropping it silently.&lt;/p&gt;
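&lt;p&gt;With the settings above (initial interval 100 ms, multiplier 2, cap 5000 ms), the wait between attempts grows geometrically until it hits the cap. A minimal sketch of that schedule — the computation only, not mq-bridge's code:&lt;/p&gt;

```rust
// Capped exponential backoff as configured above:
// delay_i = min(initial_interval_ms * multiplier^i, max_interval_ms).
fn backoff_delays(attempts: u32, initial_ms: u64, multiplier: u64, max_ms: u64) -> Vec<u64> {
    (0..attempts)
        .map(|i| (initial_ms * multiplier.pow(i)).min(max_ms))
        .collect()
}

fn main() {
    // With max_attempts = 3 only the first delays are ever used; the longer
    // curve shows where the 5000 ms cap takes effect.
    println!("{:?}", backoff_delays(7, 100, 2, 5000)); // [100, 200, 400, 800, 1600, 3200, 5000]
}
```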

&lt;p&gt;If you are already using MongoDB, MySQL, MariaDB, or PostgreSQL, you can use them as your queue backend as well — just a config change.&lt;/p&gt;

&lt;p&gt;If you just want to forward messages from one endpoint to another, or a UI to create different JSON configs, you can also use mq-bridge-app (&lt;code&gt;cargo install mq-bridge-app&lt;/code&gt;). Its code also shows how to use mq-bridge as a web server.&lt;/p&gt;

&lt;p&gt;If you just need simple send and receive, that is also available. You can skip the whole event-handler and route concept and use the same API calls for HTTP, gRPC, MongoDB, Kafka, RabbitMQ, and NATS. They all share the same &lt;code&gt;receive&lt;/code&gt; and &lt;code&gt;publish&lt;/code&gt; methods and use the same message struct &lt;code&gt;CanonicalMessage&lt;/code&gt; for transport.&lt;/p&gt;
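&lt;p&gt;The shape of that API can be illustrated with a toy in-memory backend. The trait, field, and signatures below are illustrative only, not mq-bridge's actual definitions:&lt;/p&gt;

```rust
use std::collections::VecDeque;

// Toy model of a uniform transport API: every backend exposes the same
// publish/receive pair over one canonical message type. Names and fields
// here are hypothetical sketches, not mq-bridge's real API.
struct CanonicalMessage {
    payload: Vec<u8>,
}

trait Transport {
    fn publish(&mut self, msg: CanonicalMessage);
    fn receive(&mut self) -> Option<CanonicalMessage>;
}

// The simplest possible backend: an in-memory queue.
struct MemoryTransport {
    queue: VecDeque<CanonicalMessage>,
}

impl Transport for MemoryTransport {
    fn publish(&mut self, msg: CanonicalMessage) {
        self.queue.push_back(msg);
    }
    fn receive(&mut self) -> Option<CanonicalMessage> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut t = MemoryTransport { queue: VecDeque::new() };
    t.publish(CanonicalMessage { payload: b"hello".to_vec() });
    let msg = t.receive().expect("one message queued");
    println!("{}", String::from_utf8_lossy(&msg.payload)); // hello
}
```

Swapping the backend then means swapping the &lt;code&gt;Transport&lt;/code&gt; implementation while the calling code stays unchanged — the same idea mq-bridge applies via its config-driven endpoints.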



&lt;p&gt;&lt;strong&gt;Step 6: Testing with the memory endpoint&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because mq-bridge uses the same trait for all backends, you can test your handlers without any broker or file system — just an in-memory channel.&lt;/p&gt;

&lt;p&gt;Let's add a test for submit.rs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;
&lt;span class="c1"&gt;// src/bin/submit.rs&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge_jobs_example&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;send_mail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;publisher&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;publisher&lt;/span&gt;
        &lt;span class="nf"&gt;.send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;msg!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;SendEmail&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user@example.com"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Welcome!"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="nn"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;KIND&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;tracing_subscriber&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.with_env_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;tracing_subscriber&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;EnvFilter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.init&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;include_str!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"config.json"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="py"&gt;.input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;send_mail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;publisher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nd"&gt;#[cfg(test)]&lt;/span&gt;
&lt;span class="k"&gt;mod&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;endpoints&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MemoryConsumer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;traits&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MessageConsumer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;models&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;mq_bridge_jobs_example&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="k"&gt;crate&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;send_mail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;#[tokio::test]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;test_submit_sends_email_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"test-submit"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;MemoryConsumer&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nf"&gt;send_mail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;publisher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="nf"&gt;.receive&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="py"&gt;.message.payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nd"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"user@example.com"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nd"&gt;assert_eq!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="py"&gt;.message.metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nn"&gt;SendEmail&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;KIND&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No broker, no file, no test containers. The same &lt;code&gt;TypeHandler&lt;/code&gt; that runs in production is tested here — only the transport is swapped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What not to expect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not every feature of the supported brokers and databases is available. Some features are emulated; others may not be implemented yet. Don't expect a full-grown framework that guides you through every step; and because configs are read at runtime, misconfiguration cannot be prevented at compile time.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;mq-bridge covers more than just remote jobs. You can use it for events, or to send and receive messages from existing brokers. And you can scale up by adding Kafka as a buffer or fan-out layer — again, just config.&lt;/p&gt;

&lt;p&gt;mq-bridge is still a young library. Don't expect it to be as complete as Watermill (Go) or Spring (Java). It borrows some of their concepts, but it doesn't try to be the same: event sourcing and aggregate management are out of scope for now, as the focus is on transport. Documentation is still growing, and this tutorial is a first step toward that.&lt;/p&gt;

&lt;p&gt;This tutorial is available here:&lt;br&gt;
&lt;a href="https://github.com/marcomq/mq-bridge-jobs-example" rel="noopener noreferrer"&gt;https://github.com/marcomq/mq-bridge-jobs-example&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mq-bridge library is available here:&lt;br&gt;
&lt;a href="https://github.com/marcomq/mq-bridge" rel="noopener noreferrer"&gt;https://github.com/marcomq/mq-bridge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions welcome.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>tutorial</category>
      <category>showdev</category>
      <category>backend</category>
    </item>
    <item>
      <title>io_uring Adventures: Rust Servers That Love Syscalls</title>
      <dc:creator>speed engineer</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:47:41 +0000</pubDate>
      <link>https://forem.com/speed_engineer/iouring-adventures-rust-servers-that-love-syscalls-47nm</link>
      <guid>https://forem.com/speed_engineer/iouring-adventures-rust-servers-that-love-syscalls-47nm</guid>
      <description>&lt;p&gt;Our Rust file server hit a ceiling at 45K requests/sec. Switching to io_uring multiplied throughput 3.4x and cut latency 68% — but the… &lt;/p&gt;




&lt;h3&gt;
  
  
  io_uring Adventures: Rust Servers That Love Syscalls
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Our Rust file server hit a ceiling at 45K requests/sec. Switching to io_uring multiplied throughput 3.4x and cut latency 68% — but the journey taught us syscalls aren’t the enemy, context switches are.
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhjmwhwy2n524hcxmqrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhjmwhwy2n524hcxmqrf.png" width="800" height="737"&gt;&lt;/a&gt;&lt;em&gt;io_uring revolutionizes I/O by batching system calls like a modern mail sorting system — multiple requests travel together through the kernel in one trip, eliminating the costly back-and-forth that traditional syscall interfaces require for each operation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We thought our Rust file server was fast. Written with Tokio, leveraging async/await, serving static assets at 45,000 requests per second on modest hardware. The code was clean, the architecture was sound, and the CPU usage sat at a reasonable 60%. We’d reached what felt like the natural limit of network I/O performance.&lt;/p&gt;

&lt;p&gt;Then we profiled with perf and discovered something startling: 42% of our CPU time was spent in the kernel, not in our application. System calls for reading files, accepting connections, and sending responses dominated the flame graph. We were context switching between user space and kernel space 180,000 times per second.&lt;/p&gt;

&lt;p&gt;The revelation: &lt;strong&gt;we weren’t CPU-bound or I/O-bound — we were syscall-bound.&lt;/strong&gt;&lt;/p&gt;


&lt;p&gt;Enter io_uring, Linux’s newest I/O interface. The promise was audacious: submit batches of I/O operations without syscalls, get completions without interrupts, and let the kernel process everything asynchronously. It sounded like magic. Three weeks of rewriting later, our throughput hit 152,000 requests per second on the same hardware, and our kernel time dropped to 14% of total CPU usage.&lt;/p&gt;

&lt;p&gt;But the real story isn’t the performance win — it’s learning why traditional async I/O fails at scale, and how io_uring fundamentally changes the conversation between application and kernel.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Syscall Tax Nobody Talks About
&lt;/h3&gt;

&lt;p&gt;System calls look free in casual code. Call &lt;code&gt;read()&lt;/code&gt;, get your data, move on. The cost seems negligible for individual operations. But each syscall carries hidden overhead that compounds under load.&lt;/p&gt;

&lt;p&gt;Here’s what happens during a traditional file read:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Traditional async file read in Tokio  
let mut file = File::open("data.txt").await?;  
let mut buffer = vec![0; 4096];  
file.read(&amp;amp;mut buffer).await?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under the hood, this triggers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User → Kernel transition&lt;/strong&gt; : Save registers, switch stacks, change privilege level (~150 CPU cycles)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel work&lt;/strong&gt; : Page table lookup, file system logic, security checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel → User transition&lt;/strong&gt; : Restore registers, switch back (~150 cycles)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s 300+ cycles of pure overhead before any actual I/O happens. At 45,000 requests/second with an average of 4 syscalls per request (accept, read, write, close), we were burning 54 million CPU cycles per second just on user/kernel transitions.&lt;/p&gt;
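&lt;p&gt;The arithmetic is easy to sanity-check with a few lines (the ~300-cycle round trip is our profiling estimate, not a universal constant):&lt;/p&gt;

```rust
// Back-of-the-envelope check on the syscall tax from the numbers above.
fn syscall_overhead_cycles(reqs_per_sec: u64, syscalls_per_req: u64, cycles_per_syscall: u64) -> u64 {
    reqs_per_sec * syscalls_per_req * cycles_per_syscall
}

fn main() {
    // 45,000 req/s x 4 syscalls/req x ~300 cycles of entry/exit overhead
    let wasted = syscall_overhead_cycles(45_000, 4, 300);
    println!("{} cycles/sec burned on transitions", wasted); // 54000000
}
```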

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8upcu2x77lq9dz6g00v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8upcu2x77lq9dz6g00v.png" width="800" height="737"&gt;&lt;/a&gt;&lt;em&gt;The syscall bottleneck visualized. Traditional I/O requires crossing the user-kernel boundary for every operation, paying the context switch tax repeatedly. io_uring creates an express lane where operations batch together, crossing the boundary once to submit dozens of requests.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The problem intensifies with concurrent operations. If you’re serving 1,000 concurrent connections, and each one needs to read a file, that’s 1,000 separate syscall sequences. The kernel spends more time managing transitions than doing actual work.&lt;/p&gt;

&lt;p&gt;Our profiling revealed the breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;28% of CPU time in syscall entry/exit paths&lt;/li&gt;
&lt;li&gt;14% in context switch overhead&lt;/li&gt;
&lt;li&gt;18% in actual kernel I/O logic&lt;/li&gt;
&lt;li&gt;40% in our application code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;We were spending more time entering and exiting the kernel than actually performing I/O operations.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The io_uring Mental Model Shift
&lt;/h3&gt;

&lt;p&gt;Traditional async I/O (epoll, select, kqueue) treats the kernel as a service you call for each operation. io_uring inverts this: the kernel and your application share two ring buffers and work collaboratively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Submission Queue (SQ)&lt;/strong&gt; : Your application prepares I/O operations as entries in this ring buffer. Each entry describes what you want: read this file, write that socket, accept new connections. You queue multiple operations, then notify the kernel once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Completion Queue (CQ)&lt;/strong&gt; : The kernel writes results here. When operations complete, entries appear in this ring. Your application polls for completions in batches.&lt;/p&gt;

&lt;p&gt;The magic: &lt;strong&gt;zero-copy, lockless communication between user space and kernel space.&lt;/strong&gt; A single &lt;code&gt;io_uring_enter&lt;/code&gt; syscall can submit an entire batch of work (with &lt;code&gt;SQPOLL&lt;/code&gt; mode, even that disappears), and completions are read straight from shared memory instead of arriving one interrupt at a time. Just ring buffers and memory barriers.&lt;/p&gt;
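&lt;p&gt;Conceptually, each ring is just a shared-memory queue with a producer index and a consumer index. A toy single-threaded model of the submission side (the real rings use atomics and memory barriers over memory mapped into both address spaces, and real SQEs carry full operation descriptors):&lt;/p&gt;

```rust
// Toy model of the submission ring: the app bumps tail to publish entries,
// the kernel bumps head to consume them. Single-threaded for clarity.
#[derive(Clone, Copy)]
struct Ring {
    entries: [u64; 8], // operation descriptors (real SQEs are richer)
    head: usize,       // consumer (kernel) position
    tail: usize,       // producer (application) position
}

fn push(mut r: Ring, op: u64) -> Ring {
    r.entries[r.tail % 8] = op;
    r.tail += 1;
    r
}

fn pop(mut r: Ring) -> (Ring, u64) {
    let op = r.entries[r.head % 8];
    r.head += 1;
    (r, op)
}

fn main() {
    let mut sq = Ring { entries: [0; 8], head: 0, tail: 0 };
    // The application queues three operations, then notifies the kernel once.
    for op in [1, 2, 3] {
        sq = push(sq, op);
    }
    // The kernel drains the whole batch: one boundary crossing, three ops.
    while sq.head != sq.tail {
        let (rest, op) = pop(sq);
        sq = rest;
        println!("kernel handles op {}", op);
    }
}
```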

&lt;p&gt;Here’s how it looks in Rust using the tokio-uring crate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use tokio_uring::fs::File;  

// io_uring-based file read  
let file = File::open("data.txt").await?;  
let buf = vec![0u8; 4096];  
let (res, buf) = file.read_at(buf, 0).await;  
let bytes_read = res?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On the surface, it looks similar. The difference is invisible but profound. That &lt;code&gt;read_at&lt;/code&gt; operation queues an entry to the submission queue. The kernel picks it up, performs the read, and places the result in the completion queue. Your application continues working until it explicitly checks for completions.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Rewrite: From Tokio to tokio-uring
&lt;/h3&gt;

&lt;p&gt;Our existing server was built on Tokio’s standard runtime. It used async/await syntax but relied on epoll underneath — meaning every I/O operation hit the kernel individually. Converting to io_uring required rethinking our architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The old request handler:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; async fn handle_request(stream: TcpStream) -&amp;gt; Result&amp;lt;()&amp;gt; {  
    let mut file = File::open(&amp;amp;request.path).await?;  
    let mut buffer = Vec::with_capacity(8192);  
    file.read_to_end(&amp;amp;mut buffer).await?;  

    stream.write_all(&amp;amp;response_headers).await?;  
    stream.write_all(&amp;amp;buffer).await?;  
    Ok(())  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This generates five distinct syscalls: open, read, write (headers), write (body), close. Each one crosses the user-kernel boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The io_uring version:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; async fn handle_request_uring(stream: TcpStream) -&amp;gt; Result&amp;lt;()&amp;gt; {  
    let file = File::open(&amp;amp;request.path).await?;  
    let buf = vec![0u8; 8192];  

    // Queue the read operation  
    let (res, buf) = file.read_at(buf, 0).await;  
    let bytes_read = res?;  

    // Queue the write operations  
    let (res1, _) = stream.write(response_headers).await;  
    let (res2, _) = stream.write(&amp;amp;buf[..bytes_read]).await;  

    res1?;  
    res2?;  
    Ok(())  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The code looks nearly identical, but io_uring batches operations internally. When we await, tokio-uring checks if multiple operations can be submitted together. In practice, we were submitting 8–12 operations per actual syscall.&lt;/p&gt;

&lt;p&gt;The conversion took three weeks because tokio-uring has different semantics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ownership: io_uring operations take ownership of buffers and return them on completion&lt;/li&gt;
&lt;li&gt;Fallback: Not all operations support io_uring yet, requiring hybrid approaches&lt;/li&gt;
&lt;li&gt;Tuning: Ring buffer sizes and polling strategies needed optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Numbers That Justified Everything
&lt;/h3&gt;

&lt;p&gt;We ran comprehensive benchmarks comparing three implementations: standard Tokio, a custom epoll implementation, and our new io_uring server. All tests used the same hardware (32-core AMD EPYC, 128GB RAM, NVMe storage) and workload (mixed file sizes from 4KB to 1MB).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Baseline (Standard Tokio):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Throughput: 45,200 requests/second&lt;/li&gt;
&lt;li&gt;Latency P50: 1.8ms&lt;/li&gt;
&lt;li&gt;Latency P99: 12.4ms&lt;/li&gt;
&lt;li&gt;CPU usage: 61% (42% kernel, 19% user)&lt;/li&gt;
&lt;li&gt;Context switches: 181,000/sec&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Custom epoll (Our optimization attempt):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Throughput: 52,100 requests/second&lt;/li&gt;
&lt;li&gt;Latency P50: 1.6ms&lt;/li&gt;
&lt;li&gt;Latency P99: 11.1ms&lt;/li&gt;
&lt;li&gt;CPU usage: 58% (39% kernel, 19% user)&lt;/li&gt;
&lt;li&gt;Context switches: 162,000/sec&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;io_uring (Final implementation):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Throughput: 152,800 requests/second&lt;/li&gt;
&lt;li&gt;Latency P50: 0.58ms (68% reduction)&lt;/li&gt;
&lt;li&gt;Latency P99: 3.2ms (74% reduction)&lt;/li&gt;
&lt;li&gt;CPU usage: 66% (14% kernel, 52% user)&lt;/li&gt;
&lt;li&gt;Context switches: 31,000/sec (83% reduction)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The CPU usage paradox confused us initially. We were handling 3.4x more traffic but using only slightly more total CPU. The answer: &lt;strong&gt;we shifted CPU usage from the kernel to our application.&lt;/strong&gt; With fewer context switches, the CPU spent more time executing our code and less time managing transitions.&lt;/p&gt;

&lt;p&gt;The latency improvements were equally dramatic. Our P99 latency dropped from 12.4ms to 3.2ms. Tail latency matters because it represents your worst user experiences. io_uring’s batching smoothed out the latency distribution by eliminating periodic syscall storms.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Gotchas That Bit Us Hard
&lt;/h3&gt;

&lt;p&gt;io_uring’s performance comes with sharp edges. Here are the painful lessons we learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha 1: Ring buffer sizing is critical&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We started with the default 128-entry rings. Under load, the submission queue would fill, forcing synchronous syscalls. We monitored queue depths and discovered we needed 2048-entry rings to handle burst traffic without falling back to syscalls.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let ring = IoUring::builder()  
    .setup_sqe_count(2048)  // Submission queue entries  
    .setup_cqe_count(4096)  // Completion queue entries  
    .build()?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Too small: You fall back to syscalls under load, losing performance.&lt;br&gt;
Too large: You waste memory and cache space.&lt;/p&gt;

&lt;p&gt;Our rule: Size rings to handle 2x your peak concurrent operations.&lt;/p&gt;
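&lt;p&gt;That rule is mechanical enough to write down. A sketch of the sizing helper (the 900-ops peak below is illustrative):&lt;/p&gt;

```rust
// Sizing rule as code: 2x peak concurrent operations, rounded up to the
// next power of two, since io_uring ring sizes must be powers of two.
fn ring_entries_for(peak_concurrent_ops: u32) -> u32 {
    (peak_concurrent_ops * 2).next_power_of_two()
}

fn main() {
    // e.g. ~900 concurrent ops at peak rounds up to a 2048-entry ring
    println!("{}", ring_entries_for(900)); // 2048
}
```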

&lt;p&gt;&lt;strong&gt;Gotcha 2: Buffer ownership is non-negotiable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional async Rust lets you reference borrowed data. io_uring requires owned buffers because operations complete asynchronously:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// This won't compile with io_uring  
let buf = [0u8; 4096];  
file.read(&amp;amp;buf).await?; // ERROR: buf might outlive operation  

// io_uring requires ownership  
let buf = vec![0u8; 4096];  
let (result, buf) = file.read(buf).await; // buf moved and returned
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This forced us to rethink our buffer management. We ended up implementing buffer pools (shoutout to our previous sync.Pool work) to avoid allocating on every operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha 3: Kernel version matters — a lot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;io_uring landed in Linux 5.1 but matured significantly through 5.10+. We discovered hard-to-debug issues running on 5.4 kernels. Features like buffer registration and advanced operation chaining didn’t work reliably until 5.10.&lt;/p&gt;

&lt;p&gt;Production lesson: &lt;strong&gt;Require Linux 5.10+ for io_uring deployments.&lt;/strong&gt; The performance difference between kernel versions can be 40% or more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha 4: Error handling becomes distributed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With traditional I/O, errors happen at the call site. With io_uring, errors appear in the completion queue:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Error might not surface until completion  
let (result, buf) = file.read_at(buf, offset).await;  
match result {  
    Ok(n) =&amp;gt; { /* success */ },  
    Err(e) if e.kind() == ErrorKind::NotFound =&amp;gt; { /* handle */ },  
    Err(e) =&amp;gt; { /* other error */ }  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This temporal disconnect between submission and error made debugging more complex. We added detailed tracing to correlate submissions with completions.&lt;/p&gt;
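&lt;p&gt;The correlation itself leans on io_uring's design: every submission carries a u64 &lt;code&gt;user_data&lt;/code&gt; value that the kernel echoes back on the matching completion. A minimal sketch of the idea (toy in-memory version; our real tracer writes to a structured logger):&lt;/p&gt;

```rust
// Correlate submissions with completions by operation id, the way
// io_uring's user_data field lets you tag each SQE and read it off the CQE.
struct Tracer {
    submitted_at_us: [u64; 64], // submission timestamps, indexed by op id
}

fn on_submit(mut t: Tracer, id: usize, now_us: u64) -> Tracer {
    t.submitted_at_us[id] = now_us;
    t
}

fn on_complete(t: Tracer, id: usize, now_us: u64) -> u64 {
    now_us - t.submitted_at_us[id] // latency between submit and complete
}

fn main() {
    let t = Tracer { submitted_at_us: [0; 64] };
    let t = on_submit(t, 7, 1_000);
    println!("op 7 took {} us", on_complete(t, 7, 1_580)); // 580 us
}
```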

&lt;h3&gt;
  
  
  The Hybrid Strategy That Actually Works
&lt;/h3&gt;

&lt;p&gt;Pure io_uring isn’t always practical. Some operations aren’t supported, some libraries don’t integrate, and some platforms don’t have modern kernels. We developed a hybrid approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use io_uring for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File I/O operations (read, write, stat)&lt;/li&gt;
&lt;li&gt;Network socket operations (accept, send, receive)&lt;/li&gt;
&lt;li&gt;High-throughput, latency-sensitive paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fall back to standard async for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DNS resolution (not io_uring-friendly)&lt;/li&gt;
&lt;li&gt;Cryptographic operations (CPU-bound anyway)&lt;/li&gt;
&lt;li&gt;Third-party library integration&lt;/li&gt;
&lt;li&gt;Development/testing environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our production architecture uses feature detection:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn create_runtime() -&amp;gt; Runtime {  
    if io_uring_supported() &amp;amp;&amp;amp; kernel_version() &amp;gt;= (5, 10) {  
        tokio_uring::Runtime::new()  
    } else {  
        tokio::runtime::Runtime::new()  
    }  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This gives us bleeding-edge performance on modern infrastructure while maintaining compatibility with older deployments. In production, 93% of our servers run io_uring, with the remainder on standard async.&lt;/p&gt;
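&lt;p&gt;The kernel-version half of that check is pleasantly boring in Rust, because tuples compare lexicographically, which is exactly what (major, minor) pairs need:&lt;/p&gt;

```rust
// (major, minor) comparison for the feature-detection gate above.
fn kernel_at_least(actual: (u32, u32), required: (u32, u32)) -> bool {
    actual >= required // tuples compare element by element, left to right
}

fn main() {
    println!("{}", kernel_at_least((5, 10), (5, 10))); // true
    println!("{}", kernel_at_least((5, 4), (5, 10)));  // false
    println!("{}", kernel_at_least((6, 1), (5, 10)));  // true
}
```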

&lt;h3&gt;
  
  
  Advanced Patterns: Buffer Registration and Linked Operations
&lt;/h3&gt;

&lt;p&gt;After mastering basics, we explored advanced io_uring features that multiplied our gains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Buffer Registration&lt;/strong&gt; : Pre-register buffers with the kernel to eliminate validation overhead:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let buffers: Vec&amp;lt;Vec&amp;lt;u8&amp;gt;&amp;gt; = (0..1024)  
    .map(|_| vec![0u8; 4096])  
    .collect();  

ring.register_buffers(&amp;amp;buffers)?;  

// Now use the fixed variants (READ_FIXED/WRITE_FIXED) with a buffer index  
// Kernel skips per-operation pinning and validation since buffers are pre-mapped
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This shaved another 8% off our latency by eliminating per-operation buffer validation. The kernel knows these buffers are safe because we registered them upfront.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linked Operations&lt;/strong&gt; : Chain operations so they execute in order, with any failure cancelling the rest of the chain:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Open file, read contents, close file—all as one atomic chain  
let open_op = OpCode::OpenAt { /* ... */ };  
let read_op = OpCode::Read { /* ... */ }.flags(IOSQE_IO_LINK);  
let close_op = OpCode::Close { /* ... */ };  

// If any operation fails, subsequent ones in the chain don't execute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This prevented resource leaks in error paths. If opening a file fails, the read and close operations automatically cancel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbp41uwzn9zhiyoc8aoxf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbp41uwzn9zhiyoc8aoxf.png" width="800" height="737"&gt;&lt;/a&gt;&lt;em&gt;Linked operations in io_uring transform error-prone sequential I/O into atomic chains. If any operation fails, the entire chain cancels gracefully. This eliminates complex error recovery logic and prevents resource leaks that plague traditional async I/O code.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Decision Framework: When io_uring Makes Sense
&lt;/h3&gt;

&lt;p&gt;io_uring isn’t a universal solution. Here’s when it pays off:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use io_uring when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re doing high-volume I/O operations (&amp;gt;10K ops/second)&lt;/li&gt;
&lt;li&gt;Latency matters (P99 under 5ms goals)&lt;/li&gt;
&lt;li&gt;You control the deployment environment (Linux 5.10+)&lt;/li&gt;
&lt;li&gt;Your workload is I/O-bound, not CPU-bound&lt;/li&gt;
&lt;li&gt;You need predictable tail latency under load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip io_uring when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your I/O volume is modest (&amp;lt;1K ops/second)&lt;/li&gt;
&lt;li&gt;You need wide platform compatibility (macOS, Windows, old Linux)&lt;/li&gt;
&lt;li&gt;Your application is CPU-bound (crypto, compression, encoding)&lt;/li&gt;
&lt;li&gt;Development velocity matters more than raw performance&lt;/li&gt;
&lt;li&gt;You can’t guarantee modern kernel versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our rule: Profile first. If syscall overhead shows up in your top 5 bottlenecks, io_uring is worth exploring. If not, you’re optimizing the wrong thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Rust Ecosystem: Maturity and Gaps
&lt;/h3&gt;

&lt;p&gt;The Rust io_uring ecosystem is maturing rapidly but has gaps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;tokio-uring&lt;/strong&gt; : The most mature option, but diverges from standard Tokio APIs. Migration requires careful refactoring. Great for green-field projects, painful for existing codebases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;io-uring crate&lt;/strong&gt; : Lower-level bindings, maximum flexibility. We used this for our custom file server but found it too low-level for typical application development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;glommio&lt;/strong&gt; : A complete async runtime built on io_uring from the ground up. Beautiful design but incompatible with the Tokio ecosystem, forcing an all-or-nothing migration.&lt;/p&gt;

&lt;p&gt;We chose tokio-uring for its balance of performance and compatibility. The API differences from Tokio were manageable, and we could migrate incrementally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Insights: Where the Gains Actually Come From
&lt;/h3&gt;

&lt;p&gt;Breaking down our 3.4x throughput improvement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;40% from reduced context switches&lt;/strong&gt; : Fewer kernel transitions freed CPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;25% from batched operations&lt;/strong&gt; : Multiple I/O ops per syscall&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20% from improved cache behavior&lt;/strong&gt; : Sequential operations in rings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15% from eliminated buffer copying&lt;/strong&gt; : Shared memory removes copies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The context switch reduction was the dominant factor. Going from 181,000 to 31,000 context switches per second freed enormous CPU resources. Each context switch costs roughly 1–2 microseconds when you include cache pollution — we saved 150 milliseconds of CPU time per second.&lt;/p&gt;
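&lt;p&gt;Taking the low end of that 1–2 microsecond estimate, the reclaimed CPU time works out like this:&lt;/p&gt;

```rust
// CPU time reclaimed per second: saved context switches x per-switch cost.
fn cpu_ms_saved_per_sec(switches_saved: u64, cost_us_each: u64) -> u64 {
    switches_saved * cost_us_each / 1_000 // microseconds to milliseconds
}

fn main() {
    let saved = 181_000 - 31_000; // context switches eliminated per second
    println!("{} ms of CPU reclaimed per second", cpu_ms_saved_per_sec(saved, 1)); // 150
}
```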

&lt;p&gt;The batching effect amplified this. Instead of 4 syscalls per request (accept, read, write, close), we averaged 0.5 syscalls per request. Operations queued in the submission ring and were submitted in batches of 8–12.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Monitoring Story: Observability Matters More
&lt;/h3&gt;

&lt;p&gt;With io_uring, traditional metrics become less useful. Response time is easy to measure, but understanding why it changed requires new telemetry:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;struct IoUringMetrics {  
    submissions_per_syscall: Histogram,  
    submission_queue_depth: Gauge,  
    completion_queue_depth: Gauge,  
    buffer_pool_hits: Counter,  
    fallback_operations: Counter,  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We discovered our completion queue depth periodically spiked to 80% capacity, indicating we weren’t reaping completions fast enough. Tuning our event loop polling frequency resolved this.&lt;/p&gt;

&lt;p&gt;The buffer pool metrics revealed surprising patterns. Files under 16KB got cached in our pool, but larger files bypassed it. This drove our decision to implement multi-tier pooling (small/medium/large buffers).&lt;/p&gt;
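&lt;p&gt;The routing behind that multi-tier decision is a one-liner per tier. A sketch (the cutoffs are illustrative; tune them to your own file-size distribution):&lt;/p&gt;

```rust
// Route a requested buffer size to a pool tier.
fn pool_tier(len: usize) -> usize {
    match len {
        0..=16_384 => 0,       // small: pooled, the hottest path
        16_385..=262_144 => 1, // medium: pooled separately
        _ => 2,                // large: allocated per request
    }
}

fn main() {
    println!("{}", pool_tier(4_096));     // 0
    println!("{}", pool_tier(100_000));   // 1
    println!("{}", pool_tier(1_048_576)); // 2
}
```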

&lt;h3&gt;
  
  
  The Future: What’s Next for io_uring
&lt;/h3&gt;

&lt;p&gt;io_uring development continues rapidly. Features we’re excited about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Registered file descriptors&lt;/strong&gt; : Pre-register files with the kernel for even lower overhead. Like buffer registration but for file handles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Asynchronous stat operations&lt;/strong&gt; : Plain &lt;code&gt;stat()&lt;/code&gt; still blocks, but io_uring exposes an async &lt;code&gt;statx&lt;/code&gt; (&lt;code&gt;IORING_OP_STATX&lt;/code&gt;) on recent kernels; library support is still catching up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Direct I/O improvements&lt;/strong&gt; : Better integration with O_DIRECT for database-style workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-platform efforts&lt;/strong&gt; : io_uring is Linux-only, but the concepts are influencing other platforms. Windows’ I/O rings and FreeBSD’s experimental implementations show the idea is spreading.&lt;/p&gt;

&lt;p&gt;For our team, the next frontier is database queries. Postgres and MySQL don’t yet expose io_uring interfaces directly, but we’re exploring proxy architectures that could batch database I/O operations.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Enjoyed the read? Let’s stay connected!&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 Follow &lt;strong&gt;The Speed Engineer&lt;/strong&gt; for more Rust, Go and high-performance engineering stories.&lt;/li&gt;
&lt;li&gt;💡 Like this article? Follow for daily speed-engineering benchmarks and tactics.&lt;/li&gt;
&lt;li&gt;⚡ Stay ahead in Rust and Go — follow for a fresh article every morning &amp;amp; night.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your support means the world and helps me create more content you’ll love. ❤️&lt;/p&gt;

</description>
      <category>backend</category>
      <category>linux</category>
      <category>performance</category>
      <category>rust</category>
    </item>
    <item>
      <title>Why Agentic AI is Killing the Traditional Database</title>
      <dc:creator>Karan Kumar</dc:creator>
      <pubDate>Fri, 17 Apr 2026 06:42:15 +0000</pubDate>
      <link>https://forem.com/karan_kumar_f09865ff0efe9/why-agentic-ai-is-killing-the-traditional-database-lk2</link>
      <guid>https://forem.com/karan_kumar_f09865ff0efe9/why-agentic-ai-is-killing-the-traditional-database-lk2</guid>
      <description>&lt;p&gt;Your AI agent just wrote a new feature, generated 10 different schema variations to test performance, and deployed 50 ephemeral micro-services—all in under three minutes. Now, it needs a database for every single one of them. &lt;/p&gt;

&lt;p&gt;If you're relying on a traditional RDS instance, you're staring at a massive bill for idle compute and a manual migration nightmare. The rise of agentic software development is forcing a total rewrite of the database layer. Here is why.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Challenge: The "Evolutionary" Bottleneck
&lt;/h3&gt;

&lt;p&gt;For decades, we've treated databases as static, monolithic anchors. We carefully planned schemas, ran migrations with a sense of dread, and provisioned "T-shirt sizes" of compute based on peak load. This worked because human engineers are slow; we write code in hours and deploy in days.&lt;/p&gt;

&lt;p&gt;AI agents change the math. We are shifting from &lt;em&gt;handcrafted&lt;/em&gt; software to &lt;em&gt;evolutionary&lt;/em&gt; software. An agent doesn't just write one version of a feature; it iterates through a vast search space of possible implementations. It branches the code, tests a hypothesis, fails, and pivots—all in seconds.&lt;/p&gt;

&lt;p&gt;When your software development lifecycle (SDLC) accelerates by 100x, the database becomes the primary bottleneck. You cannot &lt;code&gt;git checkout -b&lt;/code&gt; a 1TB production database. Nor can you justify a $100/month baseline cost for a prototype that an agent will discard in 10 seconds. &lt;/p&gt;

&lt;p&gt;We are seeing a paradigm shift where agents are creating four times as many databases as humans. The infrastructure isn't just scaling; it's mutating.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture: The Third-Generation Database
&lt;/h3&gt;

&lt;p&gt;To survive this shift, we need a fundamental architectural change: the total separation of storage and compute, combined with metadata-level branching. This is the core philosophy behind "Lakebase" architectures. &lt;/p&gt;

&lt;p&gt;Instead of a database being a server that &lt;em&gt;holds&lt;/em&gt; data, the database becomes a stateless compute layer that sits atop a shared, open storage lake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FJSV7aW5pdDogeyd0aGVtZSc6ICdiYXNlJywgJ3RoZW1lVmFyaWFibGVzJzogeyAncHJpbWFyeUNvbG9yJzogJyNmZjlmMWMnLCAnc2Vjb25kYXJ5Q29sb3InOiAnIzJlYzRiNicsICd0ZXJ0aWFyeUNvbG9yJzogJyNlNzFkMzYnLCAncHJpbWFyeUJvcmRlckNvbG9yJzogJyMwMTE2MjcnLCAnbGluZUNvbG9yJzogJyMwMTE2MjcnLCAnZm9udEZhbWlseSc6ICdJbnRlciwgc2Fucy1zZXJpZid9fX0lJQpncmFwaCBUQgogICAgc3ViZ3JhcGggQWdlbnRpY19MYXllciBbQWdlbnRpYyBTRExDXQogICAgICAgIEFbQUkgQWdlbnRdIC0tPnxJdGVyYXRlL0JyYW5jaHwgQltDb2RlYmFzZS9HaXRdCiAgICAgICAgQSAtLT58UmVxdWVzdCBTdGF0ZXwgQ1tEYXRhYmFzZSBDb250cm9sbGVyXQogICAgZW5kCgogICAgc3ViZ3JhcGggQ29tcHV0ZV9MYXllciBbRWxhc3RpYyBDb21wdXRlXQogICAgICAgIEMgLS0-IERbQ29tcHV0ZSBJbnN0YW5jZSAxIC0gUHJvZF0KICAgICAgICBDIC0tPiBFW0NvbXB1dGUgSW5zdGFuY2UgMiAtIEV4cGVyaW1lbnQgQV0KICAgICAgICBDIC0tPiBGW0NvbXB1dGUgSW5zdGFuY2UgMyAtIEV4cGVyaW1lbnQgQl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFN0b3JhZ2VfTGF5ZXIgW09wZW4gRGF0YSBMYWtlXQogICAgICAgIEQgLS0-IEdbKE9iamVjdCBTdG9yZTogUzMvR0NTL0F6dXJlIEJsb2IpXQogICAgICAgIEUgLS0-IEcKICAgICAgICBGIC0tPiBHCiAgICAgICAgRyAtLS0gSFtQb3N0Z3JlcyBQYWdlIEZvcm1hdF0KICAgIGVuZAoKICAgIHN0eWxlIEcgZmlsbDojZjlmLHN0cm9rZTojMzMzLHN0cm9rZS13aWR0aDo0cHg%3D%3FbgColor%3D%21white" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FJSV7aW5pdDogeyd0aGVtZSc6ICdiYXNlJywgJ3RoZW1lVmFyaWFibGVzJzogeyAncHJpbWFyeUNvbG9yJzogJyNmZjlmMWMnLCAnc2Vjb25kYXJ5Q29sb3InOiAnIzJlYzRiNicsICd0ZXJ0aWFyeUNvbG9yJzogJyNlNzFkMzYnLCAncHJpbWFyeUJvcmRlckNvbG9yJzogJyMwMTE2MjcnLCAnbGluZUNvbG9yJzogJyMwMTE2MjcnLCAnZm9udEZhbWlseSc6ICdJbnRlciwgc2Fucy1zZXJpZid9fX0lJQpncmFwaCBUQgogICAgc3ViZ3JhcGggQWdlbnRpY19MYXllciBbQWdlbnRpYyBTRExDXQogICAgICAgIEFbQUkgQWdlbnRdIC0tPnxJdGVyYXRlL0JyYW5jaHwgQltDb2RlYmFzZS9HaXRdCiAgICAgICAgQSAtLT58UmVxdWVzdCBTdGF0ZXwgQ1tEYXRhYmFzZSBDb250cm9sbGVyXQogICAgZW5kCgogICAgc3ViZ3JhcGggQ29tcHV0ZV9MYXllciBbRWxhc3RpYyBDb21wdXRlXQogICAgICAgIEMgLS0-IERbQ29tcHV0ZSBJbnN0YW5jZSAxIC0gUHJvZF0KICAgICAgICBDIC0tPiBFW0NvbXB1dGUgSW5zdGFuY2UgMiAtIEV4cGVyaW1lbnQgQV0KICAgICAgICBDIC0tPiBGW0NvbXB1dGUgSW5zdGFuY2UgMyAtIEV4cGVyaW1lbnQgQl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFN0b3JhZ2VfTGF5ZXIgW09wZW4gRGF0YSBMYWtlXQogICAgICAgIEQgLS0-IEdbKE9iamVjdCBTdG9yZTogUzMvR0NTL0F6dXJlIEJsb2IpXQogICAgICAgIEUgLS0-IEcKICAgICAgICBGIC0tPiBHCiAgICAgICAgRyAtLS0gSFtQb3N0Z3JlcyBQYWdlIEZvcm1hdF0KICAgIGVuZAoKICAgIHN0eWxlIEcgZmlsbDojZjlmLHN0cm9rZTojMzMzLHN0cm9rZS13aWR0aDo0cHg%3D%3FbgColor%3D%21white" alt="architecture diagram" width="938" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Components: Solving the Three Big Problems
&lt;/h3&gt;

&lt;p&gt;To make this viable, the architecture must solve for branching, cost, and compatibility.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. O(1) Metadata Branching
&lt;/h4&gt;

&lt;p&gt;Traditional cloning requires physical data copying. If you have 1TB of data, a clone takes hours. In an agentic world, that is a non-starter. &lt;/p&gt;

&lt;p&gt;Modern architectures utilize &lt;strong&gt;Copy-on-Write (CoW)&lt;/strong&gt; at the metadata layer. When an agent creates a branch, the system doesn't copy the data; it creates a new pointer to the existing data blocks. A new version is written only when the agent &lt;em&gt;modifies&lt;/em&gt; a block. This transforms branching into an O(1) operation. You can maintain 500 nested branches of a database with nearly zero storage overhead.&lt;/p&gt;
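&lt;p&gt;A toy model makes the CoW mechanics concrete (an illustrative sketch, not any real engine's block format):&lt;/p&gt;

```rust
// Toy copy-on-write branching: a branch is just an array of pointers into
// shared block storage. Branching copies pointers (O(1)); a write allocates
// one fresh block and repoints one entry, leaving the parent untouched.
#[derive(Clone, Copy)]
struct Store {
    blocks: [u64; 16], // shared physical block storage
    next_free: usize,  // bump allocator
}

#[derive(Clone, Copy)]
struct Branch {
    ptrs: [usize; 4], // which physical block backs each logical block
}

// O(1): four pointers copied, zero data blocks copied
fn branch_from(parent: Branch) -> Branch {
    parent
}

fn write_block(mut s: Store, mut b: Branch, logical: usize, value: u64) -> (Store, Branch) {
    s.blocks[s.next_free] = value; // allocate a new physical block
    b.ptrs[logical] = s.next_free; // repoint only this branch
    s.next_free += 1;
    (s, b)
}

fn read_block(s: Store, b: Branch, logical: usize) -> u64 {
    s.blocks[b.ptrs[logical]]
}

fn main() {
    let mut store = Store { blocks: [0; 16], next_free: 0 };
    let mut prod = Branch { ptrs: [0; 4] };
    // seed production data: blocks hold 0, 10, 20, 30
    for i in 0..4 {
        let (s2, b2) = write_block(store, prod, i, (i as u64) * 10);
        store = s2;
        prod = b2;
    }
    // an agent branches and mutates block 2; production is unaffected
    let expt = branch_from(prod);
    let (s3, expt) = write_block(store, expt, 2, 999);
    store = s3;
    println!("experiment sees {}", read_block(store, expt, 2)); // 999
    println!("production sees {}", read_block(store, prod, 2)); // 20
}
```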

&lt;h4&gt;
  
  
  2. Scale-to-Zero Elasticity
&lt;/h4&gt;

&lt;p&gt;If an agent spins up a database for a 10-second test, paying for an hourly instance is a financial disaster. We need "Serverless SQL" where the compute layer is completely decoupled. &lt;/p&gt;

&lt;p&gt;When no queries are hitting the endpoint, the compute instance is terminated. When a request arrives, the controller spins up a lightweight execution engine in sub-second time, attaches it to the storage lake, and executes the query. This eliminates the "cost floor," making the marginal cost of an experiment effectively zero.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. The "Openness" Requirement
&lt;/h4&gt;

&lt;p&gt;LLMs aren't trained on proprietary, closed-source database internals; they are trained on Postgres, MySQL, and SQLite. If you use a proprietary API, the agent will hallucinate. &lt;/p&gt;

&lt;p&gt;By using open formats (such as Postgres page formats) directly on cloud object storage, we ensure that agents can interact with data using the patterns they already know. Openness is no longer a philosophical choice—it is a performance requirement for AI reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Data &amp;amp; Workflow Loop
&lt;/h3&gt;

&lt;p&gt;How does this look in a production pipeline? Let's trace a single agentic iteration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FJSV7aW5pdDogeyd0aGVtZSc6ICdiYXNlJywgJ3RoZW1lVmFyaWFibGVzJzogeyAncHJpbWFyeUNvbG9yJzogJyNmZjlmMWMnLCAnc2Vjb25kYXJ5Q29sb3InOiAnIzJlYzRiNicsICd0ZXJ0aWFyeUNvbG9yJzogJyNlNzFkMzYnLCAncHJpbWFyeUJvcmRlckNvbG9yJzogJyMwMTE2MjcnLCAnbGluZUNvbG9yJzogJyMwMTE2MjcnLCAnZm9udEZhbWlseSc6ICdJbnRlciwgc2Fucy1zZXJpZid9fX0lJQpzZXF1ZW5jZURpYWdyYW0KICAgIHBhcnRpY2lwYW50IEFnZW50IGFzIEFJIEFnZW50CiAgICBwYXJ0aWNpcGFudCBDdHJsIGFzIERCIENvbnRyb2xsZXIKICAgIHBhcnRpY2lwYW50IFN0b3JlIGFzIE9iamVjdCBTdG9yZSAoUzMpCiAgICBwYXJ0aWNpcGFudCBDb21wdXRlIGFzIEVwaGVtZXJhbCBDb21wdXRlCgogICAgQWdlbnQtPj5DdHJsOiBSZXF1ZXN0IEJyYW5jaCAnZXhwZXJpbWVudC12MScKICAgIEN0cmwtPj5TdG9yZTogQ3JlYXRlIE1ldGFkYXRhIFBvaW50ZXIgKE8oMSkpCiAgICBDdHJsLT4-Q29tcHV0ZTogU3BpbiB1cCBTdGF0ZWxlc3MgSW5zdGFuY2UKICAgIENvbXB1dGUtPj5TdG9yZTogUmVhZCBCYXNlIFN0YXRlCiAgICBBZ2VudC0-PkNvbXB1dGU6IEV4ZWN1dGUgU2NoZW1hIENoYW5nZQogICAgQ29tcHV0ZS0-PlN0b3JlOiBXcml0ZSBOZXcgRGVsdGEgQmxvY2tzIChDb1cpCiAgICBBZ2VudC0-PkNvbXB1dGU6IFJ1biBUZXN0IFN1aXRlCiAgICBDb21wdXRlLS0-PkFnZW50OiBTdWNjZXNzL0ZhaWwKICAgIEFnZW50LT4-Q3RybDogRGVsZXRlIEJyYW5jaC9Db21wdXRlCiAgICBDdHJsLT4-Q29tcHV0ZTogVGVybWluYXRlIChTY2FsZSB0byBaZXJvKQ%3D%3D%3FbgColor%3D%21white" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FJSV7aW5pdDogeyd0aGVtZSc6ICdiYXNlJywgJ3RoZW1lVmFyaWFibGVzJzogeyAncHJpbWFyeUNvbG9yJzogJyNmZjlmMWMnLCAnc2Vjb25kYXJ5Q29sb3InOiAnIzJlYzRiNicsICd0ZXJ0aWFyeUNvbG9yJzogJyNlNzFkMzYnLCAncHJpbWFyeUJvcmRlckNvbG9yJzogJyMwMTE2MjcnLCAnbGluZUNvbG9yJzogJyMwMTE2MjcnLCAnZm9udEZhbWlseSc6ICdJbnRlciwgc2Fucy1zZXJpZid9fX0lJQpzZXF1ZW5jZURpYWdyYW0KICAgIHBhcnRpY2lwYW50IEFnZW50IGFzIEFJIEFnZW50CiAgICBwYXJ0aWNpcGFudCBDdHJsIGFzIERCIENvbnRyb2xsZXIKICAgIHBhcnRpY2lwYW50IFN0b3JlIGFzIE9iamVjdCBTdG9yZSAoUzMpCiAgICBwYXJ0aWNpcGFudCBDb21wdXRlIGFzIEVwaGVtZXJhbCBDb21wdXRlCgogICAgQWdlbnQtPj5DdHJsOiBSZXF1ZXN0IEJyYW5jaCAnZXhwZXJpbWVudC12MScKICAgIEN0cmwtPj5TdG9yZTogQ3JlYXRlIE1ldGFkYXRhIFBvaW50ZXIgKE8oMSkpCiAgICBDdHJsLT4-Q29tcHV0ZTogU3BpbiB1cCBTdGF0ZWxlc3MgSW5zdGFuY2UKICAgIENvbXB1dGUtPj5TdG9yZTogUmVhZCBCYXNlIFN0YXRlCiAgICBBZ2VudC0-PkNvbXB1dGU6IEV4ZWN1dGUgU2NoZW1hIENoYW5nZQogICAgQ29tcHV0ZS0-PlN0b3JlOiBXcml0ZSBOZXcgRGVsdGEgQmxvY2tzIChDb1cpCiAgICBBZ2VudC0-PkNvbXB1dGU6IFJ1biBUZXN0IFN1aXRlCiAgICBDb21wdXRlLS0-PkFnZW50OiBTdWNjZXNzL0ZhaWwKICAgIEFnZW50LT4-Q3RybDogRGVsZXRlIEJyYW5jaC9Db21wdXRlCiAgICBDdHJsLT4-Q29tcHV0ZTogVGVybWluYXRlIChTY2FsZSB0byBaZXJvKQ%3D%3D%3FbgColor%3D%21white" alt="sequence diagram" width="1072" height="625"&gt;&lt;/a&gt;&lt;/p&gt;
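&lt;p&gt;The sequence above collapses to a small loop in code. This is a hedged sketch only: &lt;code&gt;create_branch&lt;/code&gt;, &lt;code&gt;attach_compute&lt;/code&gt;, and friends are hypothetical stand-ins, not a real client API, backed here by an in-memory fake so the shape of the iteration is runnable.&lt;/p&gt;

```python
# One agentic iteration: branch -> attach compute -> mutate -> test -> discard.

class FakeController:
    """In-memory stand-in for the DB controller, object store, and compute."""
    def create_branch(self, name):
        return {"name": name}     # O(1): a metadata pointer, no data copied
    def attach_compute(self, branch):
        return self               # stateless engine attached to storage
    def execute(self, sql):
        return True               # pretend every statement succeeds
    def delete_branch(self, branch):
        pass                      # drop the pointer; deltas become garbage
    def terminate(self, compute):
        pass                      # scale compute back to zero

def agent_iteration(controller, change_sql, tests):
    branch = controller.create_branch("experiment-v1")
    compute = controller.attach_compute(branch)
    try:
        compute.execute(change_sql)                    # writes CoW delta blocks
        return all(compute.execute(t) for t in tests)  # run the test suite
    finally:
        controller.delete_branch(branch)               # success or fail,
        controller.terminate(compute)                  # the timeline is disposable

ok = agent_iteration(FakeController(),
                     "ALTER TABLE users ADD COLUMN plan TEXT",
                     ["SELECT count(*) FROM users"])
assert ok
```

&lt;p&gt;The &lt;code&gt;finally&lt;/code&gt; block is the economic heart of the pattern: whether the experiment passes or fails, both the branch and the compute disappear, so a failed hypothesis costs effectively nothing.&lt;/p&gt;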

&lt;h3&gt;
  
  
  Trade-offs &amp;amp; Scalability
&lt;/h3&gt;

&lt;p&gt;No architecture is without trade-offs. Moving to a decoupled, agent-centric model introduces new challenges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency vs. Elasticity:&lt;/strong&gt; &lt;br&gt;
In a traditional monolithic DB, data resides on local NVMe drives. In a Lakebase architecture, data lives in S3, introducing network latency. To mitigate this, we implement aggressive local caching of "hot" pages on the compute node. You trade a few milliseconds of first-byte latency for the ability to spin up 1,000 databases instantly.&lt;/p&gt;
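&lt;p&gt;A minimal sketch of that hot-page cache, assuming an LRU eviction policy (real engines use more sophisticated strategies): the first read of a page pays the object-store round trip, repeats are served locally, and the coldest page is evicted when capacity is exceeded.&lt;/p&gt;

```python
from collections import OrderedDict

# Hedged sketch: LRU cache of "hot" pages in front of object storage.
# fetch_from_s3 is a hypothetical stand-in for a network read.

class PageCache:
    def __init__(self, capacity, fetch_from_s3):
        self.capacity = capacity
        self.fetch = fetch_from_s3
        self.pages = OrderedDict()    # insertion order doubles as recency order
        self.misses = 0               # each miss = one network round trip

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark page as hot
            return self.pages[page_id]
        self.misses += 1
        self.pages[page_id] = self.fetch(page_id)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the coldest page
        return self.pages[page_id]

cache = PageCache(capacity=2, fetch_from_s3=lambda pid: f"block-{pid}")
cache.read(1)
cache.read(1)            # served locally, no second round trip
cache.read(2)
assert cache.misses == 2
```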

&lt;p&gt;&lt;strong&gt;Consistency Models:&lt;/strong&gt; &lt;br&gt;
With hundreds of branches evolving simultaneously, managing the "source of truth" becomes complex. The system must handle merging database states similarly to how Git handles code merges—resolving conflicts in the metadata layer before committing a branch back to production.&lt;/p&gt;
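&lt;p&gt;A hedged sketch of what "Git-style" resolution could look like at the metadata layer: each state is a map from page ID to block pointer, and a conflict simply means both sides rewrote the same page since the common base. (The function and its semantics are illustrative, not a production merge algorithm.)&lt;/p&gt;

```python
# Three-way merge of page-pointer maps: base is the common ancestor,
# main is production, branch is the experiment being committed back.

def merge_branches(base, main, branch):
    merged, conflicts = dict(main), []
    for page, ptr in branch.items():
        if ptr == base.get(page):
            continue                    # branch didn't change this page
        if main.get(page) not in (base.get(page), ptr):
            conflicts.append(page)      # both sides diverged: needs resolution
        else:
            merged[page] = ptr          # clean fast-forward of the branch delta
    return merged, conflicts

base   = {0: "A", 1: "B"}
main   = {0: "A", 1: "B2"}              # production changed page 1
branch = {0: "A2", 1: "B3"}             # experiment changed pages 0 and 1

merged, conflicts = merge_branches(base, main, branch)
assert merged[0] == "A2"                # only the branch touched page 0
assert conflicts == [1]                 # both sides rewrote page 1
```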

&lt;p&gt;&lt;strong&gt;The Scaling Curve:&lt;/strong&gt;&lt;br&gt;
Because the compute is stateless, scaling is linear. If your agent-generated app suddenly goes viral, you don't migrate to a larger box; you simply increase the number of compute nodes pointing at the same object store.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FJSV7aW5pdDogeyd0aGVtZSc6ICdiYXNlJywgJ3RoZW1lVmFyaWFibGVzJzogeyAncHJpbWFyeUNvbG9yJzogJyNmZjlmMWMnLCAnc2Vjb25kYXJ5Q29sb3InOiAnIzJlYzRiNicsICd0ZXJ0aWFyeUNvbG9yJzogJyNlNzFkMzYnLCAncHJpbWFyeUJvcmRlckNvbG9yJzogJyMwMTE2MjcnLCAnbGluZUNvbG9yJzogJyMwMTE2MjcnLCAnZm9udEZhbWlseSc6ICdJbnRlciwgc2Fucy1zZXJpZid9fX0lJQpncmFwaCBURAogICAgQVtTbWFsbCBFeHBlcmltZW50XSAtLT58TG93IExhdGVuY3kgQ2FjaGV8IEJbTWVkaXVtIFNjYWxlXQogICAgQiAtLT58SG9yaXpvbnRhbCBDb21wdXRlIFNjYWxpbmd8IENbTWFzc2l2ZSBQcm9kdWN0aW9uXQogICAgQyAtLT58UzMgT2JqZWN0IFN0b3JlfCBEW0luZmluaXRlIFN0b3JhZ2UgQ2FwYWNpdHldCiAgICBEIC0tPnxNZXRhZGF0YSBQb2ludGVyc3wgQQ%3D%3D%3FbgColor%3D%21white" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FJSV7aW5pdDogeyd0aGVtZSc6ICdiYXNlJywgJ3RoZW1lVmFyaWFibGVzJzogeyAncHJpbWFyeUNvbG9yJzogJyNmZjlmMWMnLCAnc2Vjb25kYXJ5Q29sb3InOiAnIzJlYzRiNicsICd0ZXJ0aWFyeUNvbG9yJzogJyNlNzFkMzYnLCAncHJpbWFyeUJvcmRlckNvbG9yJzogJyMwMTE2MjcnLCAnbGluZUNvbG9yJzogJyMwMTE2MjcnLCAnZm9udEZhbWlseSc6ICdJbnRlciwgc2Fucy1zZXJpZid9fX0lJQpncmFwaCBURAogICAgQVtTbWFsbCBFeHBlcmltZW50XSAtLT58TG93IExhdGVuY3kgQ2FjaGV8IEJbTWVkaXVtIFNjYWxlXQogICAgQiAtLT58SG9yaXpvbnRhbCBDb21wdXRlIFNjYWxpbmd8IENbTWFzc2l2ZSBQcm9kdWN0aW9uXQogICAgQyAtLT58UzMgT2JqZWN0IFN0b3JlfCBEW0luZmluaXRlIFN0b3JhZ2UgQ2FwYWNpdHldCiAgICBEIC0tPnxNZXRhZGF0YSBQb2ludGVyc3wgQQ%3D%3D%3FbgColor%3D%21white" alt="architecture diagram" width="337" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Software is becoming evolutionary.&lt;/strong&gt; AI agents iterate too quickly for traditional "provisioned" databases. &lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Branching must be &lt;code&gt;O(1)&lt;/code&gt;.&lt;/strong&gt; Physical data copying is the enemy; metadata Copy-on-Write is the solution.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scale-to-Zero is mandatory.&lt;/strong&gt; The economic model of AI development requires the removal of the monthly cost floor.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Open standards ensure AI compatibility.&lt;/strong&gt; Proprietary formats lead to agent hallucinations and operational friction.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Decoupling is the only path forward.&lt;/strong&gt; Separating compute from storage is the one architecture that delivers the elasticity agentic workflows demand.&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
  </channel>
</rss>
