Gojiberries (Page 2)


More issues

Stable LLM Inference

A mathematical function gives the same output for the same input. LLMs do not. Not even at temperature = 0. Determinism matters for reproducibility, debugging, and for keeping pipelines stable. Without it, identical prompts can yield divergent completions, making experiments irreproducible and systems brittle. We can intervene at three broad layers

Identifying Assumptions

One of the reasons I specifically look at Muslims, as compared to other marginalized groups in India (say, the Scheduled Castes or the Scheduled Tribes), is primarily that there have been political reservation norms for the Scheduled Castes and Scheduled Tribes. There is a plethora of

What Problem Is Traditional RAG Solving?

RAG solves a specific technical problem: it enables models trained on broad public text to answer questions about specialized content without retraining. But solving a technical problem doesn't automatically make RAG the right business solution. A useful way to reason about search-system design is to model its net

Conscious Decoupling: Separating Budgets For Thought and Action

Current LLM services bundle model intelligence with supporting infrastructure into fixed tiers. When you purchase access to GPT‑5 Thinking, Claude Opus 4.1, or similar offerings, you receive a package: some amount of model capability paired with some unspecified quantity of supporting compute. This bundling serves a purpose. It

Testing LLMs: Engineering Confidence Without Certainty

Software testing has always relied on a fragile assumption: that we can enumerate test cases that represent production behavior. We test specific inputs, verify outputs, and trust that production will behave similarly. This works until it doesn't. Search ranking degrades as user behavior evolves. Heuristics fail when traffic

What Problem is DSPy Solving

Follow-up to: Not so Prompt: Prompt Optimization as Model Selection. The practical problem is to choose a prompt (or a set of prompts) that performs best on a task in the "real world" while respecting constraints, e.g., the output must parse, latency should stay below

The Case For Pay-as-Bid GPU Pricing

Most GPU providers post standard rental rates by hour and adjust them infrequently. Some expose dynamic prices that evolve based on supply and demand. But these approaches, while simple and predictable, may not be optimal given current market conditions: severe supply constraints for high-end models, large differences in customer valuations,

The Double Descent of Standard Errors

Since the last essay, I have been mulling over whether double descent also means smaller standard errors (s.e.) for heterogeneous treatment effects (HTE). Estimating heterogeneous treatment effects involves two goals: accurate out-of-sample prediction of $\hat{\tau}(x)$ and valid inference for functionals like average treatment effects and policy values. Standard practice constrains model complexity

Testing Distributional Implications of LATE

Many randomized encouragement designs have imperfect compliance, where only a fraction of people comply with their assignment. Examples include phone-bank get-out-the-vote (GOTV) campaigns and draft lotteries like the Vietnam Draft Lottery. In these settings, it is common to use instrumental variable (IV) regression for analysis. Instrumental variables identify the Local