
Summary
Large language models do not behave consistently, which is harmless in some business contexts but dangerous in others. Adapting AI for determinism, the classic promise of software, is a major challenge waiting to be tackled. Will Indian IT service players rise to the occasion?
Enterprise technology has long rested on a basic assumption: determinism. When a system gets identical inputs, it must yield identical outputs. Business and tech leaders rely on this expectation. Banks can reconcile millions of financial movements and telecom operators can bill subscribers accurately because the software they use behaves in a perfectly predictable manner. This is true across enterprises.
Determinism is not a trivial engineering attribute; it gives regulators assurance, auditors clarity and businesses stability. It is an unspoken contract between organizations and their digital systems. Historically, that contract has been upheld. Large language models (LLMs), however, have begun to stretch this long-standing assumption.
LLMs did not descend from the deterministic lineage of classic enterprise systems. They emerged from probability theory, pattern recognition and statistical learning. An LLM knows that in English, ‘good’ is more likely to follow ‘very’ than, say, ‘hippopotamus.’ But ask the same LLM the same question twice and its responses may vary even if it doesn’t hallucinate. It may generate ‘cold’ after ‘very’ rather than ‘good.’
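A toy sketch makes this concrete. The candidate words and probabilities below are invented for illustration, not taken from any real model, but they show why sampling from a next-token distribution can yield different answers on different runs:

```python
import random

# Invented next-token distribution for the prompt "very ...": the model assigns
# a probability to each candidate and then samples one, so repeated runs of the
# very same prompt can produce different words.
candidates = ["good", "cold", "happy", "hippopotamus"]
probabilities = [0.55, 0.25, 0.19, 0.01]

for _ in range(3):
    print(random.choices(candidates, weights=probabilities, k=1)[0])
```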
Run the same model on two identical machines and minor differences can appear. Even if the system’s ‘temperature’ is set to zero (which instructs it to choose the single most likely next word), small variations can still creep in because an LLM generates text one token (a few letters, not always a full word) at a time. A tiny difference in the underlying probability distribution can lead it to pick a different token, which influences the next token, and so on. This violates the deterministic principle that identical inputs must always yield identical outputs.
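A similarly simplified sketch, with made-up numbers, shows how a vanishingly small difference in the computed scores can flip even a greedy, temperature-zero choice; in a real model, the flipped token would then steer every later step of the sentence:

```python
# Greedy ("temperature zero") decoding always picks the highest-scoring token.
# Two runs that compute almost, but not exactly, the same scores can still
# disagree on which token that is.
scores_run_1 = {"good": 0.41320001, "cold": 0.41320000, "happy": 0.17}
scores_run_2 = {"good": 0.41319999, "cold": 0.41320000, "happy": 0.17}

print(max(scores_run_1, key=scores_run_1.get))  # 'good'
print(max(scores_run_2, key=scores_run_2.get))  # 'cold'
```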
Modern LLMs depend on massive quantities of floating-point computations executed in parallel across thousands of processing units. Billions of small mathematical operations are done just to generate ‘good’ or ‘cold.’ The order in which these occur can shift subtly, based on how the hardware schedules tasks or manages memory at that moment. Floating-point arithmetic is sensitive to ordering. In isolation, each discrepancy is microscopic. But a barely perceptible variation can cascade into a sentence meaningfully different from an earlier one.
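That sensitivity is easy to demonstrate in isolation: floating-point addition is not associative, so merely regrouping the same three numbers changes the result.

```python
# Floating-point addition is not associative: the grouping (that is, the order
# in which partial sums are formed) changes the last digits of the result.
print((0.1 + 0.2) + 0.3)   # 0.6000000000000001
print(0.1 + (0.2 + 0.3))   # 0.6
```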
Software layers add to this drift. Inference engines combine operations, reorder computational steps and adjust memory access. These optimizations run through processor ‘kernels’ (small programs), which are engineered for throughput, not for guaranteeing reproducibility. When inference engines alter or reorganize kernels to better exploit the hardware, they also change the sequence of computations, leading to varying floating-point math and output variation. Optimization paths can diverge.
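A rough stand-in for that effect, which assumes nothing about any particular inference engine, is to compute the same reduction with different ‘tile’ sizes, mimicking two differently organized kernels; the results typically disagree in the last digits:

```python
import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

def tiled_sum(xs, tile):
    # Sum each tile, then sum the partial results: a crude imitation of how a
    # kernel might block the same reduction differently after an optimization.
    partials = [sum(xs[i:i + tile]) for i in range(0, len(xs), tile)]
    return sum(partials)

print(tiled_sum(values, 1))      # plain sequential order
print(tiled_sum(values, 256))    # "kernel A" tiling
print(tiled_sum(values, 4096))   # "kernel B" tiling; last digits typically differ
```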
The challenge heightens when LLMs run across multiple processing units. Distributed inference requires these units to exchange partial calculations and the timing and sequencing of these interactions can have tiny discrepancies. For enterprises accustomed to deterministic software, such behaviour is erratic. But it is a natural outcome of how AI works.
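A deliberately exaggerated toy example, with no real distributed machinery involved, shows why arrival order matters: the same partial results accumulated in two different orders produce two different totals.

```python
# The same six partial results, accumulated in the order they happen to "arrive".
order_a = [1e16, 1.0, 1.0, 1.0, 1.0, -1e16]   # small values meet the huge one individually
order_b = [1.0, 1.0, 1.0, 1.0, 1e16, -1e16]   # small values are combined first

def accumulate(partials):
    total = 0.0
    for p in partials:
        total += p
    return total

print(accumulate(order_a))  # 0.0: each 1.0 is too small to register next to 1e16
print(accumulate(order_b))  # 4.0: the 1.0s add up before the big values cancel
```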
Is it possible to control or limit this unpredictability? Yes, within boundaries. Determinism on a single machine is achievable. It requires fixing random seeds, eliminating sampling-driven randomness, freezing the entire software stack and using deterministic computation paths wherever available. The trade-off is reduced performance. Deterministic kernels and rigid execution paths rarely match the speed of highly optimized, non-deterministic alternatives. But in regulated industries, slower but reproducible behaviour is preferable to faster but inconsistent outcomes.
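A minimal sketch of what these controls can look like, assuming a PyTorch-based inference stack (the specific flags are illustrative, version-dependent and far from a complete recipe):

```python
import os
import random

import numpy as np
import torch

# Must be set before the CUDA context is created for deterministic cuBLAS matmuls.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Fix every source of sampling-driven randomness.
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

# Prefer deterministic computation paths and fail loudly when none exists.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False   # stop cuDNN from auto-selecting different kernels

# At generation time, decoding should also be greedy rather than sampled,
# e.g. model.generate(input_ids, do_sample=False) with Hugging Face transformers.
```

Freezing the surrounding software stack, including driver, framework and model versions, matters just as much as these in-process settings.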
Achieving determinism across multiple machines is more difficult. Every element of the technology in use must match precisely. With strict engineering discipline, this can be achieved, though it could be significantly costlier.
Determinism across heterogeneous hardware is unattainable today, as each hardware family does its floating-point arithmetic in its own way. Their kernels, compilers and memory architectures differ. No current software abstraction can harmonize these discrepancies. Over time, the industry may adopt more reproducible standards, but at present, expecting flawless cross-hardware determinism from LLMs is unrealistic.
For many consumer-facing use cases, non-determinism is harmless. Chatbots, creative assistants and ideation tools lose nothing if their output varies; indeed, variation often enhances them. But once LLMs are integrated into enterprise decision flows, inconsistency is a liability. For example, a compliance model must consistently produce the same justification for the same case, just as an insurer cannot provide different responses to customers with matching profiles. Engineering teams also rely heavily on determinism.
Debugging requires reproducible failures. Software regression testing depends on stable baselines. Safety assessments need consistent behaviour.
Enterprises do not need to abandon AI to maintain their reliability standards. They simply need to treat AI as a governed system rather than an enigmatic oracle. Standardized hardware, frozen environments and reproducible inference pipelines will help their cause. Controlling non-determinism is not optional; it is an important step towards making AI truly enterprise-ready. Perhaps this challenge could be taken on by Indian IT service companies.
The author is co-founder of Siana Capital, a venture fund manager.
