India’s data trove: Don’t grant AI majors free access to this strategic asset—use it as leverage instead

India generates plenty of the kind of data that improves AI systems. (istockphoto)

Summary

India’s vast and diverse data is fast becoming one of the most coveted assets in the global AI race. Yet, without the right policies, it risks flowing out to strengthen foreign models while leaving limited gains at home. How New Delhi leverages this resource could shape India’s technological future.

India is fast becoming one of the world’s biggest AI user bases. The question now is how it can turn that scale into superpower status rather than just training Silicon Valley for free.

That will be a tall order for a country largely caught flat-footed by the boom. But let’s start with the basics: The three main building blocks of AI are talent, compute (including high-end chips and infrastructure) and data. India doesn’t lack engineers, but it currently doesn’t have foundational research training at scale or enough advanced processors at public labs and universities. What it has, in abundance, is data. It should treat this like a strategic asset instead of leaking it out as a free export.

It’s a key reason US Big Tech is making a blitz for the market. With roughly a billion people online and a massive mobile-first population, India generates a torrent of the kind of human feedback that makes AI systems better on a daily basis.

The world’s most populous country is the second-biggest user base for both OpenAI’s ChatGPT and Anthropic’s Claude after the US, yet it accounts for just a fraction of these platforms’ revenue. The dynamic shows how much more the market currently matters for training than for making money.

These free-to-use services and promotions aimed at Indian phone users come at a cost. They are part of a strategic Silicon Valley grab for Indian languages, voices and behaviours that will make foreign systems smarter first. The South Asian nation risks repeating a familiar historical pattern of exporting raw materials for pennies and then buying back imported models at a premium. Meanwhile, it will be left to absorb the job shock and social impacts at home.

India’s linguistic diversity also raises the stakes. If models aren’t trained on enough local speech and cultural contexts, they’ll misunderstand users and become unreliable in classrooms, clinics, courts and even customer support settings. Closing this language gap sits at the heart of [the Indian government’s] promise to democratize AI and make its impacts real for everyone from farmers to small-business owners, rather than just English-speaking elites.

At the same time, the AI future that the likes of Meta or OpenAI are selling, marked by personal agents and voice-powered ambient devices, won’t work in India unless those systems can listen to and speak local languages and get the nuances right.

Some startups, including Andreessen Horowitz-backed Poseidon AI and Big Tech-supported non-profit efforts, are already trying to crowdsource and create local language data-sets.

New Delhi should be paying far more attention, not just because data-labelling and collection practices have a global reputation for being exploitative, but also because these efforts could anchor a domestic ecosystem. India can’t demand “AI for all” while outsourcing the work of building a linguistic foundation. Done well, though, these data-sets can become infrastructure for its AI economy.

The same logic applies beyond language. India should push hard for the creation of specialized, high-impact and localized data-sets in sectors like health care or finance. AI can improve diagnostics and personalized care, but the most valuable data for accomplishing this still lies in largely inaccessible hospital systems.

Privacy fears are real and should be taken seriously, but accessing this data could also mean saving lives. Unlocking and organizing it is the hard, unglamorous work that takes Modi’s branding of “AI for good” beyond just slogans.

Ultimately, India’s data reckoning should be about who controls this strategic input to AI and who captures the value from it. The answer isn’t to wall off user outputs from the world. It’s about finding creative ways to use that data as leverage and set rules that reflect what’s actually being extracted. If its people’s data is a key ingredient for building advanced AI, the government should demand more than apps and marketing in return.

It can ask for partnerships that build capacity, including public compute commitments, access to high-end chips, serious training pipelines for AI researchers and collaborations that go beyond token commitments. New Delhi should also set norms that treat local data-sets as a public good and consider revenue-sharing models that keep more of the upside at home.

Transparency is crucial. Policymakers should require foreign model builders to disclose the kind of data that shaped their systems and how those systems have been evaluated for harms and biases in Indian contexts.

More than building foundation models, setting equitable data policies is where India has the biggest opportunity to truly lead the Global South in the AI era. Otherwise, it risks becoming an open mine and fuelling systems that automate local jobs, concentrate power abroad and deepen dependencies. ©Bloomberg

The author is a Bloomberg Opinion columnist covering Asia tech.
