Alternative data was going to help Africa’s financially underserved get access to credit. At least, that was the plan. We all knew they didn’t have any credit history, so the answer was to get access to their SMS, their contact data, or even the list of apps on their phones. That’s what we were told. But did it work? No. It arguably failed woefully.
Before I get into the part of this story that went wrong, I want to be clear about what I mean by alternative data, because the term covers a lot more ground than the one example everyone reaches for. When people talk about alternative data in African lending, they usually mean SMS and contact data scraping, since that’s the version that got the most press and the most backlash. But the category is much wider than that.
It includes mobile money transaction history, airtime recharge patterns, utility and rent payment records, e-commerce activity, psychometric assessments that try to measure a borrower’s character through a quiz, geolocation, and social media behavior. Some of these have aged well. Some never really worked. SMS and contact data scraping is the one that did the most damage, and it’s worth understanding in detail because it shaped how an entire generation of African borrowers learned to distrust digital lending.
I’ve written before about why telco data, when released properly, is a better deal for the poor, and about the risks AI models carry when they’re trained on incomplete behavioral data. This piece sits next to both of those, because the SMS scoring story is the wake-up call that explains why I care so much about getting the next generation of alternative data right. We had a working idea, but then we broke it ourselves, and the way we broke it tells you almost everything you need to know about how not to build credit infrastructure for people who’ve never had access to it.
The credit gap that started this whole experiment
Step back about ten years and you land on the root of the problem, which is that banks across Africa simply weren’t lending to ordinary people, even though most of those people were just as capable of repaying a loan as anyone holding a salary account at a tier-one bank. The entire infrastructure for assessing creditworthiness assumed you already had a financial history worth assessing, which excluded almost everyone who’d never had access to formal credit to begin with.
You needed a credit report, which didn’t exist because bureau coverage across the continent was thin, in some markets covering less than a quarter of the adult population. Or you needed banking data, which couldn’t be pulled together because the banks themselves were fragmented and open banking hadn’t been built yet. Every door traditional underwriting expected you to walk through was locked, and most people didn’t even have the key.
So the industry improvised, the way industries always do when the obvious tools aren’t available. And the wider improvisation, the one that produced the whole category of alternative data, was reasonable on its face. If someone is sending and receiving mobile money regularly, topping up airtime on a consistent schedule, or paying their electricity bill on time month after month, those are genuine signs of financial discipline that a bureau report would never capture.
Several of these approaches still hold up reasonably well today. Standard Chartered’s digital lending arm in Africa, for instance, leans on mobile money transaction data specifically because bureau coverage in some of its markets sits below a fifth of the adult population.
SMS became the favorite, but that didn’t end well
Out of all the alternative data sources available, SMS scraping became the one the largest and most aggressive lenders built their entire model around, because it was the most information-dense option on the table. A person’s SMS inbox typically holds more than just transaction alerts. You could also find loan approvals from other lenders, repayment reminders, salary credit notifications, and bill payment confirmations, all sitting in one place and readable the moment an app was granted permission. Companies like Tala and Branch led this wave, alongside a long list of local players like Quickash and dozens of others who copied the same script with smaller budgets and louder marketing.
You’d download an app, and at launch it would request a long list of permissions covering your SMS, your installed apps, your contacts, and your location. Once you granted access, the app would scrape everything it could find and feed it into a scoring model that decided, often within minutes, whether you qualified for a loan and how much.
This is where I want to be precise about what failed and what didn’t, because lumping every form of alternative data into one failed experiment would be misleading. Mobile money and airtime data, used on their own and with proper consent, have held up reasonably well across multiple markets.
Utility payment data has done the same in places like Latin America and increasingly in Ghana, where fintechs now look at mobile money patterns alongside business registration data for market traders. SMS scraping is the branch of this tree that rotted, and it rotted specifically because of how much intimate, uncontrollable information it gave lenders access to, and how little say borrowers had in any of it.
The moment borrowers caught on, it was game over
The first crack showed up the moment customers figured out what these apps were actually reading. Once people understood that loan officers somewhere downstream could see messages sent by other lenders, the obvious response followed almost immediately: borrowers deleted loan approval texts, repayment reminders, and anything else that hinted at an existing obligation to another lender, all gone the second it landed on the device.
What turned this into a technical failure rather than just an ethical one is simple: a message deleted off the phone is gone for any app trying to read current SMS content. Some lenders tried to get ahead of this by reading messages as they arrived in real time, catching new SMS as it landed but doing nothing for anything deleted before the app was even installed.
A borrower could walk into a second loan carrying three outstanding obligations elsewhere, with an empty SMS thread and a clean-looking risk profile, all without doing anything more sophisticated than tapping delete a few times before opening the next app. The scoring model wasn’t being outsmarted by some elaborate fraud operation. Ordinary people simply wanted to protect themselves.
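To make the failure concrete, here is a minimal sketch in Python. It assumes a deliberately naive scorer that only counts loan-related texts in whatever the inbox currently holds. The keywords, the sample messages, and the function name are all illustrative assumptions, not any lender’s actual model.

```python
# Hypothetical keywords a naive scraper might treat as signs of an existing loan.
LOAN_KEYWORDS = ("loan approved", "repayment due", "overdue")


def count_visible_obligations(inbox):
    """Count messages in the current inbox that look like loan obligations."""
    return sum(
        1
        for message in inbox
        if any(keyword in message.lower() for keyword in LOAN_KEYWORDS)
    )


# Made-up inbox for a borrower with three loans elsewhere.
full_inbox = [
    "Your loan approved: NGN 20,000 from Lender A",
    "Reminder: repayment due Friday for Lender B",
    "Your account with Lender C is overdue",
    "Salary credit: NGN 150,000",
]

# The borrower deletes anything that hints at an existing loan.
cleaned_inbox = [m for m in full_inbox if count_visible_obligations([m]) == 0]

print(count_visible_obligations(full_inbox))     # 3
print(count_visible_obligations(cleaned_inbox))  # 0
```

The scorer itself never changes between the two calls. Only the evidence does, and the borrower controls the evidence.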
And so, a lot of lenders treated the SMS deletion problem as a reason to reach deeper into the phone instead of an indication the model needed rethinking. If SMS alone wasn’t giving a complete picture anymore, the response was to pull contact lists too, along with call logs and the full inventory of apps installed on the device.
Research into digital lending apps operating in markets like Kenya found exactly this pattern playing out at scale, with apps requesting access to SMS content, contact lists, call logs, and installed app data well beyond what any reasonable underwriting process required to make a lending decision.
Contacts became a weapon in their own right. Lenders started using contact lists to identify guarantors without ever asking the borrower to nominate one, and when borrowers defaulted, agents would call straight through to family members, employers, and friends, sometimes to demand repayment on someone else’s behalf and sometimes just to embarrass the borrower into paying faster.
Very little of this was something a borrower had meaningfully consented to, even where a permissions dialog existed somewhere in the onboarding flow. What started as a reasonable workaround for missing credit data turned into something that looked a lot more like surveillance with a loan attached to it, and it gave the entire category of alternative data a reputation it’s still working to shake off.
Borrowers learned to play defense, app stores drew a line
People adapted the way people always do when they realize they’re being watched. Borrowers learned to leave contact lists sparse or fake, knowing a full address book just meant more people for a lender to harass later if repayment ever slipped. They started running separate SIM cards for separate lenders, since a fresh device profile with no shared history was harder to cross-reference against other apps tracking the same person across different platforms.
The moment a loan was repaid, the app would come off the phone entirely, partly to reclaim storage and partly because nobody wanted yesterday’s lender lurking in the background reading tomorrow’s messages.
The cycle kept compounding from there. Every defensive move borrowers made forced lenders to reach for more data to compensate, which pushed borrowers to defend themselves more aggressively, which degraded the data lenders were collecting even further than before.
Eventually this reached a point where the access itself became politically and technically untenable. Google moved to treat SMS and call log permissions as high-risk on the Play Store, restricting which categories of apps could even request them, which effectively shut the door on most lending apps reading SMS content the way they had for years.
Apple’s ecosystem never opened that door in the first place, which meant the entire SMS scoring model was always, structurally, an Android-only phenomenon dependent on a permission system that was ultimately going to get locked down once enough abuse cases piled up against it.
This lockdown was a direct response to the kind of abusive data harvesting I just described, the contact scraping and the location tracking and the installed-app inventories that had nothing to do with assessing whether someone could repay a loan. The platforms shut this down because the industry built around this access had stopped behaving responsibly with the trust it had been given, and SMS scoring specifically paid the price for years of poor judgment by the companies using it.
The quality collapse and the desperate phase
What should have worried lenders more than it did at the time was this: as borrowers got better at managing what these apps could see, the predictive quality of the scoring models started declining, slowly at first and then sharply, leaving lenders running sophisticated algorithms on increasingly compromised data, which is about the worst combination you can build a lending business on.
This connects to something I wrote about more broadly when looking at AI underwriting risk across developing markets, where I made the point that a model is never neutral and always reflects every decision that went into building it, including which data sources to trust and how much weight to put on them. When the underlying data becomes unreliable because the population being scored has every incentive to manipulate it, no amount of clever feature engineering fixes that problem.
By the time the quality decline became impossible to ignore, plenty of lenders had already built their entire risk infrastructure around this approach, and ripping it out wasn’t a simple decision to make from a boardroom that had spent years and investor capital defending the model. So instead of stepping back, a lot of players doubled down, pulling even more aggressively on contacts and location and app inventories, hoping that more inputs would somehow compensate for the fact that borrowers had learned to game the core engine feeding the whole system.
This is usually how these stories go, with the honest fix requiring an admission that the original model has limits, and that admission being harder to make than simply adding one more data point and hoping it helps. The result was an industry that, for a stretch of years, looked advanced from the outside while getting worse at the one thing it needed to do, which was separate good borrowers from bad ones with any real consistency.
Default rates didn’t improve the way the marketing suggested they should. Borrower trust eroded gradually. Regulators across multiple markets started paying closer attention to what these apps were doing with the data they collected, and the reputational damage from aggressive contact harassment alone did lasting harm to how digital lending was perceived across the continent, in ways that still color how people talk about loan apps today.
What the rest of alternative data still gets right
It’s worth pausing here to give credit where it’s due, because the SMS story can make it sound like every form of alternative data is tainted, and that isn’t the case. Mobile money and airtime recharge data have continued to perform well as predictive signals, partly because they reflect spending and saving discipline rather than private conversations, and partly because they’re harder for a borrower to manipulate without genuinely changing their financial behavior.
Utility payment data works on similar logic. Psychometric assessments remain more contested, since they ask a borrower to answer questions designed to infer character traits, and the jury is still out on how well that translates across different cultural and economic contexts. The common thread across the data sources that have aged well is that none of them require reading someone’s private messages or calling their relatives to collect on a debt.
None of this means the broader instinct behind alternative data was wrong. Africa needed a way to assess creditworthiness for people who had never been inside a formal credit system, and phone-based behavioral data carries real predictive value when it’s collected properly and with consent.
I’ve made the case before that telco data, like call patterns, airtime recharge behavior, mobile money activity, and data usage consistency, holds genuine signal because it reflects an economic rhythm that a decent model can read without needing to touch anyone’s private messages. The category was sound from the beginning. What collapsed, specifically within the SMS branch of it, was how the data got collected and who controlled it once it had been collected.
The fix has to start with consent that means something beyond a permissions dialog buried in an onboarding flow nobody reads before tapping accept. It has to involve data the customer can see, understand, and meaningfully control, rather than a black box scraping whatever it can reach the moment an app gets installed on a phone. And it has to come through a channel the borrower doesn’t have unilateral power to sabotage the way they could delete an SMS thread in three seconds flat.
Telco-held data fits this far better than device-scraped data ever could, since it sits with the network operator rather than on a phone where any borrower with five minutes and a motive can edit the record clean. I laid out a version of how this kind of consent-based telco model could work in practice, and the short version is that it requires the borrower to opt in explicitly, get notified every time their data gets accessed, and retain the ability to revoke that access, none of which the SMS-scraping era ever bothered to build into the system.
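The three requirements in that short version (explicit opt-in, a notification on every access, and revocation) can be sketched in a few lines of Python. This is a toy illustration under my own assumptions: the class names, method names, and identifiers such as `borrower-1` and `lender-x` are hypothetical, and a real operator-side system would need authentication, audit storage, and much more.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """One borrower's consent state for one lender (hypothetical structure)."""
    granted: bool = False
    notifications: list = field(default_factory=list)


class ConsentRegistry:
    """Toy operator-side registry: opt in, notify on every access, revoke."""

    def __init__(self):
        self._records = {}

    def opt_in(self, borrower, lender):
        # Consent is recorded only when the borrower explicitly opts in.
        self._records.setdefault((borrower, lender), ConsentRecord()).granted = True

    def revoke(self, borrower, lender):
        record = self._records.get((borrower, lender))
        if record is not None:
            record.granted = False

    def access(self, borrower, lender):
        """Allow access only with active consent, and log a notification."""
        record = self._records.get((borrower, lender))
        if record is None or not record.granted:
            return False
        record.notifications.append(f"{lender} accessed your telco data")
        return True

    def notifications_for(self, borrower, lender):
        record = self._records.get((borrower, lender))
        return list(record.notifications) if record else []


registry = ConsentRegistry()
print(registry.access("borrower-1", "lender-x"))  # False: no opt-in yet
registry.opt_in("borrower-1", "lender-x")
print(registry.access("borrower-1", "lender-x"))  # True: consent is active
registry.revoke("borrower-1", "lender-x")
print(registry.access("borrower-1", "lender-x"))  # False: consent revoked
print(registry.notifications_for("borrower-1", "lender-x"))
```

The design point is that the record lives with the operator, not on the handset, so the borrower controls access through consent rather than by editing the data.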
This brings to mind something I think about all the time, which is that credit access is foundational to prosperity across the continent, and you don’t get there by building underwriting systems that borrowers are incentivized to defeat from the first day they install the app. You get there by building systems borrowers can trust enough to engage with honestly, which was the entire premise behind pushing for open APIs as the foundation of inclusive credit scoring long before any of us watched the SMS model collapse under its own weight.
What this should have taught the industry
If there’s one thing worth pulling out of this whole experiment, it’s that data quality and data ethics were never separate problems, even though parts of the industry spent years treating them as though they were. Every time lenders pushed further into invasive collection without proper consent, they created a reputational liability and simultaneously degraded the thing they were trying to build, because borrowers will always respond to surveillance with evasion, and evasion is corrosive to exactly the kind of clean, consistent behavioral signal that good underwriting depends on to function.
The lesson isn’t that alternative data fails in Africa as a category, because some of it hasn’t, and the parts that have stayed disciplined about consent and scope are still doing useful work in markets across the continent today.
SMS scraping failed because it reached too far into a borrower’s private life without giving them any real say in the matter. Contact scraping failed for a different but equally serious reason, exposing third-party information to loan transactions those people were never part of. Both paid for that overreach with the one thing a scoring model can’t survive without, which is data people haven’t been given every reason to falsify. We had ten years to learn those distinctions. I’d rather we didn’t need another ten to put them into practice across the rest of the category.
Discover more from Adedeji Olowe