Guest post by Sid Ravinutala
Change one thing at a time, especially for the first few tech projects. Complexity is multiplicative in the number of things you are trying to change. Don’t rush to use machine learning; get the plumbing right first. Spend more time (yes, even more than that) interacting with and getting feedback from the people who will use your solution. Build something quickly to get feedback from your users. You will get it wrong the first few times, so plan for it.
Over the last few years, IDinsight has been building a data science team to complement the organisation’s strengths of deep methodological rigour, sectoral expertise, and razor-sharp focus on informing decisions. As early movers in this field, we made a tonne of mistakes. Here are four mildly controversial takes on data science projects in the development sector, drawn from that experience.
Build around what’s there
For the average org in the development sector, deploying a new technology solution is already a large undertaking. If you also plan to change your processes, technology stack, or staffing, you’re going to increase the complexity of the project by an order of magnitude. All of these elements are tightly coupled and create a lot of moving parts that need to be coordinated.
For the first few projects, keep your scope narrow; look to understand how you can take advantage of your current technology and process ecosystem. Are there screens that already have eyeballs on them? Augment them with better analytics. Are there people already making decisions as part of their job? Perhaps an algorithm could assist their decision-making. Is there a set of technologies that your organisation is already comfortable with? Have a bias toward building your solution using just that.
The flipside of this is also worth noting. If you’re going to build a new app, you’re going to have to pry people’s attention away from other tasks. Vying for attention is tricky business and adds risk to the project. If there is no one you can identify by name who is hankering for data or analytics to be better at their job, then your solution will probably just languish on a server contributing to climate change. If you are going to introduce a brand new technology to the organisation, you’ll need to budget more for training and maintenance.
We spent the last couple of weeks learning Grafana, UptimeRobot, and Prometheus for a project. Not because these are the best tools for a monitoring system but because these are what the client already uses. They are already looking at Grafana dashboards, so one additional page has a nominal attention cost. They already maintain a Prometheus server, so one additional data science source is easy to add and maintain. There may be better technical solutions, but none that will have a lower cognitive and maintenance burden for the client. And because the client has many years of experience with their tools, they can engage deeply with the solution design.
Don’t start with ML
This is also Rule 1 of machine learning at Google. But I’m making an even stronger case for it in the development sector.
First, some realism to combat the hype. Everybody wants the snazzy deep learning model. Very few need it. Even fewer have the data for it. And only a fraction of those can monitor and maintain it. A simple algorithm or a heuristic will probably get you a lot of the way there.
Second, the data pipelines, the user interface, and the integration are often a lot trickier than you think; you want to get this hard stuff out of the way first. Improving or swapping out the core algorithm may be a complicated problem, but with talented data scientists it can be done. Deploying an application is a complex problem that touches many parts of the business and is often trickier to get right.
Third, you may be surprised at the margin where your users would like to see improvements. In the early stages, more often than not, it will be how the output is presented and how they can internalise and use it. Marginal improvements in the accuracy of the model might come later.
I like to frame what we do as “data-driven applications” not “machine learning applications”. We recently built a solution to better target a social program that at its core is multiplying two matrices. That’s it. The rest of the time was spent setting up APIs and data pipelines, and deploying the solution. There are a lot of fancy things we can do to improve the algorithm, but those are second order for now. In phase two of the project, the client wants to focus on how to make it seamless for their employees to get the most out of the solution. And we agree!
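To make "multiplying two matrices" concrete, here is a minimal sketch of what such a targeting core could look like. The districts, indicators, and weights are made up for illustration and are not the data or the exact method from the project above.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    if not a or not b or any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions must match and inputs must be non-empty")
    n_cols = len(b[0])
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(n_cols)]
        for row in a
    ]


# Hypothetical inputs: one row per district, one column per need indicator.
districts = ["A", "B", "C"]
indicators = [
    [0.5, 0.25],   # district A
    [0.25, 0.75],  # district B
    [1.0, 0.25],   # district C
]
# Hypothetical weights: one row per indicator, a single priority-score column.
weights = [
    [2.0],
    [4.0],
]

scores = matmul(indicators, weights)

# Rank districts from highest to lowest priority score.
ranking = sorted(zip(districts, (row[0] for row in scores)),
                 key=lambda pair: pair[1], reverse=True)
print(ranking)
```

The arithmetic is a handful of lines; getting the indicator and weight tables into this shape, and the ranking in front of someone who acts on it, is where the rest of the effort goes.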
Talk to stakeholders
By stakeholders, I most probably don’t mean you, dear reader. I mean the people who will actually interact with the solution regularly. These might be front line workers, helpdesk staff, maybe even citizens if it’s public facing.
If you are building something that you expect a group of people to use, speak to said group of people. Not controversial, right? Surprisingly, “We will spend the next few months talking to front line workers” is a hard thing to sell to funders and clients when they are signing up for a data science project. Also, organisations are protective of access to these end users until we are in the late stages of the project. But if you don’t have these conversations up front, you are more likely to end up building something that generates more profanity than joy.
One of our projects was building an Early Warning System for COVID-19. It combined Google Trends and data from a symptom tracker app to predict COVID-19 outbreaks at a sub-national level. Very spiffy. Though there was general consensus from our partner that this was useful, we hadn’t identified the person who wanted this to improve their disease surveillance work. When we did go looking for the user, we discovered that they were busy fighting fires and this was not the need of the hour. All those fancy algorithms and pretty visualisations amounted to not much more than a few presentations and a very nice GitHub repo.
There is a science to capturing “user stories”. In recent months, we have invested heavily in identifying end users and truly understanding their processes and pain points. Every product we build must have a razor-sharp focus on the user and must not be a top-down hammer looking for a nail.
“I know it when I see it”
Speaking to end users doesn’t immediately lead to awesome design. Often it is very difficult to articulate up front exactly what you want. This is especially true if the end user is fairly inexperienced with technology. It is a lot more productive to build something quickly and ask “How does this look?” The first few answers will most definitely be “Not great”. But this will hopefully be followed by a discussion that is a lot more specific and productive.
If you are not used to this iterative way of working, seeing an unpolished and unfinished product early may be a bit jarring. Setting expectations up front helps: make it clear that this is far from the final product and that the goal is to seek feedback and set priorities.
We recently presented an “urgency detection” service that scores incoming messages on Praekelt’s MomConnect platform. The Praekelt team understood that these “demos” are far from final and are a way to provide input into the development process. We will be launching this service in a month with very few bells and whistles. But we have a rich roadmap full of features and more releases planned. The scope of each release will be adapted based on how the service performs in the hands of real users.
There is a lot more to be said here, and a lot has already been said on this sort of incremental rollout. This is pretty standard for tech projects in the private sector but less so in the development sector, where the big bang approach is still the norm. So, dear funders and social sector leaders, on your next RFP or project proposal, demand that there be a roadmap where you go “live” quickly and iterate, adapt, and improve based on user feedback. To our fellow partners working on “data science solutions”, I hope our hard-earned lessons help make your next project a greater success.
IDinsight is a mission-driven global advisory, data analytics, and research organization that helps global development leaders maximize their social impact.