The Disease Detective

Joe DeRisi remembers very clearly when his obsession with mystery diseases began. As a teenager in the 1980s, growing up outside Sacramento, he was riveted by news reports about the AIDS epidemic, which in its early years was spreading around the world and killing thousands of people while scientists struggled to establish the cause. “I mean, what is it?” DeRisi says. “Is it a virus? Nobody knew! That whole concept, that we could have an epidemic or pandemic but couldn’t figure out what was behind it — that stuck with me my whole life.”

Today DeRisi is a professor of biochemistry who studies infectious diseases at the University of California, San Francisco, and co-president of the Chan Zuckerberg Biohub, a research institute in the city’s Mission Bay neighborhood. Lean and white-haired at 51, he tends to talk in rapid bursts, sometimes inflected with a California-stoner vibe. When I met him at the Biohub in May, he, like many geneticists, had just come off a harried year of working on Covid-19, during which he transformed his lab into a facility that could process more than 2,600 rapid tests a day. “Things have definitely calmed down,” DeRisi said, as he led me inside the Biohub. “It was pretty intense for a while there.”

As we made a lightning tour of the lab, DeRisi waved his hand at a series of expensive genetic sequencers, including one the size of a refrigerator. “Boring gray boxes,” he announced, before moving on. On a table near the back was a much smaller unit, plain white and roughly the size of a milk crate, with a simple touch-screen. Not long after becoming co-president of the Biohub in 2016, DeRisi started a project designed to spot unfamiliar diseases well before they would normally be detected. The white box, when connected to an elaborate analysis system DeRisi had designed, allowed researchers from around the world to piece together all the different DNA or RNA recovered from just about any sample (throat swabs, blood draws or other material) and scan it for unidentified pathogens.

The medical word for such diseases is “idiopathic”: conditions whose symptoms can be described but that have no known cause. Before germs were understood, most illnesses were idiopathic by definition, including the Black Death, which we now know was caused by a bacterium (Yersinia pestis) but which doctors at the time hypothesized might be caused by staring at someone who was ill, the alignment of the planets, bad smells or wearing pointed shoes. What’s startling is how many mystery infections still exist today. More than a third of acute respiratory illnesses are idiopathic; the same is true for up to 40 percent of gastrointestinal disorders and more than half the cases of encephalitis (swelling of the brain). Up to 20 percent of cancers and a substantial portion of autoimmune diseases, including multiple sclerosis and rheumatoid arthritis, are thought to have viral triggers, but a vast majority of those have yet to be identified.

Globally, the numbers can be even worse, and the stakes often higher. “Say a person comes into the hospital in Sierra Leone with a fever and flulike symptoms,” DeRisi says. “After a few days, or a week, they die. What caused that illness? Most of the time, we never find out. Because if the cause isn’t something that we can culture and test for” — like hepatitis, or strep throat — “it basically just stays a mystery.”

While the cause of Covid-19 was quickly identified as a coronavirus, DeRisi notes, that won’t necessarily be the case with whatever germ creates the next pandemic. And past strategies for detecting potentially dangerous viruses haven’t always been very systematic. “Different prevention projects in the past have just sort of picked up random roadkill on the side of the road and looked for viruses in it,” DeRisi told me. “Or they’ll look for all the viruses in bats.” While there’s a place for that sort of sampling, DeRisi said, it’s hard to know which of the many organisms discovered actually poses a risk. “Like, we have a project that’s examining the slurry in swine farms,” he went on. “And we’ve identified at least 200 novel viruses so far. Which is great! But we have no idea which of those, if any, have the ability to jump into humans — or how bad it would be if they did.”

It would be better, DeRisi says, to watch for rare cases of mystery illnesses in people, which often exist well before a pathogen gains traction and is able to spread. Based on a retrospective analysis of blood samples, scientists now know that H.I.V. emerged nearly a dozen times over a century, starting in the 1920s, before it went global. Zika was a relatively harmless illness before a single mutation, in 2013, gave the virus the ability to enter and damage brain cells. Cristina Tato, an immunologist who runs the Biohub’s Rapid Response Team, points out that months before Zika exploded in Brazil, causing developmental issues and microcephaly in infants, researchers in the South Pacific noticed an increase in neurological symptoms, a missed clue that Zika was changing.

“With pathogens, we’re much better at watching for things that we already know are out there,” DeRisi said. “Ebola, we know. Zika, we know. The beauty of this approach” — running blood samples from people hospitalized all over the world through his system, known as IDseq — “is that it works even for things that we’ve never seen before, or things that we might think we’ve seen but which are actually something new.”

Biological samples being prepared for sequencing.
Carlos Chavarría for The New York Times

Traditionally, the way that scientists have identified organisms in a sample is to culture them: Isolate a particular bacterium (or virus or parasite or fungus); grow it in a petri dish; and then examine the result under a microscope, or use genomic sequencing, to understand just what it is. But because less than 2 percent of bacteria — and even fewer viruses — can be grown in a lab, the process often reveals only a tiny fraction of what’s actually there. It’s a bit like planting 100 different kinds of seeds that you found in an old jar. One or two of those will germinate and produce a plant, but there’s no way to know what the rest might have grown into.

And because different types of bacteria require specific conditions in order to grow, you also need some idea of what you’re looking for in order to find it. The same is true of genomic sequencing, which relies on “primers” designed to match different combinations of nucleotides (the building blocks of DNA and RNA). Even looking at a slide under a microscope requires staining, which makes organisms easier to see — but the stains used to identify bacteria and parasites, for instance, aren’t the same.

The practice that DeRisi helped pioneer to skirt this problem is known as metagenomic sequencing. Unlike ordinary genomic sequencing, which tries to spell out the purified DNA of a single, known organism, metagenomic sequencing can be applied to a messy sample of just about anything — blood, mud, seawater, snot — which will often contain dozens or hundreds of different organisms, all unknown, and each with its own DNA. In order to read all the fragmented genetic material, metagenomic sequencing uses sophisticated software to stitch the pieces together by matching overlapping segments.
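The stitching step can be pictured with a toy example. What follows is a minimal sketch in Python of greedy overlap merging, assuming short error-free reads and exact matches; the function names and the sample reads are invented for illustration, and real assemblers (including whatever IDseq uses, which the article does not specify) are far more elaborate.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap.

    Stops when no remaining pair overlaps by at least `min_len` characters
    and returns whatever stitched pieces are left.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]


# Hypothetical fragments, made up for this example.
fragments = ["ATGGCGT", "GCGTACG", "TACGTTA"]
contigs = greedy_assemble(fragments)
print(contigs)
```

Here the three fragments share four-letter overlaps ("GCGT", then "TACG"), so they collapse into the single sequence ATGGCGTACGTTA. A fragment from an unrelated organism would share no overlap and would simply remain a separate piece.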

The assembled genomes are then compared against a vast database of all known genomic sequences — maintained by the government-run National Center for Biotechnology Information — making it possible for researchers to identify everything in the mix. In this scenario, an undiscovered or completely new virus won’t trigger a match but will instead be flagged. (Even in those cases, the mystery pathogen will usually belong to a known virus family: coronaviruses, for instance, or filoviruses that cause hemorrhagic fevers like Ebola and Marburg.)
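The match-or-flag logic can also be sketched in a few lines. This is a toy illustration in Python, not the actual comparison against the N.C.B.I. database: it assumes a tiny in-memory set of made-up reference sequences and scores similarity by the share of short substrings (k-mers) a query has in common with each reference. The names `classify`, `virus_A` and `bacterium_B`, and the 0.8 threshold, are all hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set:
    """All substrings of length k that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(contig: str, references: dict, k: int = 4, threshold: float = 0.8):
    """Return (best_reference_name, score) for a stitched sequence.

    The score is the fraction of the contig's k-mers found in the reference.
    If no reference reaches `threshold`, the name is None, meaning the
    sequence matched nothing known and should be flagged for a closer look.
    """
    query = kmers(contig, k)
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = len(query & kmers(ref, k)) / len(query) if query else 0.0
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Made-up reference sequences standing in for a real genome database.
references = {
    "virus_A": "ATGGCGTACGTTAGC",
    "bacterium_B": "GGGCCCTTTAAAGGGCCC",
}

print(classify("GCGTACGTTA", references))      # known organism
print(classify("TTTTTTTTGAGAGA", references))  # nothing similar on file
```

The first query is a piece of `virus_A`, so every one of its k-mers is found and it is identified. The second shares nothing with either reference and comes back with no name, the toy equivalent of being flagged. A new member of a known virus family would fall in between, with a partial score against its relatives.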

Metagenomic sequencing is especially good at what scientists call “environmental sampling”: identifying, say, every type of bacteria present in the gut microbiome, or in a teaspoon of seawater. Such studies have revealed just how vast the microbial world is, and how little we know about it. One study found more than 1,000 different kinds of viruses in a tiny amount of human stool; another found a million in a couple of pounds of marine sediment. And most were organisms that nobody had seen before.

In 2002, when DeRisi was an assistant professor, he and his collaborator David Wang created the first medical version of this tool, a DNA microarray called the ViroChip that was designed to identify any known virus from a patient’s blood or tissue, and also detect any new or unknown virus. In the years after developing the ViroChip, DeRisi used it mostly to hunt for unknown pathogens connected to respiratory diseases, including asthma. One of his early successes was helping to identify a mystery disease from Hong Kong that would turn out to be SARS. He also solved medical mysteries; in one case, he figured out that a construction worker’s encephalitis was caused not by tuberculosis, as doctors had thought for more than a year, but by a tapeworm from infected pork that had migrated to the patient’s brain.

He dabbled in animal epidemics as well. Along with diagnosing a fatal neurological disease in snakes, he investigated an infection that was killing cockatiels and parrots, and solved a bizarre rash of deaths among sharks and bat rays in San Francisco Bay. At one point, he even investigated a case of encephalitis in a polar bear, although the cause turned out to be an autoimmune disorder. (DeRisi now studies the same illness in humans.)

After the Biohub opened in 2016, one of DeRisi’s goals was to turn metagenomics from a rarefied technology used by a handful of elite universities into something that researchers around the world could benefit from. Unlike regular genomic sequencing, which is now cheap, metagenomics requires enormous amounts of computing power, putting it out of reach of all but the most well-funded research labs. The tool DeRisi created, IDseq, made it possible for researchers anywhere in the world to process samples through the use of a small, off-the-shelf sequencer, much like the one DeRisi had shown me in his lab, and then upload the results to the cloud for analysis.

DeRisi isn’t alone in this cloud-based approach to metagenomics — a growing number of start-ups are doing the same. But he’s the first to make the process so accessible, even in countries where lab supplies and training are scarce. DeRisi and his team tested the chemicals used to prepare DNA for sequencing and determined that using as little as half the recommended amount often worked fine. They also 3-D print some of the labs’ tools and replacement parts, and offer ongoing training and tech support. The metagenomic analysis itself — normally the most expensive part of the process — is provided free.

But DeRisi’s main innovation has been in streamlining and simplifying the extraordinarily complex computational side of metagenomics. “Most metagenomics programs are really hard to use,” a former collaborator noted. “They take a lot of practice and training.” IDseq is also fast, capable of doing analyses in hours that would take other systems weeks.

“What IDseq really did was to marry wet-lab work — accumulating samples, processing them, running them through a sequencer — with the bioinformatic analysis,” says Jennifer Bohl, a researcher who worked at the Laboratory of Malaria and Vector Research in Phnom Penh. “Without that, what happens in a lot of places is that the researcher will be like, ‘OK, I collected the samples!’ But because they can’t analyze them, the samples end up in the freezer. The information just gets stuck there.”

It wasn’t long after DeRisi completed the prototype for IDseq that he performed his first test of it as a global health tool — a trial run that delivered some fascinating results. It all began in fall 2017, when he ran into Farhad Imam, a pediatrician and senior program officer at the Gates Foundation, at a global health conference in Washington. As they discussed the challenges of deploying the system in the developing world, Imam hit on the idea of enlisting Senjuti Saha, a microbiologist at the Child Health Research Foundation in Dhaka, Bangladesh, to see if IDseq might help shed some light on a mystery there.

Earlier that year, the C.H.R.F. noticed a sharp uptick in cases of meningitis in children. Some of these were fatal; many left patients disabled. “In Bangladesh, when a child is disabled, the entire family completely falls apart,” Saha told me. “The mother doesn’t go to work anymore. The siblings fall out of school. They get into this vicious cycle of debt.”

Meningitis itself isn’t a disease, just a description meaning that the tissues around the brain and spinal cord have become inflamed. In the United States, bacterial infections can cause meningitis, as can enteroviruses, mumps and herpes simplex. But a high proportion of cases have, as doctors say, no known etiology: No one knows why the patient’s brain and spinal tissues are swelling.

This was the case with the Dhaka outbreak. C.H.R.F. is one of the premier microbiology labs in South Asia and is in charge of tracking meningitis in the country for the World Health Organization. “Every meningitis case that comes in, we culture,” Saha told me. “We do antigen tests for pneumococcus, Neisseria meningitidis, Hemophilus influenzae and G.B.S.,” or Group B streptococcus, which together are the four infections most likely to cause meningitis. “Then we do a much more sensitive and specific test for Streptococcus pneumoniae bacteria, since that causes the highest proportion of cases. And then we also do real-time P.C.R. looking for fragments of DNA from any of these pathogens.”

When the outbreak began, it was assumed that the cause would again be bacterial, but none of the tests could pinpoint a pathogen. Over the next year, Saha worked to solve the mystery, at times in collaboration with other labs. One partnership, with an organization in China, fell apart when the group wasn’t willing to share its techniques. Another set of researchers, in Canada, ran their own tests on the meningitis samples, but couldn’t figure out the cause either. Not long after, Saha attended a conference at the British Museum, where she gave a presentation titled “The Dark Side of Meningitis.” “It was a negative talk,” Saha recalls. “Like: Why does everybody talk only about the successful cases? We need to talk about the thousands of cases every year where we have no idea what’s causing the disease.”

Before meeting DeRisi, Saha was skeptical about yet another collaboration. But the two instantly hit it off. Though DeRisi could be impatient, Saha liked that he was direct, and appreciated that his “ethics are very strong. In his head, he’s like: This is right; this is wrong; this is what I’m going to do.” Still, she proceeded carefully. “Because IDseq was new, and because I am very meticulous, I included a lot of controls,” she told me. Of the 97 samples of cerebrospinal fluid, only 25 were from actual mystery-meningitis cases. The rest were either from cases for which Saha’s lab had already identified the cause, or weren’t meningitis at all. Several were simply water. “The idea was that all of these would be tested, and the process would be blinded,” Saha says. “Because I had to see whether the platform worked or not.”

When Saha and her team ran the mystery meningitis samples through IDseq, though, the result was surprising. Rather than revealing a bacterial cause, as expected, a third of the samples showed signs of the chikungunya virus — specifically, a neuroinvasive strain that was thought to be extremely rare. “At first we thought, It cannot be true!” Saha recalls. “But the moment Joe and I realized it was chikungunya, I went back and looked at the other 200 samples that we had collected around the same time. And we found the virus in some of those samples as well.”

Until recently, chikungunya was a comparatively rare disease, present mostly in parts of Central and East Africa. “Then it just exploded through the Caribbean and Africa and across Southeast Asia into India and Bangladesh,” DeRisi told me. In 2011, there were zero cases of chikungunya reported in Latin America. By 2014, there were a million.

Ordinary chikungunya can cause lasting neurological damage and lifelong joint pain. DeRisi called the disease “hugely devastating” and noted that chikungunya, in the Kimakonde language, spoken in Tanzania, means “to become contorted.” But a neuroinvasive version that caused brain damage and primarily affected children and infants was especially alarming.

Chikungunya is a mosquito-borne virus, but when DeRisi and Saha looked at the results from IDseq, they also saw something else: a primate tetraparvovirus. Primate tetraparvoviruses are almost unknown in humans, and have been found only in certain regions. Even now, DeRisi is careful to note, it’s not clear what effect the virus has on people. “Maybe it’s dangerous, maybe it isn’t,” DeRisi says. “But I’ll tell you what: It’s now on my radar. So this thing that would have been totally invisible, that nobody even knew to look for — now we’re watching for it.”

That sort of discovery matters, Farhad Imam observes, partly because it can head off a new epidemic, but also because it reveals a landscape of potentially dangerous viruses that we would otherwise never find out about. “What we’ve been missing is that there’s an entire universe of pathogens out there that are causing disease in humans,” Imam notes, “ones that we often don’t even know exist.”

After finishing the meningitis pilot study, DeRisi and Imam started to roll out IDseq more widely. “The plan was, Let’s let researchers around the world propose studies, and we’ll choose 10 of them to start,” DeRisi recalls. “We thought we’d get, like, a couple dozen proposals, and instead we got 350.”

A group in Madagascar wanted to compare the organisms found in bats against those found in patient blood samples, as a way to see what viruses might be spilling over. A research institute in Brazil, which often sees patients with mysterious fevers, wanted to know the cause. “The selling point for researchers is: ‘Look, this technology lets you investigate what’s happening in your clinic, whether it’s kids with meningitis or something else,’” DeRisi said. “We’re not telling you what to do with it. But it’s also true that if we have enough people using this, spread out all around the world, then it does become a global network for detecting emerging pandemics. Because maybe you’re focused on childhood meningitis in Dhaka, but suddenly you have all these adults showing up with a weird respiratory illness. You’re going to turn your attention to that.”

At the lab, DeRisi pulled up the IDseq results for some of Saha’s meningitis samples, drawn from patients’ cerebrospinal fluid. “This is a heat map,” DeRisi said, pointing at what looked like an erratically filled-in grid, with some white squares and others in gradations of yellow or red. At the top, a stretch of dark red-purple blocks showed the presence of chikungunya, but there were also dozens of lighter squares, reflecting everything from secondary infections to garden-variety bacteria that live on the skin. Each row, DeRisi explained, represented a different microbe that the system had detected, with the color indicating how much of that organism had been found. Some of these were familiar: Alphapapillomavirus causes warts; Saccharomyces cerevisiae is a fungus found in bread and beer.

Making it possible for countries to do their own metagenomic testing, regularly and in real time, could increase pathogen detection in places where new pandemics are most likely to emerge. But the heat map also showed how hard it can be to determine which organism, out of many, is the one making a person ill. One hazard of metagenomics is that it amplifies all the genetic material in a sample indiscriminately, making it challenging to know which of the various bacteria or viruses the process detects are actually significant. Nasal swabs, for example, routinely pick up signs of influenza and respiratory viruses — as well as dozens, or even hundreds, of types of bacteria. That’s especially true in the parts of the world where DeRisi would like to offer IDseq. As David Relman, a microbiologist at Stanford University, notes, “When you draw blood from someone who has a fever in Ghana, you really don’t know very much about what would normally be in their blood without fever — let alone about other kinds of contaminants in the environment. So how do you interpret the relevance of all the things you’re seeing?”

Such criticisms have led some to say that metagenomics simply isn’t suited to the infrastructure of developing countries. Along with the problem of contamination, many labs struggle to get the chemical reagents needed for sequencing, either because of the cost or because of shipping and customs holdups. Even uploading data can be fraught. “In Cambodia, we have problems with the internet, and we have big problems with power outages,” Bohl says. “So the constant fear is that I’ll wake up after a 48-hour run and there’ll be no information, because the power went out in the middle of the night.”

When I mentioned this to Saha, she said that such conditions were not an argument for limiting access. Imam agrees. “Before now, a researcher would literally have to send their samples to a lab in the global north — to what I call ‘one of the two Cambridges,’ Boston or England — just to answer a question about a disease in their own country,” he says. “So this really represents a change in terms of who has access to metagenomic technology, and what can be done with it.”

Soon after the first Covid lockdowns began in the United States in March 2020, DeRisi and his group set up Slack channels to talk with IDseq teams around the world, nearly all of which had started using the technology to track coronavirus variants as they emerged. In Cambodia, Bohl’s team sequenced the virus’s genome from a patient who had recently returned from Wuhan — one of the earliest sequences to be posted on Gisaid, an open-access database for disease variants. In Bangladesh, Saha and her group did the same, and discovered a strain with two unfamiliar mutations. “That’s one of the beauties of the system,” DeRisi observes. “It allows you to pivot on a dime.” By that point, the SARS-CoV-2 virus, which causes Covid-19, had been identified using electron microscopy; there was no need to use metagenomic sequencing to find a mystery agent. But as one infectious-disease specialist, David Patrick at the University of British Columbia, told me: “What if that hadn’t worked? It would have been nice to have an extra tool in the kit.”

As the coronavirus spread around the world, the Africa Centers for Disease Control and Prevention, which oversees a continentwide Pathogen Genomics Initiative (P.G.I.), also reached out, hoping to expand the IDseq program to additional labs around the continent. Tato, the researcher who oversaw that process in Senegal, Ethiopia, Egypt and Nigeria, says that while tracking Covid was part of what motivated the Africa C.D.C.’s interest, the expansion was also aimed at ongoing epidemics like yellow fever, Ebola and Lassa fever. (Nigeria’s yellow-fever infections, in particular, were growing more severe, leading researchers to wonder whether the virus had evolved in ways that made it more virulent.) But the P.G.I. also urged countries to begin using metagenomics more broadly — for instance, to investigate the vast repository of samples collected over the years by Dakar’s Institut Pasteur, from patients as well as wildlife and birds.

Even just as a public-health tool, IDseq has the potential to be illuminating. In Nepal, Tato told me, projects are underway to determine the causes of both idiopathic pediatric encephalitis and a mysterious infection that causes blindness in infants and children, which is thought to be transmitted by moths. (The infectious agent carried by the moths, whether a bacterium, a fungus or some kind of toxin, is still unknown.) “They’ve got this new technology, and they’re just running with it,” Tato adds. “They keep finding new things they want to investigate.”

Using IDseq to tackle regional health problems is part of the point, DeRisi says. “Look, most of the stuff that people find with IDseq will never turn into a pandemic,” he went on. “But that doesn’t mean it’s useless. We’ll still be learning what pathogens are out there, how they’re changing, when they’re becoming more dangerous. All of which makes it more likely that we’ll be able to spot an emerging pandemic before it takes off.”

Discovering a contagious disease early makes it easier to contain, but widespread sampling also means that we’re less likely to be caught off-guard. “With Ebola, there’s always an issue: Where’s the virus hiding before it breaks out?” DeRisi explains. “But also, once we start sampling people who are hospitalized more widely — meaning not just people in Northern California or Boston, but in Uganda, and Sierra Leone, and Indonesia — the chance of disastrous surprises will go down. We’ll start seeing what’s hidden.”


Jennifer Kahn is a contributing writer for the magazine and the Narrative Program lead at the U.C. Berkeley Graduate School of Journalism. She last wrote about using drugs to prevent the next pandemic.