My recent work applies mathematical modeling and Bayesian inference methods to study the transmission dynamics of infectious diseases such as influenza, Ebola, and measles. I am also developing forecast systems to predict outbreaks of infectious diseases. In addition, I study how environmental factors influence the transmission of influenza, its seasonality, and the underlying mechanisms. Some recent projects are described below.

Measles spatial spread across China during 2005−2014

Measles is a highly infectious and severe disease – it is a leading cause of death in children in developing regions, killing 114,900 globally in 2014. Elimination of the disease can nevertheless be achieved with vaccination of 90-95% of a population, as shown in theory and practice. In China, however, measles continues to infect thousands of people each year despite vaccination coverage above 95%. This conundrum challenges measles elimination in China and worldwide. In this study, we characterize the geospatial distribution of measles and epidemic connections among cities across China. Using incidence data reported during 2005-2014 for all 344 cities, we show that the municipal burden of measles differed substantially and some cities were highly connected and experienced synchronous outbreaks. We identify 14 cities that experienced endemic transmission during 2005−2010, and 21 transmission clusters, including 6 cross-regional clusters that link the less developed inland regions and the industrial east. We find that three transmission foci coexist in China—cities with large minority populations, inland cities with more emigrants, and mega industrial cities hosting more immigrants—and that migrant workers, connecting the latter two foci, likely facilitate measles transmission across regions. This complex connection, along with the differing disease burden among cities, renders measles elimination challenging in China despite the high overall vaccination rate. Future immunization programs should therefore target these three foci.

Yang W, Wen L, Li SL, Chen K, Zhang WY, Shaman J. Geospatial characteristics of measles transmission in China during 2005-2014. PLoS Comput Biol. 2017;13(4):e1005474.

Long-term measles epidemic dynamics in China and implications for elimination

Despite high vaccine coverage, measles continues to cause large epidemics in China, a country currently supporting 18% of the world's population. To improve understanding of this phenomenon, in this study, we develop a comprehensive model-inference system; using this system, we are able to simulate measles epidemic dynamics and estimate key epidemiological characteristics in three key locations in China during 1951-2004, a period that spans the pre-vaccine and modern mass-vaccination eras. These estimates include spatiotemporal variations in population susceptibility and the basic reproductive number (R0), an epidemiological parameter commonly used to inform target vaccination levels for measles elimination. Our findings reveal population and epidemiological characteristics crucial for understanding the current persistence of measles epidemics in China and for devising future elimination strategies.

Yang W, Li J, Shaman J. Characteristics of measles epidemics in China (1951-2004) and implications for elimination: A case study of three key locations. PLoS Comput Biol. 2019;15(2):e1006806.
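
The link between R0 and target vaccination levels mentioned above can be illustrated with the classical herd-immunity threshold, 1 - 1/R0. The sketch below is a textbook approximation that assumes a homogeneously mixing population; it is not the model-inference system used in the study. The function name, the `vaccine_efficacy` parameter, and the R0 values are illustrative assumptions.

```python
def critical_vaccination_coverage(r0: float, vaccine_efficacy: float = 1.0) -> float:
    """Vaccinated fraction needed to bring the effective reproductive number below 1.

    Classical threshold (1 - 1/R0) / efficacy under homogeneous mixing.
    A result above 1.0 means the target cannot be reached with that efficacy.
    """
    if not 0.0 < vaccine_efficacy <= 1.0:
        raise ValueError("vaccine_efficacy must be in (0, 1]")
    if r0 <= 1.0:
        # An infection with R0 <= 1 cannot sustain transmission even without vaccination.
        return 0.0
    return (1.0 - 1.0 / r0) / vaccine_efficacy


# Illustrative R0 values only (assumed here, not estimates from the study).
for r0 in (10.0, 20.0):
    coverage = critical_vaccination_coverage(r0)
    print(f"R0 = {r0:.0f}: vaccinate at least {coverage:.0%} of the population")
```

Under these assumptions, R0 values of 10 and 20 give thresholds of 90% and 95%, which matches the 90-95% range usually quoted for measles. Spatial and temporal variation in R0 and in population susceptibility, as estimated in the study, would shift this target from place to place.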

How did Ebola spread across Sierra Leone during the 2014-2015 epidemic?

Understanding the growth and spatial expansion of (re)emerging infectious disease outbreaks, such as Ebola and avian influenza, is critical for the effective planning of control measures; however, such efforts are often compromised by data insufficiencies and observational errors. In this study, we develop a novel spatial-temporal inference methodology requiring only limited, readily compiled data and use this method to reconstruct the transmission network of the 2014-15 Ebola epidemic in Sierra Leone. Transmission within the network introduced Ebola to new regions and initiated self-sustaining local epidemics. Two major transmission pathways are inferred, facilitated by two of the more populous and connected districts, Kenema and Port Loko. Epidemic intensity differed by district, correlated with population size, and a critical window of opportunity for containing local Ebola epidemics at the source (ca. one month) existed. This novel methodology can be used to help identify and contain the spatial expansion of future (re)emerging infectious disease outbreaks.

Yang W, Zhang W, Kargbo D, Yang R, Chen Y, Chen Z, Kamara A, Kargbo B, Kandula S, Karspeck A, Liu C, Shaman J. 2015. Transmission network of the 2014-2015 Ebola epidemic in Sierra Leone. Journal of the Royal Society Interface. Published 11 November 2015. DOI: 10.1098/rsif.2015.0536.

Inference of seasonal and pandemic flu transmission using big data online surveillance

Infectious disease surveillance systems are powerful tools for monitoring and understanding infectious disease dynamics; however, underreporting (due to both unreported and asymptomatic infections) and observation errors in these systems make it difficult to obtain a complete picture of infectious disease epidemiology. This problem applies to influenza, an infectious disease of pandemic potential. In this study, we develop influenza inference systems capable of compensating for observational biases and underreporting. Using both Google Flu Trends and CDC data in conjunction with new model-inference methods, we are able to infer the evolving epidemiological features of influenza and its impacts among the large U.S. population during 2003-2013, including the 2009 pandemic. We also identify differences among regions within the U.S.

Yang W, Lipsitch M, Shaman J. 2015. Inference of seasonal and pandemic influenza transmission dynamics. Proceedings of the National Academy of Sciences 112: 2723-2728.

Forecast methods for the flu: which one is better?

Influenza, or the flu, is a significant public health burden in the U.S. that annually causes between 3,000 and 49,000 deaths. Predictions of influenza, if reliable, would provide public health officials with valuable advance warning that could aid efforts to reduce the burden of this disease. For instance, medical resources, including vaccines and antivirals, can be distributed to areas in need well in advance of peak influenza incidence. Recent applications of statistical filtering methods to epidemiological models have shown that accurate and reliable influenza forecasting is possible; however, many filtering methods exist, and the performance of any filter may be application-dependent. In this study, we use a single epidemiological modeling framework to test the performance of six state-of-the-art filters for modeling and forecasting influenza. Three of the filters are particle filters, commonly used in scientific, engineering, and economic disciplines; the other three filters are ensemble filters, frequently used in geophysical disciplines, such as numerical weather prediction. We use each of the six filters to retrospectively model and forecast seasonal influenza activity during 2003-2012 for 115 cities in the U.S. We compare the performance of the six filters and propose potential strategies for improving real-time influenza prediction.

Yang W, Karspeck A, Shaman J. 2014. Comparison of filtering methods for the modeling and retrospective forecasting of influenza epidemics. PLoS Comput Biol 10: e1003583.
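
To give a sense of what a particle filter does when a new observation arrives, here is a minimal sketch of one update step of a basic bootstrap particle filter. It is not one of the six filters as implemented in the study; the scalar state, the Gaussian observation error, and all names and numbers are assumptions made for illustration.

```python
import math
import random


def particle_filter_update(particles, observation, obs_sd, rng):
    """One bootstrap particle filter update for a scalar state.

    Each particle is weighted by a Gaussian likelihood of the observation,
    then the ensemble is resampled in proportion to those weights.
    """
    weights = [math.exp(-0.5 * ((observation - p) / obs_sd) ** 2) for p in particles]
    total = sum(weights)
    if total == 0.0:
        # Every particle is implausibly far from the observation: keep them all equally.
        weights = [1.0] * len(particles)
    # random.choices performs weighted sampling with replacement.
    return rng.choices(particles, weights=weights, k=len(particles))


rng = random.Random(0)
# Hypothetical prior: 1,000 guesses of weekly cases per 100,000 people.
prior = [rng.uniform(0.0, 100.0) for _ in range(1000)]
posterior = particle_filter_update(prior, observation=40.0, obs_sd=5.0, rng=rng)

prior_mean = sum(prior) / len(prior)
posterior_mean = sum(posterior) / len(posterior)
print(f"prior mean {prior_mean:.1f}, posterior mean {posterior_mean:.1f}")
```

After the update, the particles concentrate near the observed value. Ensemble filters take a different route: rather than reweighting and resampling, they shift the ensemble members toward the observation using the ensemble mean and variance.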

Forecast of flu outbreaks in subtropical regions (e.g. Hong Kong)

Unlike the U.S., where flu season arrives regularly in winter, influenza epidemics in subtropical and tropical regions occur throughout the year. This irregularity creates challenges for the forecast system used for U.S. cities. In this work, we develop alternative forecast systems that are more adept at handling erratic non-seasonal epidemics, using either the ensemble adjustment Kalman filter or a particle filter with space reprobing, in conjunction with a susceptible-infected-recovered model. We present these forecast systems and apply them to Hong Kong.

The forecast systems are able to forecast both the peak timing and peak magnitude for 44 epidemics during 1998-2013, as caused by individual influenza strains, as well as 19 aggregate epidemics, as caused by all strains. Forecast accuracy is comparable to that achieved for U.S. cities. For peak timing (peak magnitude), forecast accuracy increases up to 43% (45%) for H1N1, 93% (89%) for H3N2, and 53% (68%) for influenza B, 1-3 weeks before the predicted peak. These findings indicate that these forecasts provide lead times adequate for planning intervention measures. In addition, the forecasts of peak magnitude can be used to inform the scale of response. For instance, the amount of antivirals and vaccines needed could be assessed based on the predicted peak magnitude. Altogether, our results suggest that routine forecast of influenza epidemics in other subtropical and tropical regions is possible, as well as forecast of other infectious diseases sharing similar irregular transmission dynamics.

Yang W, Cowling BJ, Lau EHY, Shaman J. 2015. Forecasting influenza epidemics in Hong Kong. PLoS Computational Biology 11: e1004383.
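
The susceptible-infected-recovered (SIR) model at the core of these forecast systems can be sketched as follows. This is a minimal deterministic version with daily time steps; the filter that adjusts the model state using observations is omitted, and the function name and all parameter values are illustrative assumptions rather than estimates from the study.

```python
def simulate_sir(population, initial_infected, beta, infectious_period, days):
    """Simulate a basic SIR epidemic with one-day time steps.

    beta is the daily transmission rate; infectious_period is in days and
    should be at least 1 so that daily recoveries never exceed the infected.
    Returns the daily new infections and the final (S, I, R) state.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    incidence = []
    for _ in range(days):
        # New infections cannot exceed the remaining susceptibles.
        new_infections = min(s, beta * s * i / population)
        new_recoveries = i / infectious_period
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        incidence.append(new_infections)
    return incidence, (s, i, r)


# Illustrative values: R0 = beta * infectious_period = 1.5.
incidence, (s, i, r) = simulate_sir(
    population=100_000, initial_infected=10, beta=0.5, infectious_period=3.0, days=200
)
peak_day = max(range(len(incidence)), key=incidence.__getitem__)
print(f"peak on day {peak_day} with {incidence[peak_day]:.0f} new infections")
```

The two forecast targets discussed above, peak timing and peak magnitude, correspond to `peak_day` and `incidence[peak_day]` in this sketch. In a forecast system, a filter would repeatedly update S, I, and the parameters as new surveillance data arrive, and the model would then be run forward to predict the peak.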

Forecasting flu outbreaks in boroughs and neighborhoods of New York City

Recently developed influenza forecast systems have the potential to aid public health planning for and mitigation of the burden of this disease. However, current forecasts are often generated at spatial scales (e.g. national level) that are coarser than the scales at which public health measures and interventions are implemented (e.g. community level). In this study, we build and test influenza forecast systems at county and community levels, which either include spatial connectivity among locations or are run in isolation. We test these four flu forecast systems (i.e. 2 models × 2 spatial scales) using data collected from 2008 to 2013, including the 2009 pandemic, for the five boroughs (corresponding to county level) and 42 neighborhoods (corresponding to community level) in New York City. We compare the performance of the four forecast systems in predicting the onset, duration, and intensity of flu outbreaks and find that performance varies by spatial scale (borough vs. neighborhood), season (non-pandemic vs. pandemic), and metric (onset, duration, and intensity). In general, the inclusion of spatial network connectivity in the forecast model improves forecast accuracy at the borough scale but degrades accuracy at the neighborhood scale.

Yang W, Olson DR, Shaman J (2016) Forecasting Influenza Outbreaks in Boroughs and Neighborhoods of New York City. PLoS Comput Biol 12(11): e1005201. doi:10.1371/journal.pcbi.1005201.

Inference and forecast of Ebola outbreaks in Guinea, Liberia, and Sierra Leone

The 2014-15 Ebola epidemic in three West African countries (Guinea, Liberia, and Sierra Leone) is the largest on record. WHO declared the outbreak a Public Health Emergency of International Concern on 8 August 2014. We began working on this project in early August 2014. Since then, we have tested a variety of epidemic models for Ebola transmission and used them to forecast the epidemic in the three countries. Near real-time weekly forecasts have been posted on our forecast website since October 2014. We have also studied the epidemiological characteristics of the current Ebola epidemic, including the basic reproductive number, incubation period, and infectious period.

Shaman J, Yang W, Kandula S. Inference and forecast of the current West African Ebola outbreak in Guinea, Sierra Leone and Liberia. PLOS Currents: Outbreaks Oct 31, Edition 1 (2014). doi: 10.1371/currents.outbreaks.3408774290b1a0f2dd7cae877c8b8ff6.

More Past Work


Dr. Wan Yang
Department of Epidemiology
Mailman School of Public Health
Columbia University
722 West 168th Street
Rosenfield Building, Rm 520
New York, NY 10032
Email: wy2202 at

Other Web Profiles:

Columbia Profile
Google Scholar
Wan's blog