Zika Virus (ZIKV) Discovered
Zika virus (ZIKV), discovered in 1947, caused only mild and sporadic disease throughout Africa and Asia until the 2007 Micronesia(1) and 2013 French Polynesia outbreaks(2). Since its introduction to the Americas(3, 4), it has spread rapidly into 84 countries and territories to date, and has raised global concern regarding the molecular evolution of this virus(5), the possible increase in severe pathogenicity, including fetal microcephaly(2, 6, 7), and the novel transmission patterns(8, 9). In vivo and in vitro experiments have confirmed that ZIKV is capable of causing microcephaly and other neurologic damage(10-13). While the factors behind ZIKV’s explosive emergence are still unknown, it has been hypothesized that the virus may have recently evolved to become more neurotropic(14, 15), to exhibit increased replicative capacity, and/or to become more transmissible to humans. Therefore, to distinguish between the relative contributions of these potential mechanisms, and to help prepare for future outbreaks, we need an accurate understanding of ZIKV evolution at the molecular level.
Previous studies have used phylogenetic tools to determine the genetic differences between the Asian/American (AS) and African (AF) ZIKV lineages(10, 16-19). Recent work by Faria et al. has shown that the ZIKV strains found in the Americas in late 2013 and early 2014 were highly related to the strains isolated in the Pacific Islands(20). Over 400 strains of ZIKV have been sequenced (as of July 2017). Amongst these sequences, 50 human and mosquito strains were isolated and sequenced from local transmission cases in Miami, Florida(3). Unlike those from Florida, the ZIKV samples collected in New York City (NYC), New York, mostly consist of imported patient samples from international travelers; there have been no documented cases of local mosquito transmission. This allows us to examine NYC as a regional hub for the rapid importation of ZIKV-carrying patients.
In this study, we examined the single nucleotide polymorphisms (SNPs) in all available sequence databases (as of July 2017) as temporal and geographical biomarkers. Our analysis revealed a potential tool that can be used to genotype ZIKV-infected patient samples directly. We then confirmed this method using patient samples obtained from the Department of Health and Mental Hygiene (DOHMH) in NYC. In addition, the quasi-species composition in ZIKV-infected patients has recently been published(21, 22). To expand this field of study, we propose a working pipeline to study single nucleotide variations within next-generation sequencing (NGS) data. Evidence from the NGS data can provide more accurate tracing of ZIKV quasi-species and of the evolution of the virus.
Phylogenetic analysis reveals prolonged ZIKV transmission in Southeast Asia
We performed phylogenetic analysis of the complete genomes of the 432 ZIKV strains available to date and identified three major lineages: African (AF), Asian (AS) and American (AM) (Fig. 1A). We then compared consensus sequences from each lineage and found that while most amino acid residues are conserved between AS and AM strains, three positions (139, 2086, and 2634) were not (Fig. 1B). To further characterize AS strains, we constructed phylogenetic trees based on the ZIKV prM and E genes (Figs 1C and S1). We found that the strains from Singapore (ZKA-16-097 and ZKA-16-291), Thailand (BKK01 and SI_BKK01), and China (ZK_YN001) stemmed from a common AS ancestral branch, distinct from the recent AM strains (Fig. 1C). Surprisingly, we found that a strain isolated from a Russian traveler to India (D305/2016/Russia) shared an identical amino acid sequence and 98% nucleotide sequence identity in the PrM protein, including the V153 mutation, with the P6_740 strain isolated from a Malaysian mosquito in 1966 (Fig. 1D).
Comprehensive SNP typing of pandemic ZIKV
Evolution of ZIKV occurs through the accumulation of mutations that are either inherited from its parent strain or occur de novo. From these two distinct sources of mutation, we can infer the evolutionary relationships between ZIKV strains through genotyping (see Methods). To define the “inherited” SNPs (inSNPs) of AM strains, we analyzed the open reading frames (ORFs, nucleotide positions 108-10379) of 391 complete ZIKV genomes, including three newly sequenced strains. A SNP was classified as an inSNP if it was present in at least 10 independently isolated strains (Figs. 2A and S2). Within the AM lineage, our genome-wide scan revealed 203 inSNPs, which can be classified based on their genetic homology into the six specific groups shown in the heatmap. Our classification method is consistent with phylogenetic analyses using the GTR model in PhyML (Fig. S3).
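The inSNP criterion described above (a substitution relative to the reference that recurs in at least 10 independently isolated strains) can be illustrated with a minimal Python sketch. It assumes the ORF sequences have already been aligned to the reference without gaps; the function name `find_insnps` and the input layout are hypothetical and are not taken from the Methods.

```python
from collections import Counter

MIN_STRAINS = 10  # recurrence threshold stated in the text


def find_insnps(reference, strains, min_strains=MIN_STRAINS):
    """Return {(position, alt_base): n_strains} for recurrent substitutions.

    reference: aligned reference ORF sequence (string).
    strains:   dict mapping strain name -> aligned sequence of equal length.
    Positions are 1-based offsets into the aligned region (an assumption;
    genome coordinates would need the ORF start added).
    """
    counts = Counter()
    for name, seq in strains.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference")
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            # Count only unambiguous substitutions; skip N and gap characters.
            if base != ref_base and base in "ACGT":
                counts[(pos, base)] += 1
    # Keep substitutions shared by enough independently isolated strains.
    return {snp: n for snp, n in counts.items() if n >= min_strains}


# Toy data: 12 strains share C2T, 3 strains share A5G.
toy_reference = "ACGTACGT"
toy_strains = {f"shared_{i}": "ATGTACGT" for i in range(12)}
toy_strains.update({f"rare_{i}": "ACGTGCGT" for i in range(3)})
toy_insnps = find_insnps(toy_reference, toy_strains)
```

In the toy data only the substitution shared by 12 strains passes the threshold; the one seen in 3 strains would be treated as de novo or insufficiently supported.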
To characterize the variability of ZIKV strains, we analyzed the genome sequences isolated from two groups of Chinese travelers who returned from American Samoa or Venezuela with confirmed ZIKV infection, which were grouped into G1 and G6, respectively (Fig. 2B and S4). We found that the G1 strains matched each other with 99.72% sequence identity (Table S1). In contrast, G6 strains were divided into two subgroups based on two independent sets of shared mutations (Fig. 2B). Interestingly, the travelers from the two G6 subgroups had returned to China from Venezuela during two separate time periods, the beginning and the end of February 2016. It is therefore possible that these divergent strains were co-circulating in the same region (Fig. S4). In total, we have identified 203 inSNPs and 15 amino acid substitutions compared to the reference strain (Figs. 2C and S5). We observed that G1 has the most significant sequence changes compared to the reference strain and to the other five groups, with most mutations occurring in the NS1 and NS5 genes.
Global and regional spread of ZIKV
To examine the global spread of ZIKV from Asia to the Americas, we constructed a map depicting the location of infection by different ZIKV strains (Fig. 3A). G1 has so far only been detected in the Pacific Islands (e.g., Samoa, Fiji). G2 was likely the first to have entered Brazil, and subsequently diverged into G3-6 while spreading across the Americas. Indeed, G2 has been isolated mainly in northeastern Brazil, while G3 is present in Central America (e.g., southern Mexico, El Salvador) and G4 mainly circulates in the Caribbean and North America. By contrast, G5 and G6 were found to have a much broader geographical distribution. G0 was isolated in Southeast Asia and appears not to have significantly diverged from older AM strains. We hypothesized that genomic variations also accumulated over time; therefore, we plotted the time of isolation of the strains from each country (Fig 3B). Our results revealed the time-dependent spread of ZIKV from G1 to G4. The mutation rate of ZIKV since H/PF/2013 for G1-6 is 1.137 × 10^-4 SNP/year (Fig 3C).
A rapid method to SNP-type new strains
ZIKV was first reported in the continental United States in July 2016(23). Nearly 300 local and 5000 imported cases of ZIKV infection have been confirmed as of April 26, 2017. Based on our analysis of the inSNPs from each group, we identified a region of the ZIKV genome (nt 3247-4149) that could be readily amplified and sequenced to classify new strains into our established groups (Figs. 4A and S6). The relevant inSNP positions for each group are detailed in Fig. 4A. To test this genotyping method, we successfully sequenced 100 samples obtained from the NYC DOHMH. The original patient population was weighted towards travelers returning from the Dominican Republic (60/100) and Puerto Rico (13/100). Our results confirm that multiple divergent strains are currently circulating in the Caribbean islands, with most samples belonging to G4 (Fig 4B). Samples isolated from travelers returning from South or Central America are consistent with the groups identified in those regions. Interestingly, the three ZIKV-infected travelers from Honduras carried G3 ZIKV.
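As an illustration of how a sequenced amplicon could be assigned to a group, the following Python sketch compares the bases at group-defining positions within nt 3247-4149 against a signature table. The positions and bases in `SIGNATURES` are placeholders invented for this example; the actual inSNP positions are those shown in Fig. 4A. Requiring every signature position to match is also an assumption, not necessarily the rule used in the study.

```python
AMPLICON_START = 3247  # first genome position of the amplicon (from the text)
AMPLICON_END = 4149    # last genome position of the amplicon (from the text)

# HYPOTHETICAL signatures: genome position -> expected base.
# Real values must be taken from Fig. 4A.
SIGNATURES = {
    "G1": {3300: "T", 3500: "C"},
    "G2": {3300: "C", 3500: "C"},
    "G3": {3300: "C", 3500: "T"},
}


def classify_amplicon(amplicon, signatures=SIGNATURES):
    """Return the group whose signature fully matches, or None if none does."""
    expected_len = AMPLICON_END - AMPLICON_START + 1
    if len(amplicon) != expected_len:
        raise ValueError(f"expected {expected_len} nt, got {len(amplicon)}")
    for group, signature in signatures.items():
        # Convert each genome coordinate to a 0-based index into the amplicon.
        if all(amplicon[pos - AMPLICON_START] == base
               for pos, base in signature.items()):
            return group
    return None


def make_toy_amplicon(substitutions):
    """Build an all-'A' toy amplicon carrying the given bases (testing only)."""
    seq = ["A"] * (AMPLICON_END - AMPLICON_START + 1)
    for pos, base in substitutions.items():
        seq[pos - AMPLICON_START] = base
    return "".join(seq)


toy_group = classify_amplicon(make_toy_amplicon({3300: "T", 3500: "C"}))
```

A sample that matches no signature is returned as `None` rather than forced into the nearest group, so that potentially novel genotypes are flagged for full-genome sequencing.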
NGS reveals quasi-species evolution in humans and mosquitos
To better understand how a population of ZIKV clones evolves within a host, we examined the open-source NGS database shared by the Anderson Lab(3). We developed a quasi-species distribution analysis pipeline using the consensus sequence for each sample and a 10% variance cut-off. For an inter-host and inter-species comparison of quasi-SNPs, we analyzed both the human (n=10) and mosquito (n=5) ZIKV isolates from Miami, Florida (Fig S7). We found that a human sample contains 12.5 quasi-SNPs on average, more than the 4.8 quasi-SNPs/sample seen in mosquitos (Fig 5A). The rates for isolating a unique quasi-SNP were 6.50 × 10^-4 and 3.30 × 10^-4 in humans and mosquitos, respectively. This is consistent with previous findings suggesting a limited ZIKV replication cycle within the mosquito gut(24, 25). Furthermore, we compared the quasi-SNPs against the full-length ZIKV genome sequences in each of the six groups (Fig 5B). We found that two quasi-SNPs, T1064C and T1541G, were unique to the sequences isolated from the U.S. Interestingly, we found that 4/15 samples contain adenine at position 5315, which can also be found in sequences from G1, G2, G4 and G5, whereas the A5315G mutation is found in G3 and G6. The G9344A mutation carried by some samples was found only in G2 and G6.
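The quasi-SNP calling step can be sketched in Python as follows. This is a simplified illustration, not the pipeline used in the study: it assumes that per-position base counts have already been extracted from the aligned reads, and it interprets the 10% cut-off as a minimum frequency for a non-consensus base. The function name `call_quasi_snps` and the input layout are hypothetical.

```python
MIN_FRACTION = 0.10  # 10% cut-off from the text, read here as allele frequency


def call_quasi_snps(pileup, consensus, min_fraction=MIN_FRACTION):
    """Return (position, consensus_base, variant_base, frequency) tuples.

    pileup:    list with one dict per position mapping base -> read count.
    consensus: consensus sequence of the same sample, same length as pileup.
    Positions are 1-based.
    """
    if len(pileup) != len(consensus):
        raise ValueError("pileup and consensus must have the same length")
    calls = []
    for pos, (counts, cons_base) in enumerate(zip(pileup, consensus), start=1):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to call
        for base in sorted(counts):
            frequency = counts[base] / depth
            # Report minor variants at or above the cut-off.
            if base != cons_base and frequency >= min_fraction:
                calls.append((pos, cons_base, base, frequency))
    return calls


# Toy sample: position 1 has a 5% variant (ignored), position 2 a 20% variant.
toy_pileup = [{"A": 95, "G": 5}, {"T": 80, "C": 20}, {"G": 100}]
toy_calls = call_quasi_snps(toy_pileup, "ATG")
```

Counting the calls per sample and averaging within each host species would give a per-sample figure comparable in form to the quasi-SNPs/sample values reported above. A real pipeline would also need base-quality and depth filters, which are omitted here.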
Rapid analysis of the genetic variation and evolutionary relationships between ZIKV strains is critical to early epidemiological investigations. We found that all pandemic ZIKV strains carry particular inSNPs that can be used as molecular footprints to track their spread. This information allowed us to determine patterns of viral evolution and develop potential tools to classify ZIKV strains.
Our earlier ZIKV genome sequence analyses revealed that strains in the American outbreak were likely derived from the Asian rather than African lineage(16). However, it was unclear how old AS strains such as P6_740 could spread to the Americas a half-century later and cause congenital malformations that had not been previously reported. Through a comprehensive analysis of the complete and partial DNA sequences of the ZIKV strains isolated in Asia, we developed an improved understanding of the evolutionary history of that lineage. First, we noted that ZIKV strains isolated from Asia were more genetically divergent than the strains found in the Americas, implying that multiple series of mutations had occurred before the migration. While strains such as P6_740 resembled the AF lineage, there were already strains such as NIID123/2016/Japan that resembled the AM lineage. Second, we found an identical amino acid sequence in the PrM region shared between the P6_740 strain and the ZIKV isolated from a traveler to India in 2016. This suggests that ZIKV may have been continuously circulating in Asia over the past half-century. Third, most of the ZIKV strains in Asia are thought to cause only mild symptoms, which could explain why their continuous circulation has gone unnoticed for decades. However, two recent cases of ZIKV-associated congenital microcephaly in Thailand indicate that some AS strains may now be evolving to higher pathogenicity. Our analysis has identified multiple sequence differences between the published AS strains. In the future, it may be possible to correlate pathogenicity with particular mutations, although factors such as previous exposure to other flaviviruses may also contribute to disease severity(26).
ZIKV sequences isolated from several groups of Chinese travelers provided us with another opportunity to test our SNP tracking system. The group that traveled to American Samoa and returned to China together was infected by ZIKV carrying a molecular signature that we classified as belonging to G1. Their ZIKV sequences were also very similar, strongly suggesting that they had been infected by the same strain. In contrast, another group traveled to Venezuela together but returned to China at either the beginning or the end of February 2016. Their ZIKV strains carried a signature matching the G6 group, yet there were also distinct sequence differences that correlated with the date of their departure. Therefore, SNP-typing can not only help identify the geographic region where imported ZIKV cases originated, but also provide clues regarding the time of infection when multiple strains are in circulation. When we expanded our analysis to sequences published worldwide, we were able to distinguish the regional spread of specific genotypes of ZIKV. Our system provides useful information to further investigate the virus’s adaptations to particular regions, and to identify patterns of its migration for early epidemiological control.
We have identified 203 SNPs in the 391 ZIKV strains isolated from the recent outbreaks, and developed a SNP-typing strategy to classify them into 6 distinct groups. This approach has allowed us to develop an easy-to-use method to quickly genotype isolated ZIKV strains using specific primers for Sanger sequencing, without the need for full-genome sequencing. This method can be used effectively in resource-limited settings with only basic equipment and at a fraction of the cost of full-genome sequencing. The samples collected in NYC, of which over 50% came from travelers to the Dominican Republic, were confirmed by our analysis to consist mostly of G4 strains, the group predominantly circulating in the Caribbean. We acknowledge that these samples may be biased towards this group, as NYC has a large Dominican immigrant population. However, as more samples are collected in the future, we expect the groupings identified by our system will reflect the patterns of ZIKV migration as it spreads to new areas. Thus, our system can be used to facilitate the interpretation of sequence and surveillance data, and provides a framework from which new evolutionary branches could continue to be defined as our sample size grows. We anticipate that genotyping of ZIKV will become increasingly useful once molecular signatures can be combined with functional models to determine the clinical outcomes associated with each group.
We further sought to expand our current understanding of ZIKV evolution using an NGS dataset. The sequencing data from strains isolated in Florida allowed us to make the first inter-host and inter-species comparisons between different ZIKV strains. Our findings revealed important host-specific dynamics in viral replication. The limited number of quasi-SNPs in mosquitos suggests a low replication cycle, which may explain the reappearance of the virus in the recent epidemic with few genomic mutations. Our analysis also confirmed that ZIKV was introduced to the U.S. in several separate incidents rather than from a single source. These results further highlight the important role SNPs play in both tracking and understanding the evolution of ZIKV.