Complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in Wuhan, China, bioRxiv, 2020-01-26

Emerging and re-emerging infectious diseases, such as SARS, MERS, Zika and highly pathogenic influenza, present a major threat to public health [1–3]. Despite intense research effort, how, when and where novel diseases appear is still a source of considerable uncertainty. A severe respiratory disease was recently reported in the city of Wuhan, Hubei province, China. At the time of writing, at least 62 suspected cases have been reported since the first patient was hospitalized on December 12th 2019. Epidemiological investigation by the local Center for Disease Control and Prevention (CDC) suggested that the outbreak was associated with a seafood market in Wuhan. We studied seven patients who were workers at the market, and collected bronchoalveolar lavage fluid (BALF) from one patient who exhibited a severe respiratory syndrome including fever, dizziness and cough, and who was admitted to Wuhan Central Hospital on December 26th 2019. Next generation metagenomic RNA sequencing [4] identified a novel RNA virus from the family Coronaviridae, designated WH-Human-1 coronavirus (WHCV). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that WHCV was most closely related (89.1% nucleotide similarity) to a group of Severe Acute Respiratory Syndrome (SARS)-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that were previously sampled from bats in China and that have a history of genomic recombination. This outbreak highlights the ongoing capacity of viral spill-over from animals to cause severe disease in humans.
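The 89.1% figure above is a pairwise nucleotide similarity between aligned genomes. As a rough illustration only, the sketch below computes percent identity over two sequences that are assumed to be already aligned, skipping any column that contains a gap. The function name, the gap-handling rule and the toy sequences are assumptions for illustration; the abstract does not state how the authors computed their value.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical bases over gap-free columns of a pairwise alignment.

    Assumption: both inputs come from the same alignment and so have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns where the bases agree
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns (one possible convention)
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy example: 5 gap-free columns, 4 of them identical.
toy_identity = percent_identity("ACGT-A", "ACGA-A")
```

Other conventions (for example counting gaps as mismatches) give different percentages, so reported similarity values are only comparable when the same rule is used.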

biorxiv pathology 100-200-users 2020

Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina: Native American, African and European genetic ancestries in Argentina, bioRxiv, 2020-01-25

Abstract: We are at the dawn of the efforts to describe and understand the origins of genetic diversity in Argentina from high-throughput data. This knowledge is a primary step towards deciphering the specific genetic bases of diseases and drug response in the country. As in other populations across the Americas, genetic ancestry in Argentinean populations traces back to African, European and Native American ancestors, reflecting a complex demographic history with multiple migration and admixture events in pre- and post-colonial times. However, little is known about the sub-continental origins of these three main ancestries. We present new high-throughput genotyping data for 87 admixed individuals across Argentina. These data were combined with previously published data for admixed individuals in the region and then compared to different reference panels specifically built to run population structure analyses at a sub-continental level. Concerning the European and African ancestries, we confirmed previous results about their main origins, and we provide new insights into the presence of other origins that reflect historical records. As for the Native American ancestry, by leveraging genotype data for archaeological samples in the region to gain temporal depth in our analyses, we could identify four Native American components segregating in modern Argentinean populations. Three of them are also found in modern South American populations and are specifically represented in the Central Chile/Patagonia, Lowlands and Central Andes geographic areas. The fourth one may be specific to the Central Western region of Argentina. Identifying this component has not been straightforward since it is not well represented in any genomic data from the literature. Altogether, we provide useful insights into the multiple population groups from different continents that have contributed to present-day genetic diversity in Argentina. We encourage the generation of massive genotype data locally to further describe the genetic structure in Argentina.

Author Summary: The human genetic diversity in Argentina reflects demographic processes during which European colonists invaded a territory where Native American populations were settled. During the colonial period, the slave trade also prompted many African people to move to Argentina. Little is known about the origins of the Native American and African components in present-day Argentinean populations. Genotyping data for 87 admixed individuals throughout Argentina were generated and data from the literature were re-analyzed to shed light on this question. We confirmed that most of the European genetic ancestry comes from the South, although several individuals are related to Northern Europeans. We found that African origins in Argentina trace back to different regions. As for the Native American ancestry, we identified that it can be divided into four main components that correspond to Central Chile/Patagonia, Lowlands, Central Andes and the Central Western region of Argentina. In order to understand the specificity of the genetic diversity in Argentina, we should not rely on knowledge generated in other populations. Instead, more effort is required to generate specific massive genomic knowledge at the local level.

biorxiv genetics 0-100-users 2020

Low-N protein engineering with data-efficient deep learning, bioRxiv, 2020-01-24

Abstract: Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution. As demonstrated in two highly dissimilar proteins, avGFP and TEM-1 β-lactamase, top candidates from a single round are diverse and as active as engineered mutants obtained from previous multi-year, high-throughput efforts. Because it distills information from both global and local sequence landscapes, our model approximates protein function even before receiving experimental data, and generalizes from only single mutations to propose high-functioning, epistatically non-trivial designs. With reproducible >500% improvements in activity from a single assay in a 96-well plate, we demonstrate the strongest generalization observed in machine learning-guided protein design to date. Taken together, our approach enables efficient use of resource-intensive, high-fidelity assays without sacrificing throughput. By encouraging alignment with endpoint objectives, low-N design will accelerate engineered proteins into the fermenter, field, and clinic.
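The idea of screening sequences by in silico directed evolution against a virtual fitness landscape can be illustrated with a generic Metropolis-style search. The sketch below is not the authors' implementation: `toy_fitness` is a hypothetical stand-in for a learned fitness model (it simply scores agreement with an arbitrary target string), and the function names, step count and temperature are all assumptions chosen for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_fitness(seq: str, target: str) -> float:
    """Hypothetical surrogate model: fraction of positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq: str, rng: random.Random) -> str:
    """Propose a single random point substitution."""
    pos = rng.randrange(len(seq))
    choices = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def directed_evolution(start, fitness, steps=2000, temperature=0.01, seed=0):
    """Metropolis-style search: always accept improvements, and accept
    worse proposals with probability exp(delta / temperature)."""
    rng = random.Random(seed)
    current, current_fit = start, fitness(start)
    best, best_fit = current, current_fit
    for _ in range(steps):
        proposal = mutate(current, rng)
        proposal_fit = fitness(proposal)
        delta = proposal_fit - current_fit
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = proposal, proposal_fit
            if current_fit > best_fit:
                best, best_fit = current, current_fit
    return best, best_fit


# Usage with an arbitrary toy target and starting sequence.
toy_target = "MSKGEELFTG"
toy_start = "AAAAAAAAAA"
best_seq, best_score = directed_evolution(
    toy_start, lambda s: toy_fitness(s, toy_target)
)
```

In a real setting the surrogate would be a model fitted to the assayed variants, and the quality of the search depends entirely on how well that model generalizes beyond its training data.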

biorxiv synthetic-biology 0-100-users 2020

 

Created with the audiences framework by Jedidiah Carlson
