Abstract: The goal of CoronaVis is to use tweets, as information shared by the public, to visualize topic modeling, study subjectivity, and model human emotions during the COVID-19 pandemic. The main objective is to explore the psychology and behavior of societies at large, which can assist in managing the economic and social crisis during the ongoing pandemic as well as its after-effects. The novel coronavirus (COVID-19) pandemic forced people to stay at home to reduce the spread of the virus by maintaining social distancing. However, social media is keeping people connected both locally and globally. People are sharing information (e.g. personal opinions, facts, news, status) on social media platforms, which can help in understanding public behavior such as emotions, sentiments, and mobility during the ongoing pandemic. In this paper, we describe the CoronaVis Twitter dataset (focused on the United States) that we have been collecting since early March 2020. The dataset is available to the research community at https://github.com/mykabir/COVID19. We would like to share this data with the hope that it will enable the community to find more useful insights and create different applications and models to fight the COVID-19 pandemic and future pandemics as well....
Authors: Zhang Hui, Sun Xun, Lu Ya, Wu Jianzhong, Feng Jifeng
Publication date: April 27, 2020
Abstract: Colorectal cancer (CRC) is characterized by the accumulation of genetic and epigenetic alterations in neoplastic processes. DNA methylation, as an important epigenetic process, contributes to the development of CRC. In the present study, the epigenetic landscape of genes in CRC was characterized by analyzing the dataset from The Cancer Genome Atlas database and 177 DNA-methylated genes were screened based on the criterion of the Pearson correlation (R) between expression and methylation levels being >0.4. Pathway enrichment analysis revealed prominent pathways, including transcription and metabolism, further implying their significant role in tumorigenesis. Among the methylated genes, only zinc finger protein (ZNF)726 with aberrant expression was determined to affect overall survival (OS) as well as disease-free survival of patients with CRC. In addition, ZNF726 was identified as an independent prognostic risk factor for OS in patients with CRC. The methylation-based regulation of ZNF726 expression in CRC cells was further assessed using the Cancer Cell Line Encyclopedia database. Finally, the CpG island methylation of the ZNF726 promoter was evaluated to further elucidate its role in the development of CRC. In conclusion, the epigenetic landscape of genes in terms of promoter methylation in CRC was characterized, revealing that aberrant expression of ZNF726 may be an independent prognostic risk factor for OS in patients with CRC....
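The screening criterion in the abstract above (Pearson correlation between expression and methylation levels greater than 0.4) can be illustrated with a minimal sketch. The gene names, the sample values and the `screen_genes` helper are hypothetical and are not taken from the study; only the 0.4 threshold comes from the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def screen_genes(profiles, threshold=0.4):
    """Return the genes whose expression/methylation correlation exceeds threshold.

    profiles maps a gene name to a pair (expression values, methylation values)
    measured over the same samples. This layout is an assumption for illustration.
    """
    return [
        gene
        for gene, (expression, methylation) in profiles.items()
        if pearson_r(expression, methylation) > threshold
    ]


# Hypothetical toy profiles, four samples per gene.
toy_profiles = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "GENE_B": ([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]),
}
selected = screen_genes(toy_profiles)
```

The sketch applies the threshold to the signed coefficient, as the abstract words it; whether the study used the absolute value is not stated.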
Abstract: The ability to detect SARS-CoV-2 in a widespread epidemic is crucial for the screening of carriers and for the success of quarantine efforts. Methods based on real-time reverse transcription polymerase chain reaction (RT-qPCR) and sequencing are being used for virus detection and characterization. However, RNA viruses are known for their high genetic diversity, which poses a challenge for the design of efficient nucleic acid-based assays. The first SARS-CoV-2 genomic sequences already showed novel mutations, which may affect the efficiency of available screening tests, leading to false-negative diagnosis or inefficient therapeutics. Here we describe CoV2ID (http://covid.portugene.com/), a free database built to facilitate the evaluation of molecular methods for detection of SARS-CoV-2 and treatment of COVID-19. The database evaluates the available oligonucleotide sequences (PCR primers, RT-qPCR probes, etc.) considering the genetic diversity of the virus. Updated sequence alignments are used to constantly verify the theoretical efficiency of available testing methods. Detailed information on available detection protocols is also available to help laboratories implement SARS-CoV-2 testing....
Abstract: The European Association of Environmental and Resource Economists started around 1990 and celebrates its 30th anniversary this year. The rise of environmental concerns and the wish for more cooperation between scientists within Europe, plus the drive of a few highly motivated people, led to the foundation of this academic institution. This article aims at clarifying the initial steps and the development of this highly successful association. The relationship between economics and the environment is central to the future of our world, and the EAERE was crucial in developing this field. The EAERE has been a stimulus and a home for many scientists who were interested in working in this field and who would otherwise have been quite isolated. The future of the EAERE is bright if it manages to bridge new developments in economics and in the natural sciences, and between academics and policy....
Abstract: A novel coronavirus was reported in Wuhan, China in December 2019 to cause severe acute respiratory symptoms (COVID-19). In this meta-analysis, we estimated the case fatality rate from COVID-19 infection using a random-effects meta-analysis model with country-level data. The publicly accessible web database WorldOMeter (https://www.worldometers.info/coronavirus/) was accessed on 24th March 2020 GMT, and the reported total number of cases, total deaths, active cases and seriously ill/critically ill patients were retrieved. The primary outcome of this meta-analysis was the case fatality rate, defined as the total number of deaths divided by the total number of diagnosed cases. The pooled case fatality rate (95% CI) was 1.78 (1.34-2.22)%. Between-country heterogeneity was 0.018 (p<0.0001). The pooled estimate of composite poor outcome (95% CI) was 4.06 (3.24-4.88)% at that point in time, after exclusion of countries that reported a small number of cases. The pooled mortality rate (95% CI) was 33.97 (27.44-40.49)% amongst closed cases (where patients have recovered or died). Meta-regression analysis identified a statistically significant association between health expenditure and mortality amongst closed cases (p=0.037)....
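The case fatality rate defined in the abstract above (deaths divided by diagnosed cases) and a random-effects pooling step can be sketched as follows. The abstract does not say which random-effects estimator was used; the DerSimonian-Laird estimator below is an assumption chosen for illustration, as are the binomial variance on the raw proportion scale, the function names and the toy counts.

```python
import math


def case_fatality_rate(deaths, cases):
    """Case fatality rate: total deaths divided by total diagnosed cases."""
    return deaths / cases


def pooled_random_effects(counts):
    """Pool per-country proportions with a DerSimonian-Laird random-effects model.

    counts is a list of (deaths, cases) pairs, one per country. Every country
    must have 0 < deaths < cases, otherwise its binomial variance is zero.
    Returns (pooled estimate, lower 95% bound, upper 95% bound, tau squared).
    """
    effects = [case_fatality_rate(d, n) for d, n in counts]
    variances = [p * (1 - p) / n for p, (_, n) in zip(effects, counts)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Between-country variance tau^2, truncated at zero.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(counts) - 1
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-country variance.
    re_weights = [1 / (v + tau2) for v in variances]
    total_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / total_re
    se = math.sqrt(1 / total_re)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


# Hypothetical counts for three countries: (deaths, diagnosed cases).
toy_counts = [(30, 2000), (45, 2500), (12, 900)]
pooled, lower, upper, tau2 = pooled_random_effects(toy_counts)
```

Published analyses often pool proportions on a transformed scale (for example logit); the raw scale is used here only to keep the arithmetic easy to follow.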
Authors: Zin Phyo Phyo Kyaw, Williams Gavin, Fourches Denis
Publication date: April 10, 2020
Abstract: We report on SIME (synthetic insight-based macrolide enumerator), a new and improved cheminformatics enumeration technology. SIME can enumerate fully assembled macrolides with synthetic feasibility by utilizing the constitutional and structural knowledge extracted from biosynthetic aspects of macrolides. The software takes into account key information such as the positions in macrolide structures at which chemical components can be inserted, and the types of structural motifs and sugars of interest that can be synthesized and incorporated at those positions. Additionally, we report on the chemical distribution analysis of the newly SIME-generated V1B (virtual 1 billion) library of macrolides. Those compounds were built based on the core of the Erythromycin structure, 13 structural motifs and a library of sugars derived from eighteen bioactive macrolides. This new enumeration technology can be coupled with cheminformatics approaches such as QSAR modeling and molecular docking to aid in drug discovery for the rational design of next-generation macrolide therapeutics with desirable pharmacokinetic properties....
Authors: Italian Civil Protection Department, Morettini Micaela, Sbrollini Agnese, Marcantoni Ilaria, Burattini Laura
Publication date: April 10, 2020
Abstract: The database described here contains integrated surveillance data for the “Coronavirus disease 2019” (abbreviated as COVID-19 by the World Health Organization) in Italy, caused by the novel coronavirus SARS-CoV-2. The database, included in a main folder called COVID-19, has been designed and created by the Italian Civil Protection Department, which currently manages it. The database consists of six folders called ‘aree’ (containing charts of geographical areas affected by containment measures), ‘dati-andamento-nazionale’ (containing data relating to the national trend of SARS-CoV-2 spread), ‘dati-json’ (containing data that summarize the national, provincial and regional trends of SARS-CoV-2 spread), ‘dati-province’ (containing data relating to the provincial trend of SARS-CoV-2 spread), ‘dati-regioni’ (containing data relating to the regional trend of SARS-CoV-2 spread) and ‘schede-riepilogative’ (containing summary sheets relating to the provincial and regional trends of SARS-CoV-2 spread). The Italian Civil Protection Department receives data daily from the Italian Ministry of Health, analyzes them and updates the database. Thus, the database is subject to daily updates and integrations. The database is freely accessible (CC-BY-4.0 license) at https://github.com/pcm-dpc/COVID-19. This database is useful for providing insight into the spread mechanism of SARS-CoV-2, for supporting organizations in evaluating the efficiency of current prevention and control measures, and for supporting governments in future prevention decisions....
Abstract: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus itself consists of two parts, the raw one and the labelled one. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labelled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Labels for sentences indicate the presence or absence of health-related issues. Sentences that contain such an issue are additionally labelled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications, and drug reactions. Further, we present a baseline model for named entity recognition (NER) and multi-label sentence classification tasks on this corpus. Our RuDR-BERT model achieved a macro F1 score of 74.85% in the NER task. For the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over the score of a BERT model trained on Russian data. We make the RuDReC corpus and pretrained weights of domain-specific BERT models freely available at https://github.com/cimm-kzn/RuDReC...
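The macro F1 score reported in the abstract above is the unweighted mean of per-class F1 scores. The sketch below computes it over flat label sequences; this is a simplification, since NER is usually scored at the entity-span level and the abstract does not describe the exact evaluation protocol. The label names and the `macro_f1` helper are hypothetical.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over two equal-length label sequences."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero here
        # because every label occurs in gold or predicted at least once.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical token-level labels for a four-token example.
gold_labels = ["DRUG", "DRUG", "ADR", "O"]
predicted_labels = ["DRUG", "ADR", "ADR", "O"]
score = macro_f1(gold_labels, predicted_labels)
```

Because every class counts equally, a rare class such as a drug reaction weighs as much as a frequent one, which is why macro averaging is a common choice for imbalanced label sets.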
Authors: Herrera Giovanny, Barragán Natalia, Luna Nicolás, Martínez David, De Martino Frasella, Medina Julián, Niño Sergio, Páez Luisa, Ramírez Angie, Vega Laura, Velandia Valeria, Vera Michelle, Zúñiga María Fernanda, Bottin Marius Jean, Ramírez Juan David...
Publication date: April 3, 2020
Abstract: The Americas have an elevated number of leishmaniasis cases (accounting for two-thirds of the worldwide disease burden) and circulating Leishmania species, and are therefore of interest in terms of epidemiological surveillance. Here, we present a systematic review of Leishmania parasite species circulating in the countries of the American continent, together with complementary information on epidemiology and geospatial distribution. A database was built from data published between 1980 and 2018 on Leishmania species identified in most of the American countries. A total of 1499 georeferenced points were extracted from published articles and subsequently located to 14 countries in the Americas. This database could be used as a reference when surveilling the occurrence of Leishmania species in the continent....
Abstract: Studying the ecology of photosynthetic microeukaryotes and prokaryotic cyanobacterial communities requires molecular tools to complement morphological observations. These tools rely on specific genetic markers and require the development of specialised databases to achieve taxonomic assignment. We set up a reference database, called µgreen-db, for the 23S rRNA gene. The sequences were retrieved from generalist (NCBI, SILVA) or Comparative RNA Web (CRW) databases, in addition to a more original approach involving recursive BLAST searches to obtain the best possible sequence recovery. At present, µgreen-db includes 2,326 23S rRNA sequences belonging to both eukaryotes and prokaryotes encompassing 442 unique genera and 736 species of photosynthetic microeukaryotes, cyanobacteria and non-vascular land plants based on the NCBI and AlgaeBase taxonomy. When PR2/SILVA taxonomy is used instead, µgreen-db contains 2,217 sequences (399 unique genera and 696 unique species). Using µgreen-db, we were able to assign 96% of the sequences of the V domain of the 23S rRNA gene obtained by metabarcoding after amplification from soil DNA at the genus level, highlighting good coverage of the database. µgreen-db is accessible at http://microgreen-23sdatabase.ea.inra.fr....
Authors: Wagner Alex H., Walsh Brian, Mayfield Georgia, Tamborero David, Sonkin Dmitriy, Krysiak Kilannin, Deu-Pons Jordi, Duren Ryan P., Gao Jianjiong, McMurry Julie, Patterson Sara, del Vecchio Fitz Catherine, Pitel Beth A., Sezerman Ozman U., Ellrott Kyle, Warner Jeremy L., Rieke Damian T., Aittokallio Tero, Cerami Ethan, Ritter Deborah I., Schriml Lynn M., Freimuth Robert R., Haendel Melissa, Raca Gordana, Madhavan Subha, Baudis Michael, Beckmann Jacques S., Dienstmann Rodrigo, Chakravarty Debyani, Li Xuan Shirley, Mockus Susan, Elemento Olivier, Schultz Nikolaus, Lopez-Bigas Nuria, Lawler Mark, Goecks Jeremy, Griffith Malachi, Griffith Obi L., Margolin Adam A....
Publication date: April 3, 2020
Abstract: Precision oncology relies on accurate discovery and interpretation of genomic variants, enabling individualized diagnosis, prognosis and therapy selection. We found that six prominent somatic cancer variant knowledgebases were highly disparate in content, structure and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. We developed a framework for harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations. We demonstrated large gains in overlap between resources across variants, diseases and drugs as a result of this harmonization. We subsequently demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 33% per individual knowledgebase to 57% in aggregate. Our analyses illuminate the need for open, interoperable sharing of variant interpretation data. We also provide a freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases....
Abstract: Research on smart grid technologies is expected to result in effective climate change mitigation. Non-Intrusive Load Monitoring (NILM) is seen as a key technique for enabling innovative smart-grid services. By breaking down the energy consumption of households and industrial facilities into its components, NILM techniques provide information on the appliances present and can be applied to perform diagnostics. As with related Machine Learning problems, research and development requires a sufficient amount of data to train and validate new approaches. As a viable alternative to collecting datasets in buildings during expensive and time-consuming measurement campaigns, the idea of generating synthetic datasets for NILM has gained momentum recently. With SynD, we present a synthetic energy dataset with a focus on residential buildings. We release 180 days of synthetic power data at the aggregate level (i.e. mains) and for individual appliances. SynD is the result of a custom simulation process that relies on power traces of real household appliances. In addition, we present several case studies that demonstrate the similarity of our dataset to four real-world energy datasets....
Authors: Monteiro Miguel, Reino Luís, Schertler Anna, Essl Franz, Figueira Rui, Ferreira Maria Teresa, Capinha César
Publication date: April 1, 2020
Abstract: Background: Human activities are allowing the ever-increasing dispersal of taxa beyond their native ranges. Understanding the patterns and implications of these distributional changes requires comprehensive information on the geography of introduced species. Current knowledge about the alien distribution of macrofungi is limited taxonomically and temporally, which severely hinders the study of human-mediated distribution changes for this taxonomic group. New information: Here, we present a database on the global alien distribution of macrofungi species. Data on the distribution of alien macrofungi were searched in a large number of data sources, including scientific publications, grey literature and online databases. The compiled database includes 1966 records (i.e. species × region combinations) representing 2 phyla, 7 classes, 22 orders, 82 families, 207 genera, 648 species and 31 varieties, forms or subspecies. Dates of introduction records range from 1753 to 2018. Each record includes the location where the alien taxon was identified and, when available, the date of first observation, the host taxa or other important information. This database is a major step forward in the understanding of human-mediated changes in the distribution of macrofungal taxa....
Authors: Song Xiaoming, Nie Fulei, Chen Wei, Ma Xiao, Gong Ke, Yang Qihang, Wang Jinpeng, Li Nan, Sun Pengchuan, Pei Qiaoying, Yu Tong, Hu Jingjing, Li Xinyu, Wu Tong, Feng Shuyan, Li Xiu-Qing, Wang Xiyin
Publication date: April 1, 2020
Abstract: Coriander (Coriandrum sativum L.), also known as cilantro, is a globally important vegetable and spice crop. Its genome and that of carrot are models for studying the evolution of the Apiaceae family. Here, we developed the Coriander Genomics Database (CGDB, http://cgdb.bio2db.com/) to collect, store, and integrate the genomic, transcriptomic, metabolic, functional annotation, and repeat sequence data of coriander and carrot to serve as a central online platform for Apiaceae and other related plants. Using these data sets in the CGDB, we intriguingly found that seven transcription factor (TF) families showed significantly greater numbers of members in the coriander genome than in the carrot genome. The highest ratio of the numbers of MADS TFs between coriander and carrot reached 3.15, followed by those for tubby protein (TUB) and heat shock factors. As a demonstration of CGDB applications, we identified 17 TUB family genes and conducted systematic comparative and evolutionary analyses. RNA-seq data deposited in the CGDB also suggest dose compensation effects of gene expression in coriander. CGDB allows bulk downloading, significance searches, genome browser analyses, and BLAST searches for comparisons between coriander and other plants regarding genomics, gene families, gene collinearity, gene expression, and the metabolome. A detailed user manual and contact information are also available to provide support to the scientific research community and address scientific questions. CGDB will be continuously updated, and new data will be integrated for comparative and functional genomic analysis in Apiaceae and other related plants....
Abstract: Understanding the disease pathogenesis of the novel coronavirus, denoted SARS-CoV-2, is critical to the development of anti-SARS-CoV-2 therapeutics. The global propagation of the viral disease, denoted COVID-19 ("coronavirus disease 2019"), has unified the scientific community in searching for possible inhibitory small molecules or polypeptides. Given the known interaction between the human ACE2 ("Angiotensin-converting enzyme 2") protein and the SARS-CoV virus (responsible for the coronavirus outbreak circa 2003), considerable focus has been directed towards the putative interaction between the SARS-CoV-2 Spike protein and ACE2. However, a more holistic understanding of the SARS-CoV-2 vs. human inter-species interactome promises additional putative protein-protein interactions (PPI) that may be considered targets for the development of inhibitory therapeutics. To that end, we leverage two state-of-the-art, sequence-based PPI predictors (PIPE4 & SPRINT) capable of generating the comprehensive SARS-CoV-2 vs. human interactome, comprising approximately 285,000 pairwise predictions. Of these, we identify the high-scoring subset of human proteins predicted to interact with each of the 14 SARS-CoV-2 proteins by both methods, comprising 279 high-confidence putative interactions involving 225 human proteins. Notably, the Spike-ACE2 interaction was the highest ranked for both the PIPE4 and SPRINT predictors, corroborating existing evidence for this PPI. Furthermore, the PIPE-Sites algorithm was used to predict the putative subsequence that might mediate each interaction and thereby inform the design of inhibitory polypeptides intended to disrupt the corresponding host-pathogen interactions. We hereby publicly release the comprehensive set of PPI predictions and their corresponding PIPE-Sites landscapes in the following DataVerse repository: https://www.doi.org/10.5683/SP2/JZ77XA. All data and metadata are released under a CC-BY 4.0 licence.
The information provided represents theoretical modeling only and caution should be exercised in its use. It is intended as a resource for the scientific community at large in furthering our understanding of SARS-CoV-2....
Abstract: In response to the COVID-19 pandemic, both voluntary changes in behavior and administrative restrictions on human interactions have occurred. These actions are intended to reduce the transmission rate of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We use anonymized and/or de-identified mobile device locations to measure mobility, a statistic representing the distance a typical member of a given population moves in a day. Results indicate that a large reduction in mobility has taken place, both in the US and globally. In the United States, large mobility reductions have been detected associated with the onset of the COVID-19 threat and specific government directives. Mobility data at the US admin1 (state) and admin2 (county) level have been made freely available under a Creative Commons Attribution (CC BY 4.0) license via the GitHub repository https://github.com/descarteslabs/DL-COVID-19/...
Authors: Schreiber Jacob, Bilmes Jeffrey, Noble William Stafford
Publication date: March 30, 2020
Abstract: Recent efforts to describe the human epigenome have yielded thousands of epigenomic and transcriptomic datasets. However, due primarily to cost, the total number of such assays that can be performed is limited. Accordingly, we applied an imputation approach, Avocado, to a dataset of 3814 tracks of data derived from the ENCODE compendium, including measurements of chromatin accessibility, histone modification, transcription, and protein binding. Avocado shows significant improvements in imputing protein binding compared to the top models in the ENCODE-DREAM challenge. Additionally, we show that the Avocado model allows for efficient addition of new assays and biosamples to a pre-trained model....
Authors: Gil Caspi, Uri Shalit, Soren Lund Kristensen, Doron Aronson, Lilac Caspi, Oran Rossenberg, Avi Shina, Oren Caspi
Publication date: March 30, 2020
Abstract: Background: The COVID-19 outbreak poses an unprecedented challenge for societies, healthcare organizations and economies. In the present analysis we coupled climate data with COVID-19 spread rates worldwide, and in a single country (USA). Methods: Data on confirmed COVID-19 cases were derived from the COVID-19 Global Cases by the CSSE at Johns Hopkins University up to March 19, 2020. We assessed disease spread by two measures: replication rate (RR), the slope of the logarithmic curve of confirmed cases, and the rate of spread (RoS), the slope of the linear regression of the logarithmic curve. Results: Based on predefined criteria, the mean COVID-19 RR was significantly lower in warm climate countries (0.12 ± 0.02) compared with cold climate countries (0.24 ± 0.01) (P<0.0001). Similarly, RoS was significantly lower in warm climate countries than in cold climate countries (0.12 ± 0.02 vs. 0.25 ± 0.01, P<0.001). In all countries (independent of climate classification) both RR and RoS displayed a moderate negative correlation with temperature (R = -0.69, 95% confidence interval [CI], -0.87 to -0.36; P<0.001 and R = -0.72, 95% CI, -0.87 to -0.36; P<0.001, respectively). We identified a similar moderate negative correlation with the dew point temperature. Additional climate variables did not display a significant correlation with either RR or RoS. Finally, in an ancillary analysis, a COVID-19 intra-country model using an inter-state analysis of the USA did not yet identify a correlation between climate parameters and RR or RoS as of March 19, 2020. Conclusions: Our analysis suggests a plausible negative correlation between warmer climate and COVID-19 spread rate as defined by RR and RoS worldwide. This initial correlation should be interpreted cautiously and be further validated over time, as the pandemic is at different stages in various countries as well as in regions within these countries.
As such, some associations may be more affected by local transmission patterns than by climate. Importantly, we provide an online surveillance dashboard (https://covid19.net.technion.ac.il/) to further assess the association between climate parameters and outbreak dynamics worldwide as time goes by...
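The rate of spread (RoS) defined in the abstract above, the slope of a linear regression fitted to the logarithmic case curve, can be sketched as an ordinary least-squares slope. The abstract does not state the logarithm base, the time unit or the fitting window; base 10, one observation per day and the `log_case_slope` name are assumptions made for illustration.

```python
import math


def log_case_slope(cumulative_cases):
    """Least-squares slope of log10(confirmed cases) against day index.

    cumulative_cases holds one positive count per consecutive day and needs
    at least two entries.
    """
    days = list(range(len(cumulative_cases)))
    logs = [math.log10(c) for c in cumulative_cases]
    n = len(days)
    mean_day = sum(days) / n
    mean_log = sum(logs) / n
    sxx = sum((d - mean_day) ** 2 for d in days)
    sxy = sum((d - mean_day) * (y - mean_log) for d, y in zip(days, logs))
    return sxy / sxx


# Hypothetical series that grows tenfold per day, so the slope is 1 log10 unit per day.
toy_series = [10, 100, 1000, 10000]
slope = log_case_slope(toy_series)
```

A slope of 0.12 under these assumptions would correspond to cases multiplying by about 10 ** 0.12, roughly 1.32, per day; that reading depends entirely on the assumed base and time unit.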
Abstract: Ferroptosis is a mode of regulated cell death that depends on iron. Cells die from the toxic accumulation of lipid reactive oxygen species. Ferroptosis is tightly linked to a variety of human diseases, such as cancers and degenerative diseases. The ferroptotic process is complicated and involves a wide range of metabolites and biomolecules. Although great progress has been achieved, the mechanism of ferroptosis remains enigmatic. We have entered an era of extensive knowledge advancement, and thus, it is important to find ways to organize and utilize data efficiently. We have observed that a high-quality knowledge base of ferroptosis research is lacking. In this study, we downloaded 784 ferroptosis articles from the PubMed database. Ferroptosis regulators and markers and associated diseases were extracted from these articles and annotated. In summary, 253 regulators (including 108 drivers, 69 suppressors, 35 inducers and 41 inhibitors), 111 markers and 95 ferroptosis-disease associations were found. We then developed FerrDb, the first manually curated database for regulators and markers of ferroptosis and ferroptosis-disease associations. The database has a user-friendly interface, and it will be updated every 6 months to offer long-term service. FerrDb is expected to help researchers acquire insights into ferroptosis.
Database URL:
http://www.zhounan.org/ferrdb...
Abstract: Circular RNAs (circRNAs) are unique transcript isoforms characterized by back splicing of exon ends to form a covalently closed loop or circular conformation. These transcript isoforms are now known to be expressed in a variety of organisms across the kingdoms of life. Recent studies have shown the role of circRNAs in a number of diseases, and increasing evidence points to their potential application as biomarkers in these diseases. We have created a comprehensive manually curated database of circular RNAs associated with diseases. This database is available at http://clingen.igib.res.in/circad/. The database lists more than 1300 circRNAs associated with 150 diseases and mapping to 113 International Statistical Classification of Diseases (ICD) codes, with evidence of association linked to published literature. The database is unique in many ways. Firstly, it provides ready-to-use primers for using circRNAs as biomarkers or for performing functional studies. It additionally lists the assay and PCR primer details, including experimentally validated ones, as a ready reference for researchers, along with fold change and statistical significance. It also provides standard disease nomenclature as per the ICD codes. To the best of our knowledge, circad is the most comprehensive and updated database of disease-associated circular RNAs.
Availability: http://clingen.igib.res.in/circad/...