In biomedical research, network analysis provides a conceptual framework for interpreting data from high-throughput experiments. We have subsequently used Mgrep to build the Open Biomedical Annotator service. Wang, J. K., Hom, J., Balasubramanian, S., Schuler, A., Shah, N. H., Goldstein, M. K., Baiocchi, M. T., Chen, J. H. Comparative safety and effectiveness of alendronate versus raloxifene in women with osteoporosis. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug-drug interactions, and learning used-to-treat relationships between drugs and indications. We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. An OBO ontology may be translated to OWL and back without loss of knowledge. We demonstrate how this mapping enables ontology-driven integration and querying of tissue microarray data. We trained a highly accurate predictive model that detects novel off-label uses among 1,602 unique drugs and 1,472 unique indications. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. HyBrow consists of a modeling framework with the ability to accommodate diverse biological information sources, an event-based ontology for representing biological processes at different levels of detail, a database to query information in the ontology and programs to perform hypothesis design and evaluation. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. This filtered and augmented UMLS Metathesaurus can potentially be used to improve efficiency and precision of UMLS-based information retrieval and NLP tasks. Feasibility of Prioritizing Drug-Drug-Event Associations Found in Electronic Health Records.
Over the past 15 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in bio-ontology development, its applications to biomedicine and more generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L. J., Eilbeck, K., Ireland, A., Mungall, C. J., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R. H., Shah, N., Whetzel, P. L., Lewis, S. Current progress in network research: toward reference networks for key model organisms. Predictive modeling of risk factors and complications of cataract surgery. Loss of β-cell mass is a cardinal feature of diabetes. Schuler, A., Callahan, A., Jung, K., Shah, N. H. What This Computer Needs Is a Physician: Humanism and Artificial Intelligence. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection. Jonquet, C., Lependu, P., Falconer, S., Coulet, A., Noy, N. F., Musen, M. A., Shah, N. H. Mapping between the OBO and OWL ontology languages. The privacy rule in the Health Insurance Portability and Accountability Act (HIPAA) of 1996 may require revision to support this novel use of patient data. The National Center for Biomedical Ontology is now in its seventh year. Early, accurate prediction of delayed healing wounds can improve patient care by allowing clinicians to increase the aggressiveness of intervention in patients most at risk. Our approach achieves a discrimination accuracy of 0.85 in terms of the area under the receiver operating characteristic curve (AUC) for the reference set of well-established ADEs and an AUC of 0.68 for the reference set of recently labeled ADEs.
Using a clinical text-mining tool, we detected unplanned episodes documented in clinician notes (for non-SHC visits) or in coded encounter data for SHC-delivered care and the most frequent symptoms documented in emergency department (ED) notes. Combined reporting increased the identification of patients with one or more unplanned care visits by 32% (15% using coded data; 20% using all the data) among patients with 3 months of follow-up and by 21% (23% using coded data; 28% using all the data) among those with 1 year of follow-up. Project: Continuously profiling patients screened for … The accuracy vs. coverage trade-off in patient-facing diagnosis models. Estimate the hidden deployment cost of predictive models to improve patient care. StressDB provides small user groups with a locally installable web-based relational microarray database. The first one is coverage, or the ontologies that provide most terms covering the input text. After pre-filtering the associations by removing those found in public databases, we devised a ranking for associations based on the support from the remaining sources, and evaluated the results of this rank-based prioritization. We collected information for 5983 putative EHR-derived drug-drug-event associations involving 345 drugs and ten adverse events from four data sources and four prediction methods. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. We have deployed the mapping and ontology-driven querying tools at the TMAD site for general use. We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI-T. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration.
We compared 3 approaches to semantic-similarity metrics (which rely on expert opinion, ontologies only, and information content) with 4 metrics applied to SNOMED-CT. We found that there was poor agreement among those metrics based on information content with the ontology-only metric. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. Miner, A. S., Haque, A., Fries, J. A., Fleming, S. L., Wilfley, D. E., Terence Wilson, G., Milstein, A., Jurafsky, D., Arnow, B. Our Java implementation of the mapping is part of the official Gene Ontology project source. Our transformation system provides a lossless roundtrip mapping for OBO ontologies, i.e. Ross, E., Jung, K., Dudley, J. T., Li, L., Leeper, N. J., Shah, N. H. Predicting Future Cardiovascular Events in Patients With Peripheral Artery Disease Using Electronic Health Record Data. Sitagliptin was the most effective second-line therapy, and as effective as metformin as a first-line therapy. To date, there have not been comparisons of the different semantic-similarity approaches on a single ontology. Evolutionary Pressures on the Electronic Health Record: Caring for Complexity. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Verspoor, K., Oellrich, A., Collier, N., Groza, T., Rocca-Serra, P., Soldatova, L., Dumontier, M., Shah, N. Synergistic drug combinations from electronic health records and gene expression. The first set permits bad data points to be flagged with respect to a number of parameters and performs normalization in three different ways. Most OBO constructs have easy and obvious equivalence to a construct in OWL.
Jaewon Yang, Julian McAuley, Jure Leskovec, Paea LePendu, Nigam Shah. Ghebremariam, Y. T., Lee, J. C., LePendu, P., Erlanson, D. A., Slaviero, A., Shah, N. H., Leiper, J. M., Cooke, J. P. Mining clinical text for signals of adverse drug-drug interactions. Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. Scientific Reports, Nature Publishing Group, 2018, 8 (1), 10.1038/s41598-018-33980-0. The NCBO Web services enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. New opportunities have emerged to harness data sources that have not been used within the traditional framework. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. The myriad of ontologies being created not only enables researchers to solve some of the problems in handling the data explosion but also introduces new challenges. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics.
Using a de-identified dataset of geo-tagged mobile Internet search logs, we mined text and location patterns that are predictors of healthcare resource utilization and built statistical models that predict the probability of a user's future visit to a medical facility. Mahalingam, R., Gomez-Buitrago, A., Eckardt, N., Shah, N., Guevara-Garcia, A., Day, P., Raina, R., Fedoroff, N. V. StressDB: A locally installable web-based relational microarray database designed for small user communities. However, there is no prospective clinical study evaluating whether the use of PPIs directly causes CV harm. Some searches result in page views of only short duration, while others consistently result in longer-than-average page views. A curated and standardized adverse drug event resource to accelerate drug safety research. Of source terminologies in the UMLS, the Consumer Health Vocabulary and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) had the best coverage in Mayo clinical notes at 106426 and 94788 unique terms, respectively. LIMIT represents a fast and inexpensive solution for calculating reference intervals, and shows that it is possible to use laboratory results and coded diagnoses to learn laboratory test reference intervals from clinical data warehouses. Disease status is often summarized by repeated recordings of one or more physiological measures. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts. Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Callahan, Alison, Igor Pernek, Gregor Stiglic, Jure Leskovec, Howard R. Strasberg and Nigam H. Shah. 
But before we can reliably use a pathway knowledge-base as a data source, we need to proofread it to ensure that it can fully support computer-aided information integration and inference. We design a series of logical tests to detect potential problems we might encounter using a particular knowledge-base, the Reactome database, with a particular computer-aided hypothesis evaluation tool, HyBrow. Tian, Y., Kim, Y., Yang, J., Huser, V., Jin, P., Lambert, C., Park, H., Park, R., Rijnbeek, P., Van Zandt, M., Vashisht, R., Wu, Y., You, S., Duke, J., Hripcsak, G., Madigan, D., Reich, C., Shah, N., Ryan, P., Schuemie, M., Suchard, M. The Impact of Acute Organ Dysfunction on Long-Term Survival in Sepsis. Therefore, analyzing patterns of off-label drug usage in the clinical setting is an important step toward reducing the incidence of adverse events and improving patient safety. Chen, J. H., Humphreys, K., Shah, N. H., Lembke, A. Banda, J. M., Evans, L., Vanguri, R. S., Tatonetti, N. P., Ryan, P. B., Shah, N. H. Rapid identification of slow healing wounds. The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data. Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research. With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. This work provides new mechanistic insights into cAMP-dependent growth regulation of β-cells and highlights the potential of commonly prescribed medications to influence β-cell growth.
There is heterogeneity in the manifestation of diseases; therefore, it is essential to understand the patterns of progression of a disease in a given population for disease management as well as for clinical research. However, the lack of a common classification system hinders the comprehensive and systematic profiling of research activities. We demonstrate the use of the resulting networks for clinical research informatics in two ways, cohort construction and outcomes analysis, by examining the safety of cilostazol in peripheral artery disease patients as a use case. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service-oriented applications. Androgen Deprivation Therapy and Subsequent Dementia-Reply. The funding profiles of disease topics readily cluster themselves in agreement with the ontology hierarchy and closely mirror the funding agency priorities. recently identified a class of diseases, blood coagulation disorders, that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. An unsupervised learning method to identify reference intervals from a clinical database. The objective of this study was to test the feasibility and accuracy of identifying patient-centered outcomes within the EHR. Data from patients with localized prostate cancer undergoing prostatectomy were used to develop and test algorithms to accurately identify patient-centered outcomes in post-operative EHRs; we used urinary incontinence as the use case. A., Shah, N. H. Assessing the accuracy of automatic speech recognition for psychotherapy. Ransohoff, J. D., Nikfarjam, A., Jones, E., Loew, B., Kwong, B. Y., Sarin, K. Y., Shah, N. H.
Inpatient Clinical Order Patterns Machine-Learned From Teaching Versus Attending-Only Medical Services. Nikfarjam, A., Ransohoff, J. D., Callahan, A., Jones, E., Loew, B., Kwong, B. Y., Sarin, K. Y., Shah, N. H. It is time to learn from patients like mine. PDE3, 4 and 10 inhibitors, including dipyridamole, were found to promote β-cell replication in an adenosine receptor-dependent manner. Second, we analyze a small (N=1746), local dataset documenting the clinical progression of autism spectrum disorder patients using granular features from the electronic health record, including text from physician notes. More than two-thirds of the genes in the stress cDNA collection have not been identified in previous studies as stress/defense response genes. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or 'ontologies'. To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources. Based on the occurrences of UMLS terms in a 51 million document corpus of Mayo Clinic clinical notes, this study computes statistics about the terms' string attributes, source terminologies, semantic types and syntactic categories. Some have recently used structured databases of patient medical records and health insurance claims, going beyond the current paradigm of using spontaneous reporting systems like AERS, to detect drug-safety signals. An important challenge in translational bioinformatics is to understand how genetic variation gives rise to molecular changes at the protein level that can precipitate both monogenic and complex disease.
Our findings also quantify the agreement (or lack thereof) among complementary sources of evidence for drug-drug-event associations and highlight the challenges of developing a robust approach for prioritizing signals of these associations. After a 2-week washout period, participants were crossed over to receive the alternate treatment for the ensuing 4 weeks. PRESENTATIONS / POSTERS Daisy Ding*, Tony Duan*, Scott Fleming*, Saurabh Gombar, Kenneth Jung, Nigam Shah (September, 2019). When analyzing such co-occurrences of drugs and diseases, one major challenge is to differentiate whether the disease in a drug-disease pair represents an indication or an adverse event. We theorized that those potential associations for which there is evidence from multiple complementary sources are more likely to be true, and explored this idea using a published database of drug-drug-adverse event associations derived from electronic health records (EHRs). We prioritized drug-drug-event associations derived from EHRs using four sources of information: (1) public databases, (2) sources of spontaneous reports, (3) literature, and (4) non-EHR drug-drug interaction (DDI) prediction methods. However, this annotation process cannot be easily automated and often requires expert curators. The study focused on ADRs related to three high-profile serious adverse reactions. This process, referred to as enrichment analysis, profiles a gene set, and is widely used to make sense of the results of high-throughput experiments. Proton pump inhibitors (PPIs) are gastric acid-suppressing agents widely prescribed for the treatment of gastroesophageal reflux disease.
The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. We demonstrate that it is possible to make this distinction by combining the frequency distribution of the drug, the disease, and the drug-disease pair as well as the temporal ordering of the drugs and diseases in each pair across more than one million patients. We envision this method will have broad applications for examining difficult-to-test clinical hypotheses and to aid in post-marketing drug safety surveillance. However, most efforts do not use the free-text from clinical notes in monitoring for drug-safety signals. Banerjee, I., Sofela, M., Yang, J., Chen, J. H., Shah, N. H., Ball, R., Mushlin, A. I., Desai, M., Bledsoe, J., Amrhein, T., Rubin, D. L., Zamanian, R., Lungren, M. P. Increased monocyte count as a cellular biomarker for poor outcomes in fibrotic diseases: a retrospective, multicentre cohort study. Data mining methodologies designed to identify signals of novel ADRs are of deep importance for drug safety surveillance. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data. Using text to build semantic networks for pharmacogenomics. Nigam H. Shah is a research scientist at the Stanford Center for Biomedical Informatics group and member of the National Center for Biomedical Ontology.
These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). Predicting the need for a reduced drug dose, at first prescription. Although there are several programs that identify and analyze functional categories for human, mouse and yeast genes, none of them accept Arabidopsis thaliana data. Such a comparison can offer insight on the validity of different approaches. Caswell-Jin, J., Callahan, A., Purington, N., Han, S. S., Itakura, H., Sledge, G. W., Shah, N., Kurian, A. W. Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network. Chen, R., Ryan, P., Natarajan, K., Falconer, T., Crew, K. D., Reich, C. G., Vashisht, R., Randhawa, G., Shah, N. H., Hripcsak, G. A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results. Knowledge of clinical associations will improve risk stratification. The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. Advanced statistical methods used to analyze high-throughput data (e.g. Data from Electronic Medical Records (EMRs) has been used to profile first-line therapy choices, but this work did not elucidate the factors underlying deviations from current treatment guidelines and the relative efficacy of different treatment options. Syntactically, over 90% of matched terms were in noun phrases. Giving clinicians such a tool would support patient care decisions in the absence of gold-standard evidence and would help prioritize clinical questions for which EHR-enabled randomization should be carried out. We propose a complementary format to represent GO annotation files as knowledge bases using the W3C-recommended Web Ontology Language (OWL).
Harnessing next-generation informatics for personalizing medicine: a report from AMIA's 2014 Health Policy Invitational Meeting. The most promising aspect of this approach consists in the discovery of positive results not present in our Obesity NLP reference set. Together with a Web graphical user interface, our FCA and SQE cooperation ends up being an efficient approach for refining health outcomes of interest using plain terms. Peripheral arterial disease (PAD) is a growing problem with few available therapies. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. Tamang, S. R., Hernandez-Boussard, T., Ross, E. G., Gaskin, G., Patel, M. I., Shah, N. H. Funding and Publication of Research on Gun Violence and Other Leading Causes of Death. We also examine sets of drug pairs known to be associated with hyperglycemia and those not associated with hyperglycemia. Pannu, J., Poole, S., Shah, N., Shah, N. H. Enhanced Quality Measurement Event Detection: An Application to Physician Reporting. We show that the network-based approaches can be used for constructing patient cohorts as well as for analyzing differences in outcomes by comparing with standard methods, and discuss the advantages offered by network-based approaches. Recent retrospective cohorts and large database studies have raised concern that the use of PPIs is associated with increased cardiovascular (CV) risk. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB.
The duration of ADT use was also tested for association with Alzheimer's disease risk. There were 16,888 individuals with prostate cancer meeting all inclusion and exclusion criteria, with 2,397 (14.2%) receiving ADT during a median follow-up period of 2.7 years (interquartile range, 1.0-5.4 years). Building the graph of medicine from millions of clinical narratives. The model achieved an area under the curve of 0.842 (95% confidence interval 0.834-0.847) for the delayed healing outcome and a Brier reliability score of 0.00018. These statements comprise a large dataset of biological knowledge that is used widely in biomedical research. In particular, the meeting focused on discussing informatics challenges related to personalizing care through the integration of genomic or other high-volume biomolecular data with data from clinical systems to make health care more efficient and effective. In addition, the efficacies of first- and second-line treatments were evaluated using Cox proportional hazard models for control of Hemoglobin A1c. To enable such observational studies from EHR in real time, particularly in emergencies, rapid confounder control methods that can handle numerous variables and adjust for biases are imperative. Coulet, A., Garten, Y., Dumontier, M., Altman, R. B., Musen, M. A., Shah, N. H. NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources. Our emphasis is on highlighting the key events in the field and pointing at promising research areas for the future. For the cross-institutional analysis, using five example filters on i2b2/VA data reduces the actual lexicon to 19.13% of the size of the UMLS and only sees a 2% reduction in matched terms. The corpus statistics presented here are instructive for building lexicons from the UMLS. Chronic uveitis is a common and serious comorbid condition of juvenile idiopathic arthritis, with insidious presentation and potential to cause blindness.
Experimental design, hypothesis-testing and model-building in the current data-rich environment require biologists to collect, evaluate and integrate large amounts of information of many disparate kinds. Hripcsak, G., Duke, J. D., Shah, N. H., Reich, C. G., Huser, V., Schuemie, M. J., Suchard, M. A., Park, R. W., Wong, I. C., Rijnbeek, P. R., van der Lei, J., Pratt, N., Norén, G. N., Li, Y. C., Stang, P. E., Madigan, D., Ryan, P. B. Analyzing Information Seeking and Drug-Safety Alert Response by Health Care Professionals as New Methods for Surveillance.