Biomedical Informatics Graduate Rotation Projects

This page is updated annually. Some projects may already be taken, and new projects may be available. The projects below give an indication of the types of projects available in each lab, but please browse faculty web pages and contact professors directly to discuss current opportunities.

View Rotation Projects by Faculty: BISB or BMI

Labs with BMI Rotation Projects

Ferhat Ay | Pediatrics

  • ferhatay@lji.org

Lukas Chavez | School of Medicine

  • lukaschavez@ucsd.edu

Robert El-Kareh | Biomedical Informatics

  • relkareh@ucsd.edu

Kelly Frazer | Pediatrics

  • kafrazer@health.ucsd.edu

Olivier Harismendy | Biomedical Informatics

  • oharismendy@ucsd.edu

Lilia Iakoucheva | Psychiatry

  • lilyak@ucsd.edu

Rob Knight | Pediatrics

  • rknight@ucsd.edu

Jejo Koola | School of Medicine

  • jkoola@ucsd.edu

Tsung-Ting Kuo | Biomedical Informatics

  • tskuo@ucsd.edu

Amit Majithia | School of Medicine

  • amajithia@ucsd.edu

Lucila Ohno-Machado | Biomedical Informatics

  • machado@ucsd.edu

Yingxiao (Peter) Wang | Bioengineering

  • yiw015@eng.ucsd.edu

Ferhat Ay | Pediatrics

ferhatay@lji.org | Lab

We are interested in the analysis and modeling of three-dimensional chromatin structure from high-throughput sequencing experiments. We develop methods based on statistics, machine learning, optimization and graph theory to understand how changes in the 3D genome affect cellular outcomes such as development, differentiation and gene expression. We have ongoing interests in the systems-level analysis and reconstruction of regulatory networks, inference of enhancer-promoter contacts, predictive models of gene expression, and integration of three-dimensional chromatin structure with one-dimensional epigenetic measurements in the context of cancer, malaria, asthma and several autoimmune diseases.

  • Integrative analysis of multi cell-type gene expression and epigenomic data in tumor immune response

    This project will focus on developing regulatory network inference methods for the joint analysis of gene expression and histone modification data from several types of tumor-infiltrating lymphocytes gathered from a cohort of patients with solid tumors.

  • Predictive and comparative modeling of epigenetic gene regulation in different human immune cell types

    The goal of this project is to model the natural variation in gene expression across many immune cell types using an already established database at LJI (http://dice-database.org) and to identify cell type-specific epigenetic regulators of important immune genes.

  • Statistical methods for inferring functional DNA-DNA contacts from Hi-C and HiChIP/PLAC-seq data

    This project focuses on developing computational tools for better analysis of the wealth of data from chromosome conformation capture assays with the ultimate goal of inferring functional chromatin contacts such as those between enhancers and promoters.

Lukas Chavez | School of Medicine

lukaschavez@ucsd.edu | Profile | Lab

The main objective of the Chavez laboratory is the molecular characterization of malignant childhood cancers in order to identify drug targets and improve treatment options. Our focus is mainly on pediatric brain tumors such as medulloblastoma, glioblastoma, and ependymoma. Recently, we have demonstrated how to leverage epigenetic information such as DNA methylation and enhancer profiling in pediatric brain tumors and normal human tissues to identify clinically relevant tumor subgroups, oncogenic enhancers, transcription factors, and pathways amenable to pharmacologic targeting. To reveal regulatory circuitries disturbed in childhood brain tumors, we generate and integrate public high-dimensional data from primary tumors and patient-derived cell lines. We are specifically interested in the analysis of somatic and germline DNA mutations, chromatin and DNA modifications, transcription factor binding, and gene expression.

  • The 3D Tumor Genome

    To identify molecular mechanisms that contribute to tumor development and maintenance, we develop hypothesis-driven computational tools for the integrative analysis of different layers of genetic and epigenetic information. Having recognized that our epigenetic mapping studies can identify effective drug targets, we are now profiling 3D tumor genomes to uncover molecular mechanisms that may disturb enhancer-gene interactions, leading to deregulation of gene expression and biochemical pathways.

Robert El-Kareh | Biomedical Informatics

relkareh@ucsd.edu | Profile
  • Development of a Research Electronic Health Record for Clinical Decision Support Studies

    Creation of effective clinical decision support tools has the potential to significantly improve the quality of care delivered within our healthcare system. However, developing and testing prototypes of these tools requires access to realistic electronic health record (EHR) environments. This process has often involved prohibitively long turnaround times due to time and resource constraints of healthcare information systems groups. For research and educational purposes, these barriers could be avoided by creating an investigator-controlled research EHR and populating it with realistic clinical data. Such a system could enable researchers and students to develop a wide range of novel and innovative clinical decision support tools much more rapidly.

    Aims:

    1. Install a sophisticated, open-source EHR (OpenMRS)
    2. Populate this EHR with deidentified data from a rich clinical database (MIMIC II)
    3. Develop one or more prototype clinical decision support tools within this environment

Kelly Frazer | Pediatrics

kafrazer@health.ucsd.edu | Profile | Lab

Welcome to the Frazer Lab! We are using two complementary approaches to achieve our goal of identifying and characterizing functional human genetic variants. Our first approach utilizes iPSCORE, a resource that was generated to enable both familial and association-based genetic studies of molecular and physiological phenotypes in induced pluripotent stem cells (iPSCs) and derived cell types. Our second approach involves conducting association studies in well-characterized cohorts with the goal of identifying variants that play roles in human disease and assessing their contributions to disease pathogenesis, progression, and prognosis.

  • Investigate fetal-specific cardiac regulatory variants and their overlap with cardiac GWAS lead variants

    We have derived iPSC-CVPCs from 180 individuals and showed that their transcriptomes are more similar to fetal heart than to adult cardiac tissues. Our goal is to leverage these data in combination with WGS to perform eQTL analyses. We plan to assess whether fetal-specific eQTLs are associated with complex adult cardiac traits by colocalizing eQTLs with summary statistics from GWAS of cardiac traits. Our preliminary analyses show that eQTLs in iPSC-CVPCs identify cardiac disease GWAS variants that are active in the fetal but not the adult heart, indicating that they play a role in development. Our findings provide genetic evidence supporting the fetal origins of cardiovascular disease hypothesis and highlight the importance of investigating genetic associations across stages of development (i.e., fetal and adult tissues) to fully understand the genetic underpinnings of complex traits and disease. We are looking for rotation students to conduct QTL analyses using large ATAC-seq and H3K27ac ChIP-seq datasets generated from the iPSC-CVPCs.

Olivier Harismendy | Biomedical Informatics

oharismendy@ucsd.edu | Profile | Lab

The Oncogenomics laboratory is located in the Moores Cancer Center. Its research program is focused on the identification of genetic and epigenetic markers for cancer prevention and progression as well as drug response. The laboratory is a "humid" laboratory, combining wet-lab techniques with bioinformatics analysis to study cancer samples from patients and animal models of cancer. The laboratory is also an important partner for multiple principal investigators at the Moores Cancer Center, collaborating on the design, analysis and interpretation of their genomic experiments.

  • Development of Genomics Virtual Machines in a HIPAA-compliant cloud

    Genetic information is considered protected health information (PHI), so the highest security standards must be applied to its storage, analysis and sharing. The Oncogenomics laboratory uses the state-of-the-art iDASH compute cloud for its main computation. As a consequence, we participate in the development of optimal workflows and virtual machines for the analysis of patient-derived genomic datasets such as whole exomes, whole genomes, RNA-seq or genotyping arrays.

    In this project we will develop robust provisioning methods to establish virtual machines capable of running popular human genomic analysis workflows. We will benchmark these machines and workflows and convert some of them into standard recipes for production-grade, reproducible genomic analysis.

  • Genetics and epigenetics of cisplatin resistance

    Cisplatin (cDDP) is one of the most commonly used chemotherapeutic drugs, but most cancers eventually become resistant, leading to tumor recurrence. Several biological processes may modulate cDDP sensitivity: drug import, export, detoxification, DNA repair, and apoptosis. Drug resistance is transmitted to daughter cells, and one can build up resistant cell lines in vitro using sequential treatments. We are interested in identifying the genetic mutations that mediate this resistance. For this, we have derived resistant cell lines from single clones of a cDDP-sensitive ovarian cancer cell line. Using exome sequencing as well as targeted sequencing, we propose to determine mutations in genes and pathways that drive drug resistance. We will then expand the findings to the TCGA samples, using time to recurrence as an indicator of drug sensitivity.

  • The role of inherited variation in cancer somatic landscape

    The role of germline or inherited variation in cancer has been studied in selected families and led to the identification of genetic variants that are dominant and responsible for cancer syndromes. Similarly, rare recessive variants with lower penetrance are responsible for the increased risk in breast and ovarian cancer (BRCA1/2). More common variants in the population have also been identified through GWAS, which have revealed multiple SNPs associated with a modest increase in cancer risk. Despite these advances, multiple variants of intermediate allelic frequency in the population, or carried by patients with undocumented family history, remain variants of unknown significance (VUS) and can still play a role in tumor development. In addition, the contribution of variants located outside of the coding region has been underexplored and can now be reexamined in the light of recent maps of the regulatory landscape. The long-term goal of this research is to utilize germline genetic variation in cancer prevention and care to better stage patients or predict their response to treatment.

    We propose to identify the germline variants in UCSD Cancer Center patients (targeted gene panel) as well as in the public TCGA/ICGC datasets (whole genomes). We will then test these variants, alone or in combination, to identify the ones that impact cancer onset, the tumor somatic landscape or the tissue-specific regulatory network. The project will involve the processing of high-throughput sequencing data, population genetics, and statistical analysis in a HIPAA-compliant cloud-computing environment.

Lilia Iakoucheva | Psychiatry

lilyak@ucsd.edu | Profile | Lab

The lab has a variety of bioinformatics projects aimed at improving understanding of the functional impact of autism risk mutations derived from exome and whole genome sequencing of patients. We created mouse models carrying some of these mutations using CRISPR/Cas9, and also produced patient-derived cerebral organoids with autism risk mutations. We performed bulk RNA-seq from various brain regions or time periods in these models. Gene-level analyses of the RNA-seq data have been completed (manuscripts in preparation). We are now pursuing isoform-level analyses of these data to better understand the functional impact of autism risk mutations on the splicing isoform transcriptome.

  • Isoform transcriptome of Cul3-HET mouse model

    The project deals with constructing isoform-level co-expression and protein interaction networks for predicting the functional impact of mutations in the high-risk autism gene Cul3. We have collected RNA-seq and TMT-proteomics data from various brain regions of Cul3+/- transgenic mice. We aim to integrate isoform-level RNA-seq data with quantitative proteomic (peptide-level) data from the same samples to understand the impact of Cul3 mutation.

  • Isoform transcriptome of patient-derived cerebral organoids from 16p11.2 CNV carriers with autism

    Copy number variants (CNVs) represent significant risk factors for Autism Spectrum Disorders (ASD). One of the most frequent CNVs involved in ASD is a deletion or duplication of the 16p11.2 CNV locus, spanning 29 protein-coding genes. Despite progress in linking 16p11.2 genetic changes with phenotypic abnormalities (macrocephaly and microcephaly) in patients and model organisms, the specific molecular pathways impacted by this CNV remain unknown. We generated bulk RNA-seq and TMT proteomic data from patient-derived cerebral organoids (3 deletion, 3 duplication and 3 control patients). The goal of the project is to analyze isoform-level RNA-seq data, as well as proteomics data, to investigate the functional impact of the 16p11.2 CNV.

Rob Knight | Pediatrics

rknight@ucsd.edu | Lab

The Knight lab has broad interests in the human microbiome, the collection of trillions of microbes that inhabit our bodies. We are especially interested in developing techniques to read out these complex microbial communities and in using the resulting data to understand human health and the links between humans and the environment, and to prevent and cure disease. We offer a fast-paced environment with many collaborative opportunities on different projects.

  • Machine Learning for the Microbiome

    We have amassed a database of microbial DNA sequences from hundreds of thousands of biological specimens. Understanding how differences among these microbial communities relate to disease requires a range of machine learning and multivariate statistical approaches. There are many opportunities ranging from entry-level (benchmarking classifier performance on specific sample sets) to extremely challenging (using deep learning to infer the structure of global sample set relationships).

  • Multi-omics integration

    There is an increasing need to integrate data from different "omics" levels, e.g. genomes, metagenomes, metatranscriptomes, metaproteomes, metabolomes, immunological profiling, etc., into a single coherent picture separating healthy and disease states. Improved methods for performing this task, either directly or via intermediate representations such as mapping to metabolic and regulatory pathways, are essential for improving understanding. Projects in this category range from simple (testing where existing techniques like correlation networks or Procrustes analysis do/don't connect two specific data layers) to challenging (using transfer learning to integrate heterogeneous data layers and improve the underlying network annotation). An especially exciting emerging research direction here is XAI (explainable artificial intelligence), which can give clinical applications a better justification for a specific classification or suggestion.

  • Optimizing microbiome algorithms

    Many algorithms used in microbiome studies, especially in metagenomic assembly, are extremely computationally expensive. Opportunities exist either for exploiting new hardware architectures to accelerate existing algorithms or for developing new approximate algorithms to tackle problems in the workflow, including inferring taxonomy and function from DNA sequence data, genome and metagenome assembly and annotation, computing community distance metrics from sparse compositional data, and high-level analyses of hundreds of thousands of microbiomes. Again, these projects range from entry level (comparing results of two multiple sequence alignment techniques for subsequent community analysis) to advanced (using non-von Neumann architectures to perform pattern classification in real time at the whole community level for disease detection).
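
    As a small illustration of a community distance metric computed on sparse data, the sketch below implements Bray-Curtis dissimilarity over dictionaries of feature counts. Bray-Curtis is chosen here only as a familiar example; the function name and the dictionary layout are assumptions for illustration, not the lab's implementation.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two sparse count profiles.

    Each sample is a dict mapping a feature ID (for example a taxon) to a
    non-negative count; features missing from a dict are treated as zero.
    """
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    # Only features present in both samples contribute to the shared counts.
    shared = sum(
        min(count, sample_b[feature])
        for feature, count in sample_a.items()
        if feature in sample_b
    )
    return 1.0 - 2.0 * shared / total


# Hypothetical toy profiles: only taxon "x" is shared.
gut = {"x": 6, "y": 4}
skin = {"x": 2, "z": 8}
distance = bray_curtis(gut, skin)
```

    Storing only non-zero counts keeps the work proportional to the number of observed features rather than to the size of the full feature table.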

Jejo Koola | School of Medicine

jkoola@ucsd.edu | Profile

Dr. Koola is a physician-scientist specializing in Biomedical Informatics and Hospital Medicine. His work centers on big data machine learning for predictive analytics. In particular, he is interested in using electronic health records to improve care delivery, particularly for patients with advanced liver disease. Using risk prediction models in a healthcare context requires understanding of: (i) the healthcare system of intended use; (ii) risk model building; (iii) risk model assessment; and (iv) risk model re-calibration. Additionally, Dr. Koola is interested in visual analytics, data modeling, and health services research.

  • Designing the "Green Button" informatics consult service using big data analytics for personalized medicine

    In 2012 the Institute of Medicine released desiderata for a learning healthcare system, where evidence informs practice and practice informs evidence. Though the randomized clinical trial (RCT) serves as the gold standard for informing clinical decisions, it has flaws: difficulty achieving recruitment, overly stringent inclusion/exclusion criteria, and a lack of patient-centered decision making. Observational cohort studies have grown into an important complement to RCTs, allowing comparative effectiveness research and patient-centered trials. The surge of electronic health records (EHRs) and the resulting zettabyte of data allow us to realize this vision for the first time. Despite the growth of observational cohort studies, challenges remain in bringing knowledge from the bench to the bedside; moreover, model performance degrades when a model is used in a cohort other than the one in which it was developed.

    To ameliorate these difficulties, we propose to launch and study a novel “informatics consult” service. When no clear evidence-based guidelines exist regarding care decisions, the service would allow clinicians to query the UCSD clinical data warehouse by identifying patients similar to the index case. First proposed in the seminal “Green Button” paper by Longhurst et al., such a system would leverage our ability to truly deliver personalized, patient-centered care. Small-scale, limited efforts have been put into practice to answer questions regarding treatment of melanoma and systemic lupus erythematosus complications. We note, however, the opportunity for a much larger service with broad impact, starting with insights borne of data from UCSD and potentially mining insights from the entire state-wide UC Health data warehouse.

    We note several novel challenges to this proposed system: (i) Performing semi-automated phenotyping so that we can identify clinical outcomes of interest. (ii) Identifying patients that are similar to the index patient (often called clustering). (iii) Incorporating automated, computable search regarding guideline-recommended care. (iv) Performing visual analytics to understand similarity of cohorts. (v) Communicating probability and statistical information to healthcare professionals so they can effectively manage uncertainty.

    Student responsibilities:

    1. Participate in project meetings
    2. Help design one of several possible algorithms/interfaces:
      a. patient clustering algorithm using unsupervised learning
      b. visual analytic interface for describing similar cohort of patients
      c. visual analytic interface to help communicate statistical risk
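
    To make the idea of "patients similar to the index case" concrete, here is a minimal sketch that ranks a cohort by standardized Euclidean distance to the index patient. This is nearest-neighbour retrieval rather than a full clustering algorithm, and the function name, the feature layout and the toy values are all assumptions made for illustration.

```python
import math


def most_similar_patients(cohort, index_patient, k=3):
    """Return the IDs of the k cohort patients closest to the index patient.

    cohort: dict of patient ID -> list of numeric features (hypothetical layout).
    index_patient: list holding the same features for the index case.
    """
    rows = list(cohort.values())
    n_features = len(index_patient)
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        variance = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        # Fall back to 1.0 so a constant feature does not divide by zero.
        stds.append(math.sqrt(variance) or 1.0)

    def scale(values):
        # Z-score each feature so no single unit dominates the distance.
        return [(v - m) / s for v, m, s in zip(values, means, stds)]

    target = scale(index_patient)
    distances = {
        pid: math.dist(scale(features), target)
        for pid, features in cohort.items()
    }
    return sorted(distances, key=distances.get)[:k]


# Hypothetical features: [age in years, a laboratory value].
cohort = {"p1": [60, 1.0], "p2": [62, 1.1], "p3": [30, 5.0], "p4": [85, 0.5]}
neighbours = most_similar_patients(cohort, [61, 1.0], k=2)
```

    Real EHR data would also need handling for missing values and categorical variables, which this sketch leaves out.
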
  • Integrating patient reported outcomes into the electronic health record to improve cardiovascular care.

    Unhealthy dietary choices (a lack of nutritious foods and an excess of unhealthy food) were shown to be the major contributor to the 400,000 U.S. deaths from cardiovascular diseases (CVD) in 2015. Eating more nuts, vegetables, and whole grains, and less salt and trans fats, could save tens of thousands of lives in the U.S. each year. Obesity is one critical outcome of poor diet, which also contributes to heightened CVD risk. Thousands of smartphone apps are available to download for weight loss, but these apps primarily focus on caloric intake rather than the overall quality of diet and lifestyle critical for CVD prevention.

    Mobile Health (mHealth) applications also have not been systematically tested for their effectiveness and are criticized for not having an evidence-based foundation. In this study, we adapt the design of mHeart to communicate automatically with the UCSD Electronic Health Record, giving healthcare providers access to psychosocial aspects of a patient's care outside of the direct hospital system. In particular, the provider will be able to view logs of patient activity, dietary choices, and other lifestyle choices. The provider will also be able to send feedback to the patient to alter behavior.

    Student opportunities:

    1. Help modify smartphone app to make use of healthcare connection protocols like Apple HealthKit and Google Fit
    2. Understand interfaces that communicate with electronic health records (like FHIR)
    3. Help design point-to-point interface between smartphone app and electronic health record data, which is presented to provider
    4. Participate in meetings designing pilot study to test app performance
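
    As a rough illustration of what a FHIR-style exchange can look like, the sketch below builds a FHIR Observation resource for a single body-weight reading as a Python dictionary and serializes it to JSON. The patient identifier, the reading and the function name are hypothetical, and the source does not describe how mHeart actually exchanges data with the EHR.

```python
import json


def body_weight_observation(patient_id, weight_kg, measured_at):
    """Build a FHIR Observation resource (as a dict) for a body-weight reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {
                    "system": "http://loinc.org",
                    "code": "29463-7",  # LOINC code for body weight
                    "display": "Body weight",
                }
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": measured_at,
        "valueQuantity": {
            "value": weight_kg,
            "unit": "kg",
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": "kg",
        },
    }


# Hypothetical reading; a real app would take it from the phone's health store.
observation = body_weight_observation("example-123", 82.5, "2024-01-15T08:30:00Z")
payload = json.dumps(observation, indent=2)
```

    Sending the payload to a FHIR server would additionally require authentication and the server's endpoint, neither of which is specified in the project description.
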
  • Systematic review and meta-analysis of hospital readmission for patients with cirrhosis.

    Patients with cirrhosis, a late stage of chronic liver disease, are at increased risk of hospitalization and hospital readmission. Although several studies have looked at models for predicting readmission for patients with cirrhosis, they are limited by small sample sizes, limited candidate predictor variables, and limited evaluation of discrimination and calibration. A systematic review and meta-analysis of available evidence can help shed new light on the problem, and help identify modifiable risk factors.

    Student responsibilities:

    1. Understand the basics of a systematic review
    2. Perform literature review
    3. Abstract necessary information in case report forms and help perform meta-analysis
    4. Help write manuscript
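
    For orientation, the pooling step of a meta-analysis can be sketched as an inverse-variance fixed-effect estimate. The study values below are invented for illustration, and a real analysis of readmission studies may well call for a random-effects model instead.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error.

    estimates: per-study effect sizes on an additive scale (e.g. log odds ratios).
    standard_errors: the matching per-study standard errors.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical log odds ratios from three studies.
log_odds_ratios = [0.20, 0.40, 0.30]
std_errors = [0.10, 0.10, 0.20]
pooled, pooled_se = pool_fixed_effect(log_odds_ratios, std_errors)
ci_95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
```

    Each study is weighted by the inverse of its variance, so more precise studies pull the pooled estimate harder.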

Tsung-Ting Kuo | Biomedical Informatics

tskuo@ucsd.edu | Profile

Dr. Tsung-Ting Kuo is an Assistant Professor of Medicine in the University of California San Diego (UCSD) Health Department of Biomedical Informatics (DBMI). He mainly conducts blockchain-based biomedical, healthcare and genomic studies. His research focuses on blockchain technologies, machine learning, and natural language processing.

  • Developing privacy-preserving predictive modeling algorithms on blockchain networks

    Predictive modeling can advance research, facilitate quality improvement initiatives, and substantiate research results, especially when data from multiple healthcare systems can be included. However, current state-of-the-art privacy-preserving predictive modeling frameworks are still centralized; in other words, the models from distributed sites are integrated on a central server to build a global model. This centralization carries several risks, e.g., a single point of failure at the central server. To improve the security and robustness of predictive modeling frameworks, we will develop and implement novel and advanced algorithms on decentralized blockchain networks (a distributed ledger/database technology adopted by the Bitcoin cryptocurrency) to build better models. The outcome will be algorithms that improve the predictive power of data from multiple healthcare systems through a distributed system.
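
    The ledger idea can be illustrated with a toy hash-linked chain in which each block records one site's model coefficients together with the hash of the previous block. This is only a sketch of the data structure: it omits consensus, networking and privacy protection, all names are hypothetical, and it is not the algorithm the project will develop.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's contents, excluding its own hash field."""
    body = {key: value for key, value in block.items() if key != "hash"}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_update(chain, site, coefficients):
    """Append one site's model coefficients as a block linked to the previous one."""
    block = {
        "index": len(chain),
        "site": site,
        "coefficients": coefficients,
        "prev_hash": chain[-1]["hash"] if chain else "0" * 64,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
    return True


# Hypothetical sites sharing model coefficients, never patient-level data.
ledger = []
append_update(ledger, "site_a", [0.12, -0.40])
append_update(ledger, "site_b", [0.10, -0.35])
ledger_ok = is_valid(ledger)
```

    Because each block embeds the previous block's hash, altering an earlier update changes its hash and breaks every later link, which is what makes tampering detectable.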

Amit Majithia | School of Medicine

amajithia@ucsd.edu | Profile | Lab

Our goal is to identify genes causing insulin resistance in humans in order to find new therapeutic targets for diabetes and cardiometabolic diseases. Our approach to discovery is grounded in human genetics, clarified through systematic, high-throughput experimentation in human cells, and calibrated by its relevance to clinical disease. We use massively parallel genome engineering to re-create mutations identified in patients and develop high-throughput assays to interrogate function in human cell models. We apply bioinformatics and statistics to make sense of these data, integrating 1) human mutations, 2) cellular function, and 3) metabolic/glycemic phenotypes of the individuals who harbor them. Using this approach, we have discovered novel missense mutations that greatly increase risk for type 2 diabetes. As a complementary aim towards precision medicine, we develop tools for clinical genome interpretation powered by high-throughput experimental data.

  • Evaluating accuracy and clinical utility of commercially available genetic risk scores for diabetes

    Recently, 23andMe, which sells direct-to-consumer genetic testing products, has introduced a diabetes risk report based on single nucleotide polymorphism (SNP) genotypes measured in their commercial product ($199: https://www.statnews.com/2019/03/10/23andme-will-tell-you-how-your-dna-affects-your-diabetes-risk-will-it-be-useful/). The clinical utility of this report is unclear and has generated significant controversy. Critically, 23andMe’s SNP-chips only test about 0.02% of the human genome. We have shown in previous work that a single rare SNP, not captured by SNP-chips, can change an individual’s risk of diabetes by 7-fold. The purpose of the project is to test the 23andMe diabetes report output in a dataset of individuals whose diabetes status is known and who have also undergone more extensive genome sequencing (whole exomes), in order to assess the accuracy of direct-to-consumer SNP tests and quantify the number of falsely reassuring tests when more complete genetic information is considered.
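
    One simple way to count "falsely reassuring" results is to compare the report's call against a reference label for each person. In the sketch below the reference could be known diabetes status or a sequencing-based risk call; that choice, the function name and the toy data are assumptions for illustration.

```python
def falsely_reassuring(report_high_risk, reference_positive):
    """Count people the report did not flag although the reference is positive.

    Both arguments are equal-length lists of booleans, one entry per person.
    Returns (count, fraction of all reference-positive people).
    """
    if len(report_high_risk) != len(reference_positive):
        raise ValueError("inputs must describe the same people")
    positives = sum(reference_positive)
    missed = sum(
        1
        for flagged, positive in zip(report_high_risk, reference_positive)
        if positive and not flagged
    )
    return missed, (missed / positives if positives else 0.0)


# Hypothetical calls for four people.
report = [False, True, False, False]
reference = [True, True, False, True]
missed, miss_rate = falsely_reassuring(report, reference)
```

    The returned fraction is the false-negative rate of the report relative to the chosen reference, i.e. one minus its sensitivity.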

  • Identifying discriminators of drug-responsive mutations in Mendelian diabetes

    Loss-of-function mutations in hepatocyte nuclear factor 1 alpha (HNF1A) cause an autosomal dominant form of maturity-onset diabetes of the young (MODY3). Patients with MODY3 are clinically difficult to distinguish from patients with autoimmune type 1 diabetes and are therefore often given the same treatment, consisting of multiple daily injections of insulin. However, MODY3 patients can be effectively treated with a single daily tablet of sulfonylureas and thus spared from having to take multiple daily injections. This project aims to utilize data generated from cells engineered to express a range of HNF1A mutations (MODY3 and non-MODY), followed by RNA-sequencing, to identify a signature of genes that can distinguish between sulfonylurea-responsive and non-responsive mutations. This transcriptomic signature would form the basis of a biomarker test in patients with HNF1A mutations to predict their responsiveness and provide the most effective, least burdensome treatment.

  • Integrative genomics to identify a novel disease-causing mutation in Simpson-Golabi-Behmel syndrome (SGBS)

    SGBS is characterized by overgrowth of multiple body parts. It is a rare genetic disease that has been attributed to inactivating mutations in GPC3. We have stem cells from a patient with SGBS but no GPC3 mutation, implicating another, as yet unknown, causal gene. We have performed whole genome sequencing and RNA sequencing on these cells. The goal of this project is to identify the causal gene by using the genomic data sets to create a “short list” of candidate causal genes, which can then be assessed experimentally in the patient cells using genome engineering.

Lucila Ohno-Machado | Biomedical Informatics

machado@ucsd.edu | Profile

Yingxiao (Peter) Wang | Bioengineering

yiw015@eng.ucsd.edu | Profile | Lab

Our research focuses on molecular engineering for cellular imaging and reprogramming, and image-based bioinformatics, with applications in stem cell differentiation and cancer treatment.

  • Image-based reconstruction of biochemical networks in live cells

    Fluorescence resonance energy transfer (FRET)-based biosensors have been widely used in live-cell imaging to accurately visualize specific biochemical activities. We have developed the Fluocell image analysis software package to efficiently and quantitatively evaluate intracellular biochemical signals in real time, and to provide statistical inference on the biological implications of the imaging results. However, important questions arise about how to use these results to reconstruct the quantitative parameters of the underlying biochemical networks, which determine cellular functions and ultimately cell fates. In this rotation project, we will integrate optimization-based machine learning approaches with biochemical network models to seek answers to these questions, with applications in cancer treatment against drug resistance.

  • Intelligent Diagnosis of Infectious Diseases by Deep Learning

    The diagnosis of infectious diseases often requires tissue biopsy and microscopic examination by pathologists, which is time-consuming, labor-intensive, and error-prone. To develop a software-assisted system for identifying microorganisms in digital images, we use convolutional neural networks and transfer learning to train and validate an intelligent software system for the classification of pathology slides. The goal of this project is to provide a diagnosis of pathogens with high efficiency and accuracy. Students will work in an interdisciplinary team, collecting and labelling imaging data, developing deep-learning-based algorithms and user interfaces, and characterizing and optimizing the accuracy and functionality of the software package.