Mihai's Research

Below is a list of several ongoing projects in my lab. Note that not all projects are necessarily active or current. They all represent research interests of mine, and work in these areas depends on the availability of funding, time, and "able bodies". For software packages developed as part of this research, see our Software page.

Note that I do not do research on machine learning, statistics, or graphics, though I use tools from these fields on occasion. My primary interests relate to algorithms for processing strings (pairwise alignment and multiple alignment of DNA or protein sequences) and graphs (uncovering interesting patterns in assembly graphs). I am also very interested in graph drawing and in software testing for bioinformatics/scientific applications.

Some of these projects provide opportunities for undergraduate research, either as summer projects or as part of the CS honors program. For more information on how to apply for such research opportunities, see our Undergraduate Programs page.

Metagenomics
Metagenomics is a new scientific field that uses high-throughput genomic technologies to analyze the microbial communities that inhabit our bodies and our world (see our overview of metagenomics for more information). Our current research in metagenomics focuses primarily on two areas: metagenomic assembly and whole-metagenome analyses, with particular attention to understanding the genomic variation within microbial communities; and the development of comparative methods that enable the analysis of clinical data-sets (generally comprising hundreds to thousands of samples). For the latter, see our comparative packages Metastats and MetaPath.

In addition, we are involved in several metagenomic projects analyzing the microbial, viral, and parasitic communities that cause diarrhea in third-world children; and analyzing the role microbes and viruses play in lung disease in HIV-infected patients.

One of our long-term goals in this field is to develop predictive models of microbial communities that will enable biologists to simulate community dynamics in order to better understand the effects of treatment or other external factors on health.
Genome assembly
My work on genome assembly currently focuses primarily on the assembly of metagenomic data, with particular emphasis on uncovering genomic variation within the assemblies.

Also, we are interested in developing approaches for the validation of genome assemblies, in particular de novo validation approaches that can assess the quality of assemblies in the absence of a 'golden truth'.

Motivating applications

The research performed in my lab is motivated and driven by real biological applications. In addition to doing basic research and writing software, the researchers in the lab work to analyze real biological datasets generated by our collaborators. Here are some examples from among the many projects we are and have been involved with:

Diarrhea and malnutrition in the developing world. Together with colleagues from the University of Maryland School of Medicine and the University of Virginia we are exploring the role of the human gut microbiota in the etiology, prevention, and treatment of moderate to severe diarrhea and malnutrition in children under the age of 5 from developing countries.
CONSERVE. This project is led by the University of Maryland School of Public Health and it brings together a diverse research team with the goal of exploring the safe use of non-traditional water sources (such as reclaimed wastewater, surface water, etc.) for the irrigation of crops that are eaten raw. This is a very exciting project given the impact climate change (yes - it's real) has on our precious water resources.
Got Flu? Also led by the School of Public Health, this project aims to understand how flu spreads from person to person, and how social interactions, the genetics of the flu virus itself, and our body's response to infection influence the ability of flu to spread through a population.
The Genome of the Diamondback Terrapin. We get to sequence our mascot and Duke can't. Hah!

Other research

In addition to the major research interests outlined above, my lab is also working on several other topics:

High performance/throughput computing in bioinformatics
New DNA sequencing technologies are generating large amounts of data at a significantly higher pace than was possible just a few years ago. The analysis of new generation sequencing data poses significant computational challenges, both due to the sheer size of the data-sets being analyzed and due to individual characteristics of the new sequences. We are currently conducting research to evaluate whether highly-parallel computing clusters can be used to efficiently analyze such data, with the goal of providing researchers with the ability to rent CPU cycles rather than having to implement and maintain an expensive computational infrastructure in their labs. We are primarily focused on algorithms for sequence alignment (see Crossbow) and for genome assembly (see Contrail).

For more information check out our High Performance Computing page.
Understanding antibiotic resistance
We created a database, ARDB, of all the information we could easily extract from the literature and other public databases. This database is freely available to all scientists, both through the web and as a flat-file download from ftp://ftp.cbcb.umd.edu/pub/data/ARDB.
Prokaryotic genome annotation
Together with colleagues at the NMRC we have developed a modular prokaryotic annotation pipeline, primarily for use in various genome projects we are involved in, but also as a framework for exploring research questions regarding the functional annotation of genomes and metagenomes. The software is available, open-source, from http://sourceforge.net/projects/diyg.
Semi-automated genome finishing
This research is an unexpected result of our research on genome assembly and on incorporating new types of data in the assembly process. We noticed that mate-pair information, optical mapping data, as well as other information generated during the genome assembly process (specifically assembly graphs) can be used to improve genome assemblies (e.g. by resolving certain classes of repeats) as well as to guide the design of experiments aimed at finishing genomes. We have successfully applied some of these ideas to the finishing of Aggregatibacter aphrophilus and Vibrio harveyi, and we are currently in the process of finishing Yersinia rohdei and Yersinia ruckeri.