Integration and Exploration of Biological Networks
Hendrik Mehlhorn, Falk Schreiber
Daten in den Lebenswissenschaften: Datenbanken als Kommunikationszentrum at INFORMATIK 2011 - Informatik schafft Communities
Berlin 2011
Abstract: Biological networks play a crucial role in solving complex biological problems in the life sciences. Modern wet-lab techniques
such as GC/MS, multidimensional protein gels, and microarrays produce a continuously growing number of biological data sets.
These arise from different domains such as metabolomics, proteomics, and transcriptomics. To achieve the aims of life science
projects, such as the development of new drugs or increased yield in crop plants, a deep understanding of the complex
interactions of biological entities from different domains is necessary. However, the integration of the underlying biological
networks has not yet been solved satisfactorily, owing to a lack of conventions and to ambiguities.
Many databases integrate data from different sources. However, these databases are often limited to a few organisms
or data domains, so a comprehensive view of integrated biological networks is only partly possible, if at all. As a result,
specific analyses have to be performed independently on the basis of the available data sets and common biological networks. To integrate
biological networks from different sources and domains, identifiers must be mapped to infer corresponding and
related entities across networks, and exploration methods must be provided to support the investigation of the data.
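A minimal sketch of what "inferring corresponding entities" can mean in practice, assuming each network is a plain edge list and a precomputed identifier mapping is available: nodes of the second network that map to a node of the first are renamed to it, so shared entities become a single node in the merged network. The function name `integrate` and the `ns1:`/`ns2:` identifiers are hypothetical illustrations, not part of the described prototype.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (source node id, target node id)


def integrate(edges_a: List[Edge], edges_b: List[Edge],
              mapping: Dict[str, Set[str]]) -> Set[Edge]:
    """Merge network B into network A using a (hypothetical) A-id -> B-ids mapping."""
    # Invert the mapping so each B identifier points at one canonical A identifier;
    # setdefault keeps the first A identifier seen if a B id maps to several.
    canonical: Dict[str, str] = {}
    for a_id, b_ids in mapping.items():
        for b_id in b_ids:
            canonical.setdefault(b_id, a_id)
    merged = set(edges_a)
    for u, v in edges_b:
        # Unmapped B nodes keep their own identifier and are simply added.
        merged.add((canonical.get(u, u), canonical.get(v, v)))
    return merged


merged = integrate([("ns1:glucose", "ns1:g6p")],
                   [("ns2:enzyme", "ns2:GLC")],
                   {"ns1:glucose": {"ns2:GLC"}})
```

Here `ns2:GLC` is recognized as the same entity as `ns1:glucose`, so the enzyme edge from the second network attaches to the existing node instead of creating a duplicate.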
We present methods and an easy-to-use prototype (based on the Vanted system) for the integration and visualization of biological
networks using various data sources. The idea is to combine biological network data from different sources and from different
domains, such as metabolic pathways, protein-protein interaction networks, signal transduction pathways, and gene regulatory networks.
The identifier mappings come from an easily extensible set of integrated IDMappers, which are managed by the identifier mapping
framework BridgeDb. An IDMapper is a generic source of mapping information that can be implemented in several forms, such as web
services, SQL databases, or flat files. These many implementation options, together with the ease of integrating them, allow the tool
to be extended quickly and adapted to further demands. The set of all identifier mappings constitutes the identifier mapping graph.
Powerful management of the identifier mapping graph, including the handling of identifier synonyms and transitive identifier mapping
paths, enables a flexible integration of biological networks. The integrated biological networks can be visualized completely or
partially according to various filtering operations. Through targeted mapping and filtered visualization of integrated biological
networks, users can prepare custom systems biology analyses and publication-ready figures.
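The handling of synonyms and transitive mapping paths can be sketched as follows, assuming the mapping graph is an adjacency dictionary and synonyms are given as a hypothetical table from alias to preferred identifier. The abstract does not state which search strategy is used; a breadth-first search with a hop limit is an assumption made here, chosen because it reports the shortest path length for each reached identifier, and a long transitive chain is a plausible signal of a less reliable mapping. The small `filter_network` helper illustrates one possible filtering operation for a partial view. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def normalize(graph: Dict[str, Set[str]], synonyms: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collapse synonyms onto their preferred identifier and make edges symmetric."""
    norm: Dict[str, Set[str]] = {}
    for u, neighbours in graph.items():
        cu = synonyms.get(u, u)
        for v in neighbours:
            cv = synonyms.get(v, v)
            if cu != cv:  # drop self-loops created by merging synonyms
                norm.setdefault(cu, set()).add(cv)
                norm.setdefault(cv, set()).add(cu)
    return norm


def transitive_mappings(graph: Dict[str, Set[str]], start: str,
                        synonyms: Dict[str, str], max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first search: identifiers reachable from start within max_hops, with hop count."""
    g = normalize(graph, synonyms)
    source = synonyms.get(start, start)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not follow paths longer than max_hops
        for nxt in g.get(node, set()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[source]  # the start identifier is not a mapping of itself
    return dist


def filter_network(edges: List[Tuple[str, str]], keep: Set[str]) -> List[Tuple[str, str]]:
    """Partial view: keep only edges whose endpoints both pass the filter."""
    return [(u, v) for u, v in edges if u in keep and v in keep]
```

For example, with the graph `{"A": {"B"}, "B": {"C"}, "c_alias": {"D"}}` and the synonym `c_alias -> C`, the identifier `D` becomes reachable from `A` in three hops, because the alias is first merged with `C`.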