SMP BIOINFORMATICS.
Coordinator:
Prof. Dr. Roland Eils
Summary:
The primary remit of SMP Bioinformatics is to
facilitate the maximal utilisation of data flowing
from the NGFN2 initiative and its 'metamorphosis'
into expressions of biomedical knowledge. By providing the
infrastructure and know-how necessary for this 'data
to knowledge' transfer, we will play a key role in
ensuring the success of NGFN2 as a whole. Our
approach follows a concept of both centralized and
decentralized data organization, management,
analysis, interpretation and training. In so doing,
we not only provide local support for the
administration and analysis of experimental data,
but also provide a complementary systematic
platform with centralized knowledge, tools and
resources. In terms of overall organization, SMP
Bioinformatics is founded on three distinct, albeit
interdependent, components. Together, these
components allow a so-called '3i-approach' to the
data-to-knowledge transformation within the NGFN2
project: 1) Integrate, 2) Interpret and 3) Inform.
Figure 1. Image map showing sub-projects of the SMP bioinformatics
and their interactions.
1. Integrate:
The first component, 'Data management', is
responsible for the design, development, and provision of
databases and standards for diverse data types.
One of the primary remits is to accelerate the
research pipeline of KGs by providing an
appropriate database infrastructure and data
integration tools, to facilitate the efficient and
integrated management of multi-center research
networks.
2. Interpret:
The second component, 'Data analysis', seeks
to expand the focus beyond methods development for
microarray analyses, to a more robust and
comprehensive 'omic'-analysis suite. Important
here will be the ability to meaningfully interpret
data collected for the same clinical phenomenon on
disparate technology platforms. With a network of
information about the relationships between genes
and proteins rapidly emerging, functional data can
increasingly be analyzed within a functional
context. This will include methods for vertical
analyses across different genomic information
levels and methods based on secondary network
information. Further attention will be paid to the
annotation and interpretation of results obtained
by these methods, with the goal of producing
high-quality primary annotations. To this end,
data must be projected onto functional modules such
as protein complexes, co-regulated genes,
functional units (e.g. mitochondria or ribosomes),
and metabolic and regulatory pathways. While this
concept has been successfully applied to model
organisms such as yeast, a comprehensive data
resource allowing the computational projection
and interpretation of experimental data has yet to
be developed. This will be achieved in our SMP by
using homology-based methods for gene function
annotation combined with advanced protein
interaction prediction methods as well as
benchmarked information transfer protocols.
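
The projection of data onto functional modules described above can be
illustrated with a minimal sketch. It is not the method used within the
SMP; it only assumes that per-gene measurements are summarized as the mean
over each module's measured members. The function name
project_onto_modules, the gene symbols, the module memberships and the
numeric values are all hypothetical toy inputs chosen for illustration.

```python
from statistics import mean


def project_onto_modules(expression, modules):
    """Summarize per-gene measurements as one score per functional module.

    expression: dict mapping gene symbol -> measured value (toy data here).
    modules:    dict mapping module name -> list of member gene symbols.
    Modules without any measured member are left out of the result.
    """
    scores = {}
    for name, genes in modules.items():
        # Keep only the member genes that were actually measured.
        values = [expression[g] for g in genes if g in expression]
        if values:
            scores[name] = mean(values)
    return scores


# Hypothetical toy measurements (e.g. log ratios); not real results.
expression = {"RPL3": 2.0, "RPS6": 1.0, "COX4I1": -1.0, "ATP5F1A": -3.0}

# Hypothetical module definitions for illustration only.
modules = {
    "ribosome": ["RPL3", "RPS6", "RPL11"],
    "mitochondrion": ["COX4I1", "ATP5F1A"],
    "proteasome": ["PSMA1"],
}

module_scores = project_onto_modules(expression, modules)
print(module_scores)
```

In this sketch the mean is only one possible summary; a median or a
statistical enrichment test could be substituted without changing the
overall structure.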
3. Inform:
The third and final component, 'Consulting and Services',
is responsible for the dissemination of methods,
systems and know-how from the SMP Bioinformatics
to other SMPs and KGs. Methods developed within
the SMP Bioinformatics will be distributed either
through the open-source toolbox system 'Bioconductor'
or as ready-designed, application-specific
workflows through our proprietary, process-oriented
data-mining platform MINE-IT. Existing
know-how on study design, data analysis and data
management will be disseminated through regular
training courses, an extension of the very
successful courses offered in NGFN1. SMPs and KGs
will receive guidance on achieving high quality in
all aspects of an experiment, with a focus on
methodological quality and consulting.
This subproject will be responsible for the
definition, communication and application of
methodological guidelines. Finally, we will provide
comprehensive bioinformatics services covering the
methods and technologies represented within the
present consortium.