2.4. Gene Sets


Responsible: Dr. Sabine Dietmann, GSF, Munich.

Background:

Genome sequence information reveals little of functional significance when viewed in isolation. This static snapshot can only be brought to life by understanding the dynamic aspects of genome behavior, such as transcription factor (TF) binding, epigenetic modifications, and other structural changes. No systematic analysis of the functional dependencies among genes and their correlation with expression profiles has yet been published. The aim of this subproject is therefore to use information generated in other subprojects to explore how these functional dependencies correlate with expression profiles. Any set of genes can be represented as a network consisting of interconnected subnets or clusters that correspond to functional modules. Interaction networks are logically, but not mechanistically, linked to transcript regulation. Identifying disturbed interactions with respect to transcription control (e.g. uncoupling of cross-correlating, co-regulated genes) in samples from the clinical networks may provide a rational interpretation of disease mechanisms (e.g. rheumatoid arthritis, infection and inflammation).

Planned work:

  1. GenSet Collection: Diverse datasets will be generated from functional classification, regulatory information, metabolic and regulatory pathways, protein-protein interactions and comparative approaches. Applying clustering algorithms to these datasets will yield a set of putative GenSets representing functional modules.
  2. Mapping algorithms: Given the graph of candidate clusters generated in (1), high-throughput data will be systematically screened for cross-correlation between co-expressed genes and genes linked within the candidate clusters. Conversely, sets of co-expressed genes will be mapped to candidate clusters.
  3. Optimization of parameters: For any gene, a functional neighborhood can be described by a set of relations. To compare these relations, a quantitative measure of functional distance is required. Since many of the values are missing (unknown relations), the gene sets will be dynamic. Optimization will therefore involve updating the gene sets as well as tuning the parameters.
  4. Comparison of independent experiments: Here we aim to scan public as well as NGFN array data to correlate detected patterns with GenSets. These patterns will be linked to the experimental condition, e.g. tumor type. Thus, beyond the use of signals as markers for disease states, additional information will be available for interpreting the differences between disease and homeostatic conditions.
  5. Confirmation of module dependencies: Building on our bioinformatic analysis, we will work with experimentalists to test hypothetical functional assignments derived from our GenSet approach. In particular, within the SMP model organisms, the functional analysis of mouse mutants offers a sensible approach for confirming hypotheses derived from data analysis, because it reduces the experimental space to a relatively small number of testable conditions.
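
Steps 2 and 3 of the work plan can be illustrated with a minimal Python sketch. It is not the project's method: the plan does not specify a co-expression measure or a distance formula, so the sketch assumes Pearson correlation for co-expression and a weighted mean over the known relations for functional distance. All names (`pearson`, `cluster_coexpression`, `functional_distance`, the gene and relation labels) are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # Assumption: a flat profile carries no co-expression signal.
        return 0.0
    return cov / (sx * sy)


def cluster_coexpression(gene_set, profiles):
    """Step 2 (assumed scoring): mean pairwise correlation among the
    measured genes of one candidate GenSet. Returns None when fewer
    than two genes of the set have an expression profile."""
    measured = sorted(g for g in gene_set if g in profiles)
    pairs = list(combinations(measured, 2))
    if not pairs:
        return None
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def functional_distance(relations, weights):
    """Step 3 (assumed formula): weighted mean of the known per-relation
    distances between two genes. Unknown relations are given as None and
    skipped; returns None when no relation is known."""
    known = {k: d for k, d in relations.items() if d is not None and k in weights}
    total = sum(weights[k] for k in known)
    if total == 0:
        return None
    return sum(weights[k] * d for k, d in known.items()) / total


if __name__ == "__main__":
    # Toy expression profiles (hypothetical genes, four conditions each).
    profiles = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.0],
        "g3": [4.0, 3.0, 2.0, 1.0],
    }
    # Candidate GenSets as they might come out of step 1; "g9" is unmeasured.
    gen_sets = {"module_A": {"g1", "g2"}, "module_B": {"g1", "g3", "g9"}}
    for name, genes in gen_sets.items():
        print(name, cluster_coexpression(genes, profiles))

    # One gene pair with a missing (unknown) interaction relation.
    relations = {"pathway": 0.2, "interaction": None, "annotation": 0.6}
    weights = {"pathway": 1.0, "interaction": 1.0, "annotation": 1.0}
    print(functional_distance(relations, weights))
```

In this sketch the `weights` dictionary stands in for the parameters to be optimized in step 3, and skipping `None` entries is one simple way to handle unknown relations; other treatments of missing values would be equally compatible with the plan as written.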