Time | Wednesday 08.11 | Thursday 09.11 | Friday 10.11 |
9:00 - 10:00 | Registration | M. PEYRARD | B. JACQ |
10:00 - 11:00 | A. ARNEODO | A. SCIARRINO | C. HERMANN |
11:00 - 11:30 | Coffee break / Discussion | Coffee break / Discussion | Coffee break / Discussion |
11:30 - 12:30 | C. THERMES | B. PRUM | M. CASELLE |
12:30 - 14:30 | Lunch | Lunch | Lunch |
14:30 - 15:30 | B. AUDIT | L. PALMEIRA | A. GORBAN |
15:30 - 16:00 | Coffee break / Discussion | Coffee break / Discussion | A. ZINOVYEV |
16:00 - 16:30 | E. PECOU | R. TWAROCK | |
16:30 - 17:00 | Coffee break / Discussion | | |
17:00 - 18:00 | H. ORLAND | Round Table | |
DNA in chromatin: what can we learn from a multi-scale wavelet analysis of DNA sequences?
Recent technical progress in live cell imaging has confirmed that the structure and dynamics of chromatin play an essential role in regulating many biological processes, such as gene activity, DNA replication, recombination and DNA damage repair. The emerging view is that genomes are compartmentalized into subchromosomal structures that likely coordinate the spatial organization and timing of replication and transcription. Remarkably, these structures can persist throughout subsequent cell generations. Given this probable structural stability, it is essential to determine to what extent the higher-order structure and dynamics of chromatin result directly from the primary DNA sequence and its functional landmarks. In the first part of this talk, we use the space-scale decomposition provided by the continuous wavelet transform (WT) to characterize the scale invariance properties of genomic sequences. We show the existence of long-range correlations (LRCs) over distances up to 20-30 kbp. To understand to what extent the observed LRCs could influence the compaction and accessibility of genomic information in cells, we perform a fractal analysis of DNA structural profiles, e.g. DNA bending profiles based on nucleosome positioning data. By exploring a number of genomes from the three domains of life, we identify a characteristic scale of 100-200 bp separating two scale-invariant regimes. In the small-scale regime (10-200 bp), LRCs are observed in eukaryotic genomes, in contrast to their total absence in eubacterial genomes. This result, together with the use of several viral control sequences, suggests that LRCs in the 10-200 bp range are a multi-scale signature of the eukaryotic nucleosomal structure. In the large-scale regime (200 bp-20 kbp), LRCs are universally observed. Following the previous interpretation, we conjecture that these correlations might also be essential to the condensation/decondensation of the chromatin fiber.
In the second part, we explore the large-scale compositional heterogeneity of several large (tens of megabases) contigs within human chromosomes through the optics of the WT microscope. We show that the GC content displays relaxational nonlinear oscillations with two main frequencies corresponding to 100 kb and 400 kb, which are well-recognized characteristic sizes of chromatin loops and loop domains involved in the hierarchical folding of the chromatin fiber. These length scales are also remarkably similar to the sizes of mammalian replicons and replicon clusters. When we further investigate deviations from intrastrand equimolarities between A and T and between G and C, we again observe these two fundamental frequencies, which suggests that they are the footprints of the replication and/or transcription mutation bias. We further show that the observed nonlinear oscillations reveal a remarkable cooperative organization of gene location and orientation.
DNA in chromatin: Modelling the influence of the sequence on the structure and dynamics of chromatin.
The structure and dynamics of chromatin in cells must guarantee that the DNA-binding machineries required for gene transcription and DNA replication have coordinated access to their DNA targets. Sequence analysis reveals important clues for understanding to what extent the primary DNA sequence contributes to the structural and topological modifications involved in these fundamental cellular processes. We present two models, of DNA and of chromatin respectively, that include information extracted from the analysis of DNA bending profiles (see A. Arneodo's presentation) and DNA compositional asymmetry (see C. Thermes's presentation). They provide some insight into the physical mechanisms by which the DNA sequence could influence the structure and dynamics of chromatin at the nucleosomal scale (~200 bp) and the fibre loop scale (>100 kbp) respectively.
In the first part of this talk, we present some results on the influence of the sequence on the elastic properties of naked DNA. This analysis allows us to interpret the LRCs observed in eukaryotic genomes in terms of the physical processes underlying the formation, positioning and dynamics of nucleosomes along genomic DNA. Since DNA within nucleosomes forms approximately two co-planar loops, we analyse the thermodynamics of small 2D DNA loops as a simplified model for the mechanics of nucleosomal DNA. In this model, the presence of LRC structural disorder has a strong influence: (i) the energy of small loop formation decreases as the LRC strength increases; (ii) in the presence of LRCs, the thermal mobility of the loop is superdiffusive on short time scales. These results demonstrate that LRCs have the capacity to favour nucleosome formation and positioning in vivo. Furthermore, we compare the model predictions to genome-scale nucleosome positioning data recently obtained by Yuan et al. for S. cerevisiae chromosome III (Science 309, 2005). The statistical analysis of the experimental profile of nucleosome occupancy displays striking similarities to the energy landscape of nucleosome formation computed from the sequence. These results constitute initial experimental evidence of the influence of LRCs on nucleosomal organisation.
In the second part, we discuss the consequences of the peculiar gene organisation observed around mammalian replication origins (ORIs), which suggests that the chromatin fibre is likely to present condensation defects locally around ORIs. In a crowded environment of macromolecules (proteins and nucleic acids) such as the eukaryotic nucleus, the presence of inhomogeneities in the nucleosomal array (possibly induced by the sequence) could predispose the chromatin fibre to spontaneously form rosette-like structures: multi-leaved rosettes might self-organize through the entropy-driven assembly of neighbouring defects into clusters by depletion forces. Attractively, this model provides a universal physical mechanism accounting for the first step of rosette organization, prior to any specific interaction with the nucleoprotein complexes involved in replication and transcription.
Detection of miRNA target genes through statistical analysis of DNA motifs in human-mouse 3'-UTR regions.
MicroRNAs (miRNAs) are a family of ~22 nt endogenous small RNAs that negatively regulate gene expression at the post-transcriptional level in a wide range of organisms, including humans, regulating several genes in each organism. They probably act either by decreasing the stability of the target mRNA or by translational inhibition. At present, ~300 miRNAs are reported in the human genome, and more than 10% of human genes are thought to be under miRNA control. In this talk, I will present a new, simple computational framework for the identification of genes whose 3'-UTR regions are candidate miRNA targets. The method is based only on the statistical distribution of DNA motifs, does not require any a priori assumption and does not rely on sequence-alignment procedures.
Informational disassembling of biological machines: What proteins are made from?
What are proteins made from? Everybody knows that they consist of amino acids. But what are machines made from? Are they made from atoms or from parts? They consist of parts, of course. To consider proteins as the main engineering material of living cells, we need to understand what the parts of these machines are. We propose an informational approach to disassembling machines. The general idea of this approach is that if we guess the proper picture of how the parts are combined, it should look maximally non-random. First, we apply this idea to find the most elementary functional parts. Those are classes of amino acids. To obtain the minimal families of amino acids necessary for the construction of a living cell, we first need to find optimal amino acid classifications. From an informational point of view, such a classification transforms proteins into maximally non-random sequences. The optimal classification depends on the protein family. We present optimal classifications for various families and compare them to known physical classifications. For example, for membrane proteins the optimal classification is close to the hydrophobic/polar classification, while for globular proteins it is quite far from that and differs significantly from all well-known classifications. Examples from natural language analysis are presented too. Joint work with Mikhail Kudryashev and Tatiana Popova, Institute for Computational Modeling, RAS, Krasnoyarsk, Russia.
Conserved non-coding sequences in Drosophila: no junk... but what else?
After decades of being considered "junk DNA", non-coding sequences in genomes are attracting increasing interest, as it is now widely accepted that they contain functional regions: non-coding RNAs, control regions, etc. We will present work devoted to the precise identification of non-coding sequences conserved between several Drosophila species, in order to investigate the relationship between conservation, function and evolution, using enhancer regions as an example. Do these conserved regions necessarily correspond to functional sites? What about functional sites that have diverged between Drosophila species?
Analysis of protein-protein interaction networks: towards a functional classification of the proteome.
Theoretical approaches for the study of dinucleotide content in genomes: some asymptotic results.
Dominant folding pathways in protein folding.
A mathematical model for copper homeostasis in Enterococcus hirae.
On the basis of the extensive experimental work of M. Solioz and co-workers on the biomolecular basis of copper homeostasis in the bacterium E. hirae, we built a dynamical model accounting for the adaptation of the cell to changes in the external environment. Copper is an essential nutrient, but toxic in excess. Its uptake into and release from the cell, as well as its transport inside the cell, are tightly regulated at the genetic level through the cop operon. The operon encodes two P-type membrane ATPases, a chaperone and a negative transcriptional self-regulator. This regulator is inhibited by copper ions. Although some simplifications have been made in building our model, some biologically relevant questions (with biologically relevant answers!) can be addressed, for example: existence and uniqueness of an equilibrium state, critical parameters for homeostasis, stability of homeostasis, etc. Together with numerical simulations, we use the qualitative theory of differential equations to derive our results.
Can we predict DNA biological activity from the study of its local fluctuations?
DNA dynamics is essential for its biological function. The genetic code could not be read without a local unwinding of the double helix, and large openings, the so-called "DNA bubbles", are supposed to allow the formation of some specific DNA structures, such as the T-loop that stabilizes the end of the chromosomes.
Mesoscopic DNA models give a fairly accurate description of the thermal denaturation of DNA, i.e. the separation of the two strands by heating, and they predict the existence of localized fluctuations which are reminiscent of the "breathing" of the double helix observed by biologists.
Thus it is tempting to try to use these models to predict the biological activity of DNA. It has been speculated that the formation of bubbles of several base pairs, due to thermal fluctuations, is an indicator of biologically active sites. Comparison between molecular dynamics simulations of the PBD DNA model and experiments suggests that this could be the case, but the observation is difficult because large bubbles appear only rarely, so that the statistical significance of the results can be questioned. We introduce a new method to analyze these bubbles that is orders of magnitude faster than molecular dynamics, and show that the PBD model is not yet able to detect biologically active sites [1].
This does not imply that DNA fluctuations are not signs of the biological meaning of some sections of the genetic code, but could mean that the model is not yet able to properly relate the local opening and the base-pair sequence. In order to improve it, a comparison with experiments measuring the local fluctuations of DNA as a function of its sequence is necessary. We discuss such experiments and introduce some improvements of the model to bring it closer to the goal of predicting biological activity of DNA from physical studies of a highly simplified model.
[1] Titus S. van Erp, Santiago Cuesta-Lopez, Johannes-Geert Hagmann, Michel Peyrard, Can one predict DNA Transcription Start Sites by studying bubbles?, Phys. Rev. Lett. 95, 218104 (2005).
Analysis of biological sequences using Markov Chains and Hidden Markov Models.
A mathematical model of the genetic code: structure and applications.
Replication-associated strand asymmetries in mammalian genomes: in silico detection of replication origins.
During the course of evolution, mutations do not affect both strands of genomic DNA equally. This mainly results from asymmetric DNA mutation and repair processes associated with replication and transcription. In prokaryotes, a prevalence of G over C and of T over A is frequently observed in the leading strand. The sign of the resulting TA and GC skews changes abruptly when crossing replication origin and termination sites, producing characteristic step-like transitions. In mammals, transcription-coupled skews have been detected, but no bias had been associated with replication. In the first part, we present the analysis of intergenic and transcribed regions flanking experimentally identified human replication origins, demonstrating the existence of compositional strand asymmetries associated with replication. Wavelet-based multi-scale analysis of human genome skew profiles reveals numerous transitions, allowing us to identify a set of one thousand putative replication initiation zones. Around these putative origins, the skew profiles display a remarkable pattern that is also observed in other mammalian genomes. Based on these results, we propose a model of the mammalian replicon in which termination sites are randomly distributed between adjacent origins. In the second part, we examine the organisation of human genes around the replication origins. We show that replication origins, gene orientation and gene expression are not randomly distributed but, on the contrary, are at the heart of a strong organisation of human chromosomes, which will be discussed from the perspective of chromatin structure and dynamics.
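As a rough illustration of the TA and GC skews discussed in this abstract, the sketch below computes a windowed skew profile and reports where its sign flips. It assumes the commonly used definitions S_TA = (T - A)/(T + A) and S_GC = (G - C)/(G + C), which the abstract does not spell out. The function names, the window size and the toy sequence are hypothetical, and the wavelet-based detection used by the authors is not reproduced here.

```python
from collections import Counter


def skews(window: str) -> tuple[float, float]:
    """Return (TA skew, GC skew) for one window of sequence."""
    counts = Counter(window.upper())
    a, c, g, t = (counts[b] for b in "ACGT")
    # Guard against windows that contain no A/T or no G/C.
    s_ta = (t - a) / (t + a) if (t + a) else 0.0
    s_gc = (g - c) / (g + c) if (g + c) else 0.0
    return s_ta, s_gc


def skew_profile(seq: str, window: int) -> list[float]:
    """Total skew S_TA + S_GC in consecutive non-overlapping windows."""
    return [
        sum(skews(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, window)
    ]


def sign_changes(profile: list[float]) -> list[int]:
    """Indices i where the skew sign flips between window i-1 and window i."""
    return [
        i for i in range(1, len(profile))
        if profile[i - 1] * profile[i] < 0
    ]


# Toy sequence (illustration only): a C/A-rich half, then a G/T-rich half.
toy = "CCAACCAACCAA" + "GGTTGGTTGGTT"
profile = skew_profile(toy, window=4)
print(profile)                # [-2.0, -2.0, -2.0, 2.0, 2.0, 2.0]
print(sign_changes(profile))  # [3]
```

On real genomic data the profile is noisy, so a plain sign test like this would need smoothing or a multi-scale method before any step-like transition could be called.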
The packaging structure of the viral genome and its role in virus assembly.
Codons, genes and networks: multiple scales in genome sequence organization.
The talk will consist of three parts, in which I will review three recent results on the large-scale analysis of genomic sequence structure. Each part will highlight one specific property of genomic sequences at a certain scale. The first part will describe the notion of the codon bias vector field and its application to the automatic calculation of the Codon Adaptation Index in different organisms. In the second part, I will review our results on the analysis of the universal 7-cluster structure of bacterial genomes. In the last part, I will discuss our recent result on the estimation of the minimal quantity of non-coding RNA needed to support accelerated gene regulation networks in higher eukaryotes.
Molecular mechanics studies of the sequence-dependent bending deformability in DNA and RNA.
Many biological functions of DNA and RNA are mediated by interactions with proteins and other biomolecules which can induce local as well as global deformations (e.g. bending) in the nucleic acid. During this part of a PhD project related to the structural and energetic characterization of axis bending in nucleic acids, we have developed an original algorithm to restrain the curvature of a fragment of DNA or RNA based on a "screw axis description" of the double helix. The potential energy calculated with JUMNA for successive windows of increasing bending amplitude has been correlated with the conventional description of curvature in nucleic acids used in the CURVES analysis program. Energy minimization studies using the Parm98 force field (ref. 6) combined with the generalized Born (GB) model have been performed on the pair of Hagerman sequences and on two papillomavirus E2 protein binding sites, which are typical examples of similar sequences presenting asymmetric behavior in their specific curvature, and on various A-tract-containing oligonucleotides. Results indicate good qualitative agreement with available experimental data and offer atomic-level explanations for the bending deformability of nucleic acids (cf. refs. 8-11). The substantial decrease in computation time when using this approach in JUMNA provides a useful tool for current systematic studies of sequence-specific nucleic acid bending. For instance, a 3D map of potential energy curves as the bending plane is turned around the helical axis through 360° can be generated in one day. A molecular dynamics implementation is in progress.
p-Adic model of the genetic code.
Using some basic properties of p-adic numbers, particularly
5-adic integers, a simple p-adic model of the genetic code was formulated
recently. The information space of codons is introduced by assigning 5-adic
positive integers to them. We found that genetic code degeneracy is related
to the p-adic distance between codons: p-adically close codons correspond
to the same amino acid. This talk is based on the paper "A p-Adic Model of
DNA Sequence and Genetic Code" (available at arXiv: q-bio.GN/0607018) and on
some new results.
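To make the notion of 5-adic closeness concrete, here is a minimal sketch that maps codons to integers and computes their 5-adic distance. The digit assignment (C=1, A=2, U=3, G=4) and the convention that the first nucleotide is the lowest-order digit are assumptions, since the abstract does not state them. The names `DIGIT`, `codon_to_int`, `padic_distance` and `codon_distance` are hypothetical.

```python
from fractions import Fraction

# Assumed digit assignment (not stated in the abstract); T is treated as U.
DIGIT = {"C": 1, "A": 2, "U": 3, "T": 3, "G": 4}


def codon_to_int(codon: str) -> int:
    """Map a codon x0 x1 x2 to the integer x0 + 5*x1 + 25*x2."""
    return sum(DIGIT[base] * 5**i for i, base in enumerate(codon.upper()))


def padic_distance(a: int, b: int, p: int = 5) -> Fraction:
    """p-adic distance |a - b|_p = p**(-v), where p**v exactly divides a - b."""
    diff = abs(a - b)
    if diff == 0:
        return Fraction(0)
    v = 0
    while diff % p == 0:
        diff //= p
        v += 1
    return Fraction(1, p**v)


def codon_distance(c1: str, c2: str) -> Fraction:
    """5-adic distance between two codons under the assumed encoding."""
    return padic_distance(codon_to_int(c1), codon_to_int(c2))


print(codon_distance("GCU", "GCC"))  # 1/25: same first two bases
print(codon_distance("GCU", "GAU"))  # 1/5: only the first base is shared
print(codon_distance("GCU", "CCU"))  # 1: first bases differ
```

Under this encoding, codons that differ only in the third position are at the smallest non-zero distance, 1/25. GCU and GCC, for instance, both code for alanine.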
Games, equilibria and gene regulatory networks.
Game theory can be seen as the theory of equilibria. As it is now a little over fifty years old, it seems timely to re-examine it. We do so by proposing a new, more general view that we call "conversion-preference games" ("jeux à conversion-préférence"), or "CP games" for short. This formalism appears to adapt nicely to gene regulatory networks.
DrosOCB: a database of Drosophila conserved non-coding blocks.
The current availability of multiple sequenced Drosophila
genomes allows us to apply comparative genomics methods to collect
conserved elements systematically. Sequence conservation in non-coding
regions reflects common functional sites, like cis-regulatory modules or
transcribed non-coding sequences. Motivated by this assumption, we have
generated a catalogue of conserved non-coding blocks (CNBs) between D.
melanogaster and D. pseudoobscura, performing pairwise local alignments of
intergenic regions flanking orthologous genes. Due to our "gene-centric"
view, we compare surrounding regions of orthologous gene pairs, without
requiring a syntenic order in the two genomes. CNBs are obtained with the
CHAOS local alignment algorithm [1], first using quite restrictive
parameters and then with more sensitive ones, providing two versions of the
database. This algorithm for rapid identification of local similarities is
able to detect the reshuffled and imperfectly conserved sequences typical
of non-coding DNA. The database will be extended with pairwise alignments
of more Drosophila species when reliable annotations become available.
References:
[1] Brudno M. et al., BMC Bioinformatics 2003.
[2] Corà D. et al., BMC Bioinformatics 2005.
An evolutionary model for Turing machines.
We present computer simulations of an evolutionary model for Turing machines. We start with an initial population of 300 1-state Turing machines and let them evolve for 200,000 generations. At each generation, every Turing machine undergoes three processes: mutation (with probability p1), state-increasing (with probability p2), and selection and reproduction (measuring the relative fitness of the machines with respect to a properly specified task). The states of the final population are divided into two classes: "introns" (states that can be changed without altering the fitness) and "exons" (the other ones). We study how the fitness and the exons/introns ratio vary with the values of the probabilities p1 and p2.
Spatial transcriptomic patterns in the genome of the endosymbiotic bacterium Buchnera aphidicola.
Past genomic studies have widely described the organization of
the chromosome in bacteria, for example in terms of gene localization,
order and orientation. The degree of organization seems to increase with
the genome size, the overall GC composition and the number of
nucleoid-binding proteins. Moreover, recent transcriptomic analyses have
revealed a relationship between gene expression levels and chromosomal
organization in free-living bacteria. In this study, this question was
analysed in the highly reduced genome of Buchnera aphidicola, the primary
endosymbiont of the aphids. Gene expression levels were measured with a
dedicated oligo-array containing the 617 ORFs of Buchnera, and data were
normalized by genomic DNA signals in order to obtain absolute gene
expression values. Using autocorrelation functions and Fourier transform
techniques, we found significant periodic transcriptional patterns in
Buchnera. Moreover, by permuting gene positions along the chromosome, we
showed that these spatial patterns in gene expression probably depend on
the conservation of operon structures in the Buchnera genome, as well as on
operon location and the spacing between operons.
Key words: Gene expression - chromosomal organization - spatial patterns - Buchnera aphidicola.
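The kind of analysis described in this abstract can be sketched as follows: an autocorrelation of expression values ordered by gene position, with significance assessed by shuffling gene order. This is only an illustration under stated assumptions, not the authors' pipeline. The chromosome is treated as circular, the expression values are synthetic, the function names are hypothetical, and the Fourier step is omitted.

```python
import math
import random


def autocorrelation(values: list[float], lag: int) -> float:
    """Circular autocorrelation of expression values at a lag counted in genes."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum(
        (values[i] - mean) * (values[(i + lag) % n] - mean) for i in range(n)
    )
    return cov / var


def permutation_p_value(values, lag, n_perm=1000, seed=0):
    """Fraction of gene-order shuffles whose |autocorrelation| reaches the observed one."""
    rng = random.Random(seed)
    observed = abs(autocorrelation(values, lag))
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(autocorrelation(shuffled, lag)) >= observed:
            hits += 1
    return hits / n_perm


# Synthetic expression values with a period of 10 genes (illustration only).
expr = [math.sin(2 * math.pi * i / 10) for i in range(100)]
print(round(autocorrelation(expr, 10), 3))   # close to 1: in phase
print(round(autocorrelation(expr, 5), 3))    # close to -1: half a period out of phase
print(permutation_p_value(expr, 10, n_perm=200))
```

Shuffling gene positions destroys any spatial structure while keeping the distribution of expression values. A small p-value therefore indicates that the periodicity depends on gene order.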