October, 16th: Working group of the Statistical Learning and Processes axis (Axe Apprentissage Statistique et Processus), MIA-T unit, INRA Toulouse, A short introduction to statistical learning.
Networks are very useful tools to decipher complex regulatory relationships
between genes in an organism. Most work addresses this issue in the context of
i.i.d., treated vs. control or time-series samples. However, many datasets
include expression data obtained for the same cell type of an organism, but in
several conditions. We introduce a novel method for inferring networks from
samples obtained in various but related experimental conditions. This approach
is based on a double penalization: a first penalty aims at controlling the
global sparsity of the solution whilst a second penalty is used to make
condition-specific networks consistent with a consensual network. This
"consensual network" is introduced to represent the dependency structure between
genes, which is shared by all conditions. We show that different "consensus"
penalties can be used, some integrating prior (e.g., bibliographic) knowledge
and others that are adapted along the optimization scheme. In all situations,
the proposed double penalty can be expressed as a LASSO problem and hence
solved using standard approaches for quadratic problems with
L1-regularization. The method is combined with a bootstrap procedure and is
made available in the R package therese. Our proposal is illustrated
on simulated datasets and compared with independent estimations and alternative
methods. It is also applied to a real dataset to emphasize the differences in
regulatory networks before and after a low-calorie diet.
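The abstract states that the double penalty can be rewritten as a LASSO problem. Below is a minimal sketch of one way such a reduction can work, under assumptions that are not taken from the source: a neighborhood-selection formulation (one regression of a target gene on the other genes, within one condition), a squared L2 consensus penalty with weight mu, and a consensus coefficient vector beta_cons held fixed. Stacking sqrt(mu) * I under the design matrix and sqrt(mu) * beta_cons under the response turns the consensus term into extra squared residuals, which leaves an ordinary L1-penalized least-squares problem. All names (consensus_lasso, lasso_cd, lam, mu, beta_cons) are hypothetical; this is not the therese implementation, and the toy data are simulated.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding: the proximal operator of t * |.| (scalar input)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)    # squared norm of each column
    r = y.astype(float).copy()       # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            # correlation of column j with the partial residual (b_j removed)
            rho = X[:, j] @ r + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            delta = b[j] - old
            if delta != 0.0:
                r -= X[:, j] * delta     # keep the residual up to date
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def consensus_lasso(X, y, beta_cons, lam, mu):
    """Minimise 0.5*||y - X b||^2 + 0.5*mu*||b - beta_cons||^2 + lam*||b||_1.

    Hypothetical helper: the consensus term is absorbed into augmented data,
    so the plain LASSO solver above can be reused unchanged.
    """
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(mu) * np.eye(p)])
    y_aug = np.concatenate([y, np.sqrt(mu) * beta_cons])
    return lasso_cd(X_aug, y_aug, lam)


def objective(X, y, b, beta_cons, lam, mu):
    """The doubly penalised criterion evaluated directly (for checking)."""
    return (0.5 * np.sum((y - X @ b) ** 2)
            + 0.5 * mu * np.sum((b - beta_cons) ** 2)
            + lam * np.sum(np.abs(b)))


# Toy neighbourhood regression: one target gene, 5 candidate neighbours.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(20)
beta_cons = np.array([1.0, 0.0, -1.0, 0.0, 0.0])   # hypothetical consensus
b = consensus_lasso(X, y, beta_cons, lam=1.0, mu=5.0)
```

The same augmentation identity is what lets an elastic-net problem be handed to a LASSO solver; here the ridge-type term is simply centred on the consensus vector instead of on zero. How the consensus itself is chosen (prior knowledge or updated during optimization, as the abstract mentions) is left out of this sketch.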
April, 25th: MIA-T unit seminar, INRA, Toulouse, Joint inference of co-expression networks: Consensus LASSO.
The recent development of high-throughput techniques has made available huge
datasets in which thousands of genes are measured simultaneously. However, the
number of observations is, comparatively, very small, and these are often
collected in a variety of experimental conditions. One of the big challenges of
modern systems biology is to understand the influence of controlled
experimental conditions on the functioning of living organisms. This question
is usually addressed by searching for genes whose expression differs between
conditions (hence the term "differentially expressed genes"). But differences
in the way genes interact with each other are also of interest: finding which
regulation pathways are modified by a given experimental condition gives
insight into the influence of the condition on the living system as a whole.
One of the most popular approaches to understanding the complex relationships
between the expression levels of a large set of genes is to infer a
co-expression network from a transcriptomic dataset. In such a model, the
nodes of the network represent the genes and an edge between two nodes models
a strong co-expression between the two genes. A number of different methods
have been developed to infer such networks: correlations (relevance networks,
Butte & Kohane, 2000), Bayesian networks (Pearl, 1998; Pearl & Russell, 2002),
Gaussian graphical models (Edwards, 1995), among others. When the observations
have been collected in different conditions, a naive approach would be to
infer a network for each experimental condition and to compare them. However,
this method cannot specifically highlight the differences and commonalities of
regulation phenomena: since the number of observations is small, inferring the
networks independently, ignoring that a common functioning should exist
whatever the condition, tends to emphasize irrelevant differences. In this
work, we present a novel method for inferring co-expression networks from
samples obtained in different experimental conditions. This approach is based
on a double penalization: a first penalty aims at inferring a sparse solution;
then, a second penalty is used to make the networks obtained in different
conditions consistent with a consensual network. The "consensual network" is
introduced to represent the dependency structure between genes, i.e., the
common functioning of the living organism under study, whatever the condition.
The estimation is made more robust by using a bootstrap approach. Our proposal
is tested and compared with existing alternatives on simulated datasets,
investigating the influence of the sample size and of the number of edges that
differ between conditions. It is also applied to a real-world dataset in which
the transcriptome has been measured for different breeds of a given mammalian
species.
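The abstract mentions that the estimation is made more robust with a bootstrap approach. A minimal sketch of one common way to do this is to count, over bootstrap resamples of the observations, how often each edge is selected, and to keep only frequently selected edges. For simplicity the base estimator here is a correlation-thresholded relevance network (in the spirit of Butte & Kohane, 2000), not the consensus LASSO; the function names, the 0.5 correlation threshold, the 200 resamples and the 80% cut-off are all illustrative assumptions, and the data are simulated.

```python
import numpy as np


def relevance_network(X, threshold):
    """Relevance-network adjacency: an edge joins two genes whose absolute
    Pearson correlation exceeds `threshold`. X is (n_samples, n_genes)."""
    corr = np.corrcoef(X, rowvar=False)
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)   # no self-loops
    return adj


def bootstrap_edge_frequency(X, threshold, n_boot=200, seed=0):
    """Fraction of bootstrap resamples (samples drawn with replacement)
    in which each edge is selected by the base estimator."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample observations
        counts += relevance_network(X[idx], threshold)
    return counts / n_boot


# Toy data: genes 0 and 1 co-expressed, gene 2 independent (simulated).
rng = np.random.default_rng(1)
g0 = rng.standard_normal(30)
g1 = g0 + 0.2 * rng.standard_normal(30)
g2 = rng.standard_normal(30)
X = np.column_stack([g0, g1, g2])
freq = bootstrap_edge_frequency(X, threshold=0.5)
stable = freq >= 0.8   # keep edges found in at least 80% of resamples
```

Any edge-selecting estimator can be plugged in place of relevance_network; the aggregation step only needs a boolean adjacency matrix per resample.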
This talk will introduce the notion of networks and the basic problems
usually associated with them (visualization, identification of important
vertices, module detection). The notions will be illustrated with examples
using the R software on a real network.
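The talk illustrated these notions in R. As a language-neutral sketch of two of them, the Python code below computes vertex degrees (a simple measure of vertex importance) and Newman's modularity, the usual quality score of a partition of a network into modules. The toy graph (two triangles joined by one edge) and the hand-made partition are hypothetical; finding a good partition (community detection) is not shown.

```python
from collections import defaultdict


def degrees(edges):
    """Degree of each vertex in an undirected, unweighted edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def modularity(edges, membership):
    """Newman's modularity Q = sum_c [L_c / m - (d_c / (2m))^2], where L_c is
    the number of edges inside module c, d_c the total degree of its
    vertices and m the total number of edges."""
    m = len(edges)
    inside = defaultdict(int)
    total_deg = defaultdict(int)
    for u, v in edges:
        if membership[u] == membership[v]:
            inside[membership[u]] += 1
    for node, d in degrees(edges).items():
        total_deg[membership[node]] += d
    return sum(inside[c] / m - (total_deg[c] / (2 * m)) ** 2 for c in total_deg)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
deg = degrees(edges)
hubs = sorted(deg, key=deg.get, reverse=True)[:2]   # most connected: [2, 3]
q = modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
# q == 5/14 ≈ 0.357
```

Putting every vertex in a single module gives a modularity of 0, so positive values indicate more within-module edges than expected at random given the degrees.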
In French.
March, 21st: lycée Jules Fil (high school), Carcassonne, as part of La Quinzaine des Maths, Survey sampling methods: the power of randomness.