10/07/2008

Book comment --2

(Sec 1.4):
Does the Bayesian analysis provide confidences or error bars?
-        Yes, I can show one plot in figure 4 with error bars.
 
It is more common for the variable of integration (d\theta) to be at the end of
the integral rather than first.
 
- Would be a lot of work, suggest not take this modification.
(Eqn 1.33):
 
If \gamma_j is the precision, should the inverse {\gamma_j}^{-1}, the variance,
appear in the Normal distribution in the equation.
 
-- Give me more time, I will respond soon.
 
More could be said about the Biology in the final paragraph in Section 1.4.3.
 
top of page 18:
spaces missing between A,B,C
 
(Eqn 1.47):
suggest move dX's after the function?
 
I'm confused by the text on page 17. Is C_{(s)} a column or ROW of C.
If it's a column, then doesn't C have K columns? and already denoted c_j?
C_(s)^T is the the s-th row of matrix and follows a Gaussian distribution.
Therefore, s = 1,…,p
Can \rho_s and \gamma be defined/introduced here, prior to their use in
diag(\rho_s,\gamma).
 
This section uses a lowercase k, whereas earlier section 1.2 began with
uppercase K, it would be good to be consistent.
 
(Sec 1.4.2.3, pg 21):
First line, mis-positioned bar should be above theta.
Eqn 1.75: small k
 
(Sec 1.4.2.4, first bullet):
Infer -> infer 
 
(Sec 1.4.2.5,  third bullet, equation):
What's Z'  (should there be a prime on the Z? (why Z'))
 
Change Z’ to Z (the symbol  in picture as well) 
 
Sec 1.5.1:
In section 1.5, you introduce a synthetic network with 54 players (illustrated in Fig 1.3), and generate expression levels for 10 mRNAs. When it comes to the evaluation and the construction of ROC curves, what is the ground truth to which you are comparing? 
12 edges:
A -> A, A -> B, A -> C,       , C -> D, C -> G, C -> K
D -> C, D -> E,       , F -> B, F -> D, G -> H, K -> J
 
 
Knowledge about the size of
the model that you are trying to recover (whilst acknowledging that there's a complex process actually generating the data) is useful, particularly its
topology. Possible structures on 10 nodes have between 0 and 100 edges.
Stating the network, gives an indication of how much information adding 
1, 2, or 3 prior interactions has.
 
Is it the network with the following 14 edges?
A -> A, A -> B, A -> C, B -> A, C -> D, C -> G, C -> K
D -> C, D -> E, E -> F, F -> B, F -> D, G -> H, K -> J
-No edges E -> F or B ->A
An image of this network would be a good figure to add.
I can draw it if you think necessary
 
 
How many edges are inferred by your methods? -i.e. if you had to choose a "best" point on the ROC curve? 
 
We tried to infer 12 edges, and performance is measured by AUC.
 
There are a number of related questions of interest, for
example, how many edges would you need to infer to get all the edges present in
the ground truth (mRNA 10 node) model correctly inferred?
 
Don’t test yet.

10/06/2008

Book Comment 1

page 26, "0 corresponds to the baseline (no prior information) model",
what is the baseline?
--Baseline is ROC-A, with u=0 (no prior knowledge), described in step 1.
Is it always 0.68 without any variation? More information on the baseline(s) must
be given.
And why don't you plot the AUCs without baseline adjustment?
--Not understand Where 0.68 comes from. I can show mean value and std_error in
one plot, rather than the 16-in-1 plot. If you want, I will send one to you.
page 26, "of of mu" <- "of mu"
page 27. what does "c" mean on the x-axis of Figure 1.4

c corresponds to the manually corrected arc, already in the text

 
Does the hidden state space dimensionality have any effect on the network
reconstruction accuracy? If it hasn't any influence,
then there seems to be no need for hidden states.
Why are 16 plots shown which all show exactly the same trends? The authors do not mention this finding in the text. They just mention that the "AUC performance can be seen to be linear with the number of prior arcs included" which is much less obvious from Figure 1.4.
It appears K has no influence for AUC. Should we show only one
plot with specific k and mu value, also variation will be plotted.
 
page 28, Figure 1.5, why are there some bars missing in the histograms in Figure 1.5?
It means the replicates needed to achieve
the specific AUC is larger than 16, beyond the test setting.
 
Was there a threshold imposed on the maximal replicates (e.g. 16)?
Yes, we explored replicates with range 1,2,4,8,16
 
Wouldn't it be more intuitive to set those bars to this maximum
instead of leaving them out,
as higher bars indicate worse performances in Figure 1.5?
Bars would be more crowded 
 
There is no interpretation of the results. E.g. it is not mentioned
whether "mu" and "k" have an
effect on the inference results. 

mu and k have no obvious effect on results, we focus on replicates numbers.

6/13/2008

Q&A in E_Coli code ---Continue

Ok, but in the script runS_post_as_prior_freq.m you get the posterior means from the previous network and set the hyprparameter precisions to 1 (why do you do this?)

PriMu.alpha = 1*ones(k,1); % precision vector, each element for a COL of A
PriMu.beta = 1*ones(pinp,1); % precision vector, each element for a COL of B
PriMu.gamma = 1*ones(k,1); % pseudo-precision vector, each element for a COL of C
PriMu.delta = 1*ones(pinp,1); ; % pseudo-precision vector, each element for a COL of D

==========================================

That's the default initial values for hyperpara, starting point for vbssm model. Tthis could be found at line 86 of vssminpn.m ( vbssm3.2 )


But then you retrain the model using the original data and these priors:

net = vssminpn(yn,inpn,k,its,0,0,PriMu,[10 20 30]);

But you didn't try giving the previous network as an input ? This should be possible, according to Matt README file:

i.e. net = vssminpn(yn,inpn,k,its,0,0,PriMu,[10 20 30],net);

===========================================

Yes, it's possible. I cannot remember why not use the previous network as input. Is it due to we have incorporated the previous posterior as prior?

In this case I wonder if we still need to specify the priors which do not change in PriMu{} as you have done in this script? Will the prior added in PriMu{} overwrite the existing posterior mean in net.exp.D ? (which is what we would want)

========================================

PriMu{} is changed by

muD = net.exp.D;
tmp = muD(:,2:end)';

for j = 1:length(from_idx)

tmp(from_idx(j),to_idx(j)) = double(conn_attr(j))*tmp(from_idx(j),to_idx(j))*mu;
end

muD(:,2:end) = tmp';
PriMu.muD = muD;


trigamma problem

It seems to work ok if I have a zero prior, so the problem with trigamma.m comes from introducing the non-zero prior. I have no idea how to fix this, so any help you can give would be much appreciated.

==================================================

I recalled I also had such problem for some seeds.


Instead of training model with seeds= (1, 10), I trained models with seeds = (1,30) and then collected 10 successful ones.


6/12/2008

Q&A in E_Coli code

1. why you transponse muD twice?
==============================
muD = net.exp.D; % muD's first column is biased column, row = 'to', col = 'from'
tmp = muD(:,2:end)'; % tmp = transpose of muD, such that in tmp row = 'from', col = 'to'

for j = 1:length(from_idx)

tmp(from_idx(j),to_idx(j)) = double(conn_attr(j))*tmp(from_idx(j),to_idx(j))*mu; % keep it row = 'from', col = 'to'
end

muD(:,2:end) = tmp'; % go back to muD's original format, i.e. row = 'to', col = 'from'
==============================

2. And are the row, column indices in from_idx, to_idx counted with the bias column included?
i.e in the E_coli example from_idx = 27. Does this mean gene 26 (with + 1 from bias column)?

from_idx, to_idx , conn_attr , all derived from the shift network prior:
% hns pp glpC 1
% hns pp glpQ 1
% hns pp cyoD 0
% hns pp cyoE 0
% hns pp cyoB 0
% hns pp cyoA 0
% hns pp sdhB -1
% hns pp arcA 0
% hns pp appY -1
% hns pp cadA -1
% hns pp cadB -1
% hns pp hdeB -1

hns is the 27th gene in common_genereg_names.mat, which is already in the E_Coli.zip, subdirectory 'step1-make_yn_inpn\matdata'.

common_genereg_names.mat is extracted from 'top50&reg.sif' and vsn-normalization.xls. Please refer this post (after Download) for detail.
.

Values in conn_attr.mat are 1 (positive), -1(negative) or 0 (none).


12/20/2007

AUC vesus Reps reps= [1 2 4 8 16]

auc_reps_mu_1_kk_1.pdf
auc_reps_all_in_1.pdf

Download Figures:
auc_reps_all_in_1.pdf and each separate figures in directory.

Note: Bars are grouped to show AT LEAST how many reps for x-arc is needed to reach some range of AUC value.
For example:
In the first subfigure (mu=1, kk =1), to reach range [0.625, 0.65], 0-arc at least reps =16 is needed, others only need reps =1.

Download AUC_data: arc0_mat; arc1_mat;arc2_mat;arc3_mat
In 1-arc test, auc value is stored by arc1_mat(mu_ind,kk_ind,reps_ind) = auc;
e.g. if mu = 2, kk =3, reps =8, the corresponding index should be arc1(2,3,4).

Setting:
murange = [1 2 4];
kkrange = [1:3];
reps = [1 2 4 8 16];

12/13/2007

AUC vesus Reps

Download:
http://www.cse.buffalo.edu/~juanli/auc_vs_reps/

p.s. mu_p5.pdf contains the legend, others don't have a legend.

Setting:
reps = [1 4 8];
murange = [0.1 0.5 1 2 4 8 12 16];
kkrange = [1:16];
seeds=[1:10];

AUC vesus Reps

Download:
http://www.cse.buffalo.edu/~juanli/auc_vs_reps/

p.s. mu_p5.pdf contains the legend, others don't have a legend.

Setting:
reps = [1 4 8];
murange = [0.1 0.5 1 2 4 8 12 16];
kkrange = [1:16];
seeds=[1:10];

8/30/2007

vsn & E_Coli calculation scripts

Download:
vsn script
E_Coli script
vbssm_v3.3.7

Both scripts are OK on my desktop, but I have no time to modify corresponding paths for your running, sorry.

readme.txt will be helpful, take a look.
-----------------------------------------------------------------------------------
actually what we should do is to run vsn_normalization data with the same priors - please could post your scripts for this ecoli calculation and the vsn_normalization calculation also. I 'll get my new student to look at them while you are away. Maybe we can talk when you return if we have any questions before you start work?

8/21/2007

E_Coli expr

Data is sightly changed on gene 'hns'!

The profile of gene 'hns' is modified to have constant of 0, -20, -100, respectively, after normalization.
---------------------------------------------------------------------------
This gene should have zero expression, which may mean that it should be constant and very low (negative) after normalization, rather than zero. -David

1. F vs kk
Hyper-opt is on_______ Hyper-opt is off
2. Add Posterior(vsn-normalization) as Prior(E_Coli.xls --'no-inter' sheet)

Part of Frequency Table for Shift Subset

Download: FreqTable(PDF): hns = 0, -20, -100
3-in-1 xls file: freq0-20-100.xls

8/16/2007

Prior script

http://www.cse.buffalo.edu/~juanli/Prior_Scripts.rar

norm script

Download:
1. norm_genes.m
2. example_genes.xls

norm_genes contains 3 parts:
1. read xls file
2. normlize data
3. generate input data for vbssm model.

Note:
1. 'example_genes.xls' is generated by extracting first 10 genes from 'E.coli.values.xls'--'no-inter' sheet.
2. There are 2 replicates timeseries , each of 8 timepoints
3. Data is already log-transformed

hns profiles after normalization


--From David------------------------
please could you post a figure of the profiles of gene hns after your normalization? this gene should have zero expression, which may mean that it should be constant and very low (negative) after normalization, rather than zero. we should check

8/03/2007

inter

Part of Frequency Table for Shift Subset
Notes:
1. Shift Subset is defined from Page 16 of Manchester.pdf

2. In muD (prior mean matrix of D), only 12 entries'signs are adjusted, based on the info in memo1. Only these 12 entries are multiplied with mu value (e.g. *0.5), when mu varies. Other entries inherit their sign and value from previous experiment.

3.
10 posterior MEAN matrices for A,B,C, D from the previous experiment (vsn_normalization), with some entries adjusted for the new priors. However, vbssm is unable to run when posterior COVARIANCE is incorporated, since 'trigamma' function will report severe problem to stop computation.
----------------------------------------------------------------------------------
Memo1:Priors
hns-> glpC, glpQ +ve (these appeared in the model network and were confirmed by the experiment)
hns-> cyo D,E,B,A no connection (not confirmed by experiment)
hns -> sdhB -ve (confirmed)
hns-> arcA no connection
hns-> appY -ve (connected confirmed but sign different)
hns-> cad A,B -ve (opposite sign)
hns -> hdeB -ve (opposite sign)
-------------------------------------------------------------------------------
Memo2. Posterior
Expt A, instead of starting with 10 random seeds, you need to start from the 10 posterior matrices for A,B,C, D from the previous experiment (vsn_normalization), with the means and variances adjusted for the new priors, i.e. the posteriors from the previous experiment become the priors for the new experiment.
-------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Note:
Q: How to present 'no connection' in prior matrix?
A: This is be a prior constrained around zero - i.e mean zero but with very tight distribution (low variance)

8/01/2007

Set Prior Covariance Matrix for A,B,C,D

vbnet examples: mu = 0 (no prior), mu = 0.1 (with prior)

Download: ARD derivation notes

Data: Zak's data

reps = 4;

kk = 6;

arc = 9; (arc 9 is added as prior arc)

---------------------------------------------------------------------------------
Hi Juan

I guess so, how did you specify priors for the ARD prior experiments?
You need to set the mean and variance I guess, which would just be the diagonal of the full covariance matrix.

--
In ARD expr,

where delta = 1*ones(pinp,1) % by default initialization in Matt's code.

So, I only need to set the mean, let variance be default -Juan


I didn't realize the code output the full covaraiances - maybe we can look at the posterior covariances to understand correlation between the parameters as I suggested earlier - can you put some samples from the ARD prior experiments (Zak's data) on the web site?

- See top


Let's try to talk at 8am Friday if that works for you. If not then Thursday 8am would also work for me


- OK, Friday 8am.

David
----------------------------------------------------------------------------------
I happen to realize that, vbssm specifies the Prior Covariance matrix of D as diagonal, not full matrix.
(see the attached derivation Page 2, Equation 7)

But, I have checked that the Posterior Covariance matrix of D obtained from the previous experiment (vsn_normalization) is actually a full matrix. How should we treat it? Is it OK to let the non-diagonal entries be zero, in order to fit the vbssm model?

-Juan

F vs kk for "E_Coli_ no_inter_sheet"

Hyper-opt is on_______ Hyper-opt is off

Download: PDF
Data: E.coli.values.xls, no-inter sheet (normalized by Juan afterwards)
Hyper-optimization = on ( same as vsn-normalization.xls)
its = 2000.

Compared with Figure 5 of Zak's data (below), F_kk figure above seems make sense.
Hyper-opt is on_______ Hyper-opt is off

7/27/2007

vsn_normalization, including 'pnp' profiles


It's good that optimum k stays at 6, after pnp profiles added

including 'pnp' profiles

original

From David,
Experiment B) vsn_normalization

repeat the experiements run earlier (control and shift) but include the expression prifiles for pnp

you should probably rerun F vs K first but I wouldn't expect K to change

7/19/2007

Accumulative vs Non-Accumulative for Zak's Data

Non-Accumutive: reps = 8, kk = 4
Accumutive: reps = 8, kk = 4
Download:
Accumulative figures

Non-Accumulative figures

zak_accu_nonaccu.zip

Data: 'Zak's'.
reps = [1 4 8]; stdrange = [1.66 2.33 3]
kkrange = [1:16]; // According to Matt, we didn't explore optimal kk value for Zak's data
murange = [0 .1 .5 1 2 4 8 12 16 32 64 100];
seedrange = [1:10];

Results are the average of 10 models (seeds).

7/10/2007

Prior Test for "Shift Subset"

Part of Frequency Table for Shift Subset


Download: PriorTest_Shift_subset.pdf

Notes:
1. Shift Subst is defined from Page 16 of Manchester.pdf
2. Priors are defined from Page 15 of Manchester.pdf

4 verified interactions are incorporated as Priors for vbssm model:
hns pd appY
hns pd cadA
hns pd cadB
hns pd hdeB

The other 2 verified interactions do NOT belong to Shift Subset
arcA pd hybB
gutM pd srlR