BioSSM

8/30/2007

vsn & E_Coli calculation scripts

Download:
vsn script
E_Coli script
vbssm_v3.3.7

Both scripts run fine on my desktop, but I have not had time to adapt the paths for your setup, sorry.

readme.txt will be helpful, take a look.
-----------------------------------------------------------------------------------
Actually, what we should do is run the vsn_normalization data with the same priors. Could you please post your scripts for this E. coli calculation and for the vsn_normalization calculation as well? I'll get my new student to look at them while you are away. Maybe we can talk when you return, before you start work, if we have any questions?

8/21/2007

E_Coli expr

Data is slightly changed for gene 'hns'!

After normalization, the profile of gene 'hns' is replaced by a constant value. Three variants were tried: 0, -20, and -100.
---------------------------------------------------------------------------
This gene should have zero expression, which may mean that it should be constant and very low (negative) after normalization, rather than zero. -David

1. F vs kk

Figures: hyper-opt on, hyper-opt off

2. Add Posterior(vsn-normalization) as Prior(E_Coli.xls --'no-inter' sheet)

Part of Frequency Table for Shift Subset

Download: FreqTable(PDF): hns = 0, -20, -100
3-in-1 xls file: freq0-20-100.xls

8/16/2007

Prior script

http://www.cse.buffalo.edu/~juanli/Prior_Scripts.rar

norm script

Download:
1. norm_genes.m
2. example_genes.xls

norm_genes contains 3 parts:
1. read xls file
2. normalize data
3. generate input data for vbssm model.

Note:
1. 'example_genes.xls' is generated by extracting first 10 genes from 'E.coli.values.xls'--'no-inter' sheet.
2. There are 2 replicate time series, each with 8 time points
3. Data is already log-transformed
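
The three parts of norm_genes can be sketched roughly as below. This is a minimal Python sketch, not the contents of norm_genes.m. The per-gene zero-mean, unit-variance scaling, the row layout (replicate 1 followed by replicate 2) and all helper names are assumptions, and the xls read is replaced by an in-memory table.

```python
from statistics import mean, pstdev

N_REPS = 2    # replicate time series per gene (from the note above)
N_TIMES = 8   # time points per replicate (from the note above)


def normalize_gene(values):
    """Assumed normalization: shift to zero mean and scale to unit variance."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # A constant profile has no variance to scale; it becomes all zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def split_replicates(values, n_reps=N_REPS, n_times=N_TIMES):
    """Cut one gene's row of n_reps * n_times values into replicate series."""
    if len(values) != n_reps * n_times:
        raise ValueError("expected %d values, got %d" % (n_reps * n_times, len(values)))
    return [values[r * n_times:(r + 1) * n_times] for r in range(n_reps)]


def build_input(table):
    """table: {gene: [log-expression values]} -> {gene: [replicate series]}."""
    return {gene: split_replicates(normalize_gene(vals)) for gene, vals in table.items()}


# Hypothetical stand-in for rows read from example_genes.xls.
example = {"geneA": [float(i) for i in range(16)], "geneB": [5.0] * 16}
model_input = build_input(example)
```

Under this assumed scaling a constant profile comes out as all zeros, so a gene with no expression cannot be told apart from one that is merely flat.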

hns profiles after normalization

--From David------------------------
Could you please post a figure of the profiles of gene hns after your normalization? This gene should have zero expression, which may mean that it should be constant and very low (negative) after normalization, rather than zero. We should check.

8/03/2007

inter

Part of Frequency Table for Shift Subset

Download: freq_EColi_post_as_prior.pdf

Notes:
1. Shift Subset is defined from Page 16 of Manchester.pdf

2. In muD (the prior mean matrix of D), only the signs of 12 entries are adjusted, based on the information in Memo1. Only these 12 entries are multiplied by the mu value (e.g. *0.5) when mu varies. The other entries inherit their sign and value from the previous experiment.

3. The 10 posterior MEAN matrices for A, B, C, D from the previous experiment (vsn_normalization) are used, with some entries adjusted for the new priors. However, vbssm is unable to run when the posterior COVARIANCE is incorporated: the 'trigamma' function reports a severe error and stops the computation.
----------------------------------------------------------------------------------
Memo1:Priors 
hns-> glpC, glpQ +ve (these appeared in the model network and were confirmed by the experiment)
hns-> cyo D,E,B,A no connection (not confirmed by experiment)
hns -> sdhB -ve (confirmed)
hns-> arcA no connection
hns-> appY -ve (connected confirmed but sign different)
hns-> cad A,B -ve (opposite sign)
hns -> hdeB -ve (opposite sign)
-------------------------------------------------------------------------------
Memo2. Posterior
Expt A, instead of starting with 10 random seeds, you need to start from the 10 posterior matrices for A,B,C, D from the previous experiment (vsn_normalization), with the means and variances adjusted for the new priors, i.e. the posteriors from the previous experiment become the priors for the new experiment.
-------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Note:
Q: How to present 'no connection' in prior matrix?
A: This would be a prior constrained around zero, i.e. mean zero but with a very tight distribution (low variance)
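
The Memo1 signs and the 'no connection' answer above can be written down as a small sketch. The two variance values and the helper names are hypothetical, chosen only to illustrate "tight" versus "default". The sketch does not show how vbssm stores its priors.

```python
TIGHT_VAR = 1e-4   # hypothetical "very tight" variance for 'no connection'
LOOSE_VAR = 1.0    # hypothetical default variance for signed priors

# Signs for hns -> target, copied from Memo1: +1, -1, or 0 for 'no connection'.
MEMO1_SIGNS = {
    "glpC": +1, "glpQ": +1,
    "cyoD": 0, "cyoE": 0, "cyoB": 0, "cyoA": 0,
    "sdhB": -1,
    "arcA": 0,
    "appY": -1,
    "cadA": -1, "cadB": -1,
    "hdeB": -1,
}


def prior_for(sign, mu):
    """Return (mean, variance) for one prior entry given its sign and mu."""
    if sign == 0:
        # 'No connection': mean zero with a tight distribution.
        return 0.0, TIGHT_VAR
    return sign * mu, LOOSE_VAR


def build_prior(signs, mu):
    """Map each target gene to its (mean, variance) prior for a given mu."""
    return {target: prior_for(s, mu) for target, s in signs.items()}


prior = build_prior(MEMO1_SIGNS, mu=0.5)
```

The table has 12 entries, matching the 12 adjusted entries of muD mentioned in the notes, and only these means scale when mu varies.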

8/01/2007

Set Prior Covariance Matrix for A,B,C,D

vbnet examples: mu = 0 (no prior), mu = 0.1 (with prior)

Download: ARD derivation notes

Data: Zak's data
reps = 4;
kk = 6;
arc = 9; (arc 9 is added as prior arc)
---------------------------------------------------------------------------------
Hi Juan

I guess so, how did you specify priors for the ARD prior experiments?
You need to set the mean and variance I guess, which would just be the diagonal of the full covariance matrix.
-- In ARD expr,

where delta = 1*ones(pinp,1) % by default initialization in Matt's code.

So I only need to set the mean and leave the variance at its default. -Juan

I didn't realize the code outputs the full covariances. Maybe we can look at the posterior covariances to understand the correlation between the parameters, as I suggested earlier. Can you put some samples from the ARD prior experiments (Zak's data) on the web site?
- See top

Let's try to talk at 8am Friday if that works for you. If not then Thursday 8am would also work for me

- OK, Friday 8am.

David
----------------------------------------------------------------------------------
I just realized that vbssm specifies the prior covariance matrix of D as diagonal, not as a full matrix.
(see the attached derivation Page 2, Equation 7)

However, I have checked that the posterior covariance matrix of D obtained from the previous experiment (vsn_normalization) is actually a full matrix. How should we treat it? Is it OK to set the off-diagonal entries to zero in order to fit the vbssm model?

-Juan
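
The approximation asked about above (keep the diagonal, zero everything else) is sketched below, together with a check of how much correlation it throws away. Whether the approximation is acceptable is still the open question. The helper names and the 2x2 matrix are made up for illustration.

```python
def diagonal_only(cov):
    """Return a copy of a square matrix with all off-diagonal entries set to zero."""
    n = len(cov)
    if any(len(row) != n for row in cov):
        raise ValueError("covariance matrix must be square")
    return [[cov[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]


def max_abs_correlation(cov):
    """Largest |correlation| that the diagonal approximation discards."""
    n = len(cov)
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            corr = cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5
            worst = max(worst, abs(corr))
    return worst


full = [[1.0, 0.3], [0.3, 4.0]]   # made-up posterior covariance
diag = diagonal_only(full)
```

If the largest discarded correlation is small, little information is lost by the diagonal form. If it is large, the prior will ignore real dependence between parameters.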

F vs kk for "E_Coli_ no_inter_sheet"

Figures: hyper-opt on, hyper-opt off

Download: PDF
Data: E.coli.values.xls, no-inter sheet (normalized by Juan afterwards)
Hyper-optimization = on ( same as vsn-normalization.xls)
its = 2000.

Compared with Figure 5 for Zak's data (below), the F vs kk figure above seems to make sense.

Figures: hyper-opt on, hyper-opt off

7/27/2007

vsn_normalization, including 'pnp' profiles

It's good that the optimum k stays at 6 after the pnp profiles are added

including 'pnp' profiles

original


From David,
Experiment B) vsn_normalization

repeat the experiments run earlier (control and shift) but include the expression profiles for pnp

you should probably rerun F vs K first but I wouldn't expect K to change

7/19/2007

Accumulative vs Non-Accumulative for Zak's Data

Non-Accumulative: reps = 8, kk = 4

Accumulative: reps = 8, kk = 4

Download:
Accumulative figures

Non-Accumulative figures

zak_accu_nonaccu.zip

Data: 'Zak's'.
reps = [1 4 8]; stdrange = [1.66 2.33 3]
kkrange = [1:16]; // According to Matt, we didn't explore optimal kk value for Zak's data
murange = [0 .1 .5 1 2 4 8 12 16 32 64 100];
seedrange = [1:10];

Results are the average of 10 models (seeds).
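
The sweep behind "average of 10 models" can be sketched as a loop over the parameter grid with a mean over seeds. The `run_model` function is a hypothetical stand-in for one vbssm training run, and stdrange is left out to keep the sketch short.

```python
from itertools import product
from statistics import mean

REPS = [1, 4, 8]
KK_RANGE = list(range(1, 17))
MU_RANGE = [0, .1, .5, 1, 2, 4, 8, 12, 16, 32, 64, 100]
SEED_RANGE = list(range(1, 11))


def run_model(reps, kk, mu, seed):
    """Hypothetical stand-in for one vbssm training run that returns a score."""
    return float(seed)  # placeholder value so the sketch runs end to end


def average_over_seeds(score_fn=run_model):
    """Return {(reps, kk, mu): mean score over all seeds}."""
    results = {}
    for reps, kk, mu in product(REPS, KK_RANGE, MU_RANGE):
        scores = [score_fn(reps, kk, mu, seed) for seed in SEED_RANGE]
        results[(reps, kk, mu)] = mean(scores)
    return results


averaged = average_over_seeds()
```

With these ranges the grid has 3 x 16 x 12 = 576 settings, each trained 10 times.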

7/10/2007

Prior Test for "Shift Subset"

Part of Frequency Table for Shift Subset

Download: PriorTest_Shift_subset.pdf

Notes:
1. Shift Subset is defined from Page 16 of Manchester.pdf
2. Priors are defined from Page 15 of Manchester.pdf

4 verified interactions are incorporated as Priors for vbssm model:

hns pd appY
hns pd cadA
hns pd cadB
hns pd hdeB

The other 2 verified interactions do NOT belong to Shift Subset

arcA pd hybB
gutM pd srlR

6/21/2007

Aucroc T = 6 figure

Download: roc_reps4_somek_ho0_allT.pdf

Download: meanaucreps2-4-8-16_ho0_allT.pdf
reps = [2 4 8 16], T = [6 12 120]

Download: aucroc_reps4_k1-16_ho0_allT.pdf
reps = 4, T = [6 12 120]

6/04/2007

Zak data, scripts and our simulated data sets

Download: Zak_Data.zip

Note: Folder "Matt_profiles" contains the script Matt wrote to generate the plot of the noisy versions of the mRNAs.
i.e. 'MA','MB','MC','MD','ME','MF','MG','MH','MJ','MK'

6/02/2007

Accumulative vs Non-Accumulative Priors

Accumulative: for n-th test, block[1:n] are added as prior. Z = 3

Non-Accumulative: for the n-th test, only block[n] is added as prior. Z = 3
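
The difference between the two schemes comes down to which blocks are selected for the n-th test. A minimal sketch, with hypothetical function names and a made-up third block:

```python
def prior_blocks(blocks, n, accumulative):
    """Blocks used as prior in the n-th test (n is 1-based, as in the text)."""
    if not 1 <= n <= len(blocks):
        raise ValueError("n must be between 1 and %d" % len(blocks))
    if accumulative:
        return blocks[:n]       # block[1:n]
    return [blocks[n - 1]]      # only block[n]


def prior_arcs(blocks, n, accumulative):
    """Flatten the selected blocks into one list of (from, to) arcs."""
    return [arc for block in prior_blocks(blocks, n, accumulative) for arc in block]


blocks = [
    [("tdcA", "tdcB"), ("tdcA", "tdcC")],
    [("tdcR", "tdcA")],
    [("geneX", "geneY")],   # hypothetical third block
]
accu = prior_arcs(blocks, 2, accumulative=True)
non_accu = prior_arcs(blocks, 2, accumulative=False)
```

For n = 1 both schemes select the same prior. They only differ from the second test onward.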

Download: (Average of 10 seeds)
1. Accumulative results: Z = 1.65, Z = 2.33, Z= 3
2. Non-Accumulative results: Z = 1.65, Z = 2.33, Z = 3
3. Sorted "top50 & reg" as blocks (PDF / XLS); gray indicates no entry in vsn-normalization.xls

Data: Control network of vsn_normalization.xls
Setting:
k = 6 (optimum);
seedrange = [1:10];
murange = [0 .1 .5 1 2 4 8 12 16 32 64 100];
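
For reference, the three Z values correspond to standard normal tail probabilities, sketched below. The `significant` rule (an arc counts if its posterior mean is more than Z standard deviations from zero) is an assumption about how Z is applied, not something stated in these notes.

```python
import math


def upper_tail(z):
    """P(X > z) for a standard normal X."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def significant(post_mean, post_sd, z):
    """Assumed rule: the mean lies more than z standard deviations from zero."""
    return abs(post_mean) > z * post_sd


for z in (1.65, 2.33, 3):
    print("Z = %.2f  one-sided tail = %.4f" % (z, upper_tail(z)))
```

The one-sided tails are roughly 5%, 1% and 0.1%, so raising Z from 1.65 to 3 makes the arc selection much stricter.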

5/20/2007

Subset recovery for Shift -- Frequency Show

Numbers on the edges represent the number of models from 10 different random seeds
Download: pdf / .sif (with frequency on right)

Data: Shift network of vsn_normalization.xls
Setting: 10 seeds, k = 6 (optimum), mu = 0 (no prior incorporated)

Download: Script
Readme:
1. Run script to get the file containing hit log
Script to Run: "runS_subset_prior_freq.m"

Input:
S_yn.mat
S_inpn.mat
S_inpn.mat
prior_inter.mat

Dependencies:
find_arcs.m

Output:
hitlog/hit_subset

2. Analyze the hit log file
Script to Run : "analyze_hitlog_freq.m"

Input:
hitlog/hit_subset

Output:
freq_Prior_subset_Shift_vsn_top50reg.txt

Format:
from pd to mu= 0 .1 .5 1 2 4 8 12 16 32 64 100

cadB pd cadA 5 9 4 4 5 7 4 2 5 7 4 6

The numbers after "cadB pd cadA" give how many of the 10 random-seed models contain this edge, for each mu value.
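
A line in this format can be read back into a per-mu table with a few lines of Python. This is a sketch only: the function name is hypothetical, and the column order is assumed to follow murange exactly as in the header line above.

```python
MU_RANGE = [0, .1, .5, 1, 2, 4, 8, 12, 16, 32, 64, 100]


def parse_freq_line(line, mu_range=MU_RANGE):
    """Split 'from type to c1 c2 ...' into an edge tuple and a {mu: count} dict."""
    parts = line.split()
    if len(parts) != 3 + len(mu_range):
        raise ValueError("expected %d fields, got %d" % (3 + len(mu_range), len(parts)))
    source, kind, target = parts[:3]
    counts = [int(p) for p in parts[3:]]
    return (source, kind, target), dict(zip(mu_range, counts))


edge, by_mu = parse_freq_line("cadB pd cadA 5 9 4 4 5 7 4 2 5 7 4 6")
best_mu = max(by_mu, key=by_mu.get)   # mu with the highest count for this edge
```

For the sample line, the edge is found most often (9 of 10 seeds) at mu = 0.1.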

5/10/2007

'top50 & reg' 'Block-Prior' Frequency Show

Best Case: mu = 0.5 (PDF)

Data: vsn-normalization.xls 'Control' Data

Setting:
murange = [0 .1 .5 1 2 4 8 12 16 32 64 100]; Download all figures while mu varies
10 seeds for vbssm model training

X-label: true arc index
Y-label: frequency of the arc recovered in 10 seeds training

Notes:
1) These figures indicate that mu = 0.5 is optimal
2) tdcA-arcs are more easily recovered than tdcR-arcs

The frequency analysis agrees with the previous average analysis.

5/03/2007

'top50 & reg' 'Block-Prior' Average Show

Download: PDF/JPG

Data: vsn-normalization.xls 'Control' Data

Setting:
murange = [0 .1 .5 1 2 4 8 12 16 32 64 100];
10 seeds for vbssm model training

Procedures:
1. Sort 'top50 & reg' as blocks: pdf / xls
2. In the k-th experiment, incorporate blocks[1:k] as prior and take the average of 10 seeds for the significance computation
3. Analyze the recovered true arcs (10 in total).
They are grouped into 2 categories, tdcA-arcs and tdcR-arcs, where:

tdcA-arcs: (color matches the legend)

tdcA pd tdcB

tdcA pd tdcC
tdcA pd tdcD
tdcA pd tdcE
tdcA pp tdcA

tdcR-arcs: (color matches the legend)

tdcR pp tdcA

tdcR pp tdcB
tdcR pp tdcC
tdcR pp tdcE
tdcR pp tdcD

More details about this work
1. Sort the 'top50 & reg' as blocks.
That is, arcs with the same 'from-gene' are grouped into one block, ignoring the 'to-gene'.
E.g. the group containing all tdcA-arcs is named block-1, and the group containing all tdcR-arcs is named block-2.
block-idx from to

1	tdcA	tdcA
1	tdcA	tdcB
1	tdcA	tdcD
1	tdcA	tdcE
1	tdcA	tdcF
1	tdcA	tdcG
1	tdcA	tdcC

2	tdcR	tdcA
2	tdcR	tdcB
2	tdcR	tdcC
2	tdcR	tdcE
2	tdcR	tdcF
2	tdcR	tdcG
2	tdcR	tdcD

From block-3 onward, the block index follows the alphabetical order of the 'from-genes'
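
The block numbering described here (tdcA first, tdcR second, then the remaining from-genes alphabetically) can be sketched as below. The function name is hypothetical, the input list is made up rather than the real 'top50 & reg' table, and the case-insensitive sort is an assumption.

```python
def sort_into_blocks(arcs, first=("tdcA", "tdcR")):
    """Group (from, to) arcs by from-gene and number the groups from 1."""
    by_source = {}
    for source, target in arcs:
        by_source.setdefault(source, []).append((source, target))
    # Fixed blocks come first, the rest follow in alphabetical order.
    rest = sorted((g for g in by_source if g not in first), key=str.lower)
    order = [g for g in first if g in by_source] + rest
    return [(idx, by_source[gene]) for idx, gene in enumerate(order, start=1)]


arcs = [
    ("hns", "cadA"),
    ("tdcR", "tdcA"),
    ("arcA", "hybB"),
    ("tdcA", "tdcB"),
    ("tdcA", "tdcC"),
]
blocks = sort_into_blocks(arcs)
```

Here `blocks` holds four numbered groups: tdcA, tdcR, arcA, hns.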

2. Add prior for vbssm training.
There are 29 experiments, since sorting 'top50 & reg' yields exactly 29 blocks. Each experiment also explores the mu range.
1st test, add all tdcA-arcs (i.e. block-1) as prior
2nd test, keep the tdcA-arcs (block-1) as prior, meanwhile add tdcR-arcs (block-2) as prior.
....
29th test, all arcs (i.e. block(1:29) ) are added as priors.

3. Organize the data of the 29 experiments, each with murange = [0 .1 .5 1 2 4 8 12 16 32 64 100]
Since the mu value plays an important role in the recovery evaluation, I made the figure to reflect this.

There is no true network at hand, only 10 known arcs. I split them into 2 groups:
tdcA-arcs:(from gene = tdcA, to gene = don't care)
tdcA pd tdcB
tdcA pd tdcC
tdcA pd tdcD
tdcA pd tdcE
tdcA pp tdcA

tdcR-arcs:(from gene = tdcR, to gene = don't care)
tdcR pp tdcA
tdcR pp tdcB
tdcR pp tdcC
tdcR pp tdcE
tdcR pp tdcD

Two colors, blue and pink, show the recovery results for the 10 true arcs. Blue = tdcA-arcs, pink = tdcR-arcs.
The blue+pink stack gives the total number of true arcs identified by vbssm.

As I understand it, the figure shows at least 2 things:
1. mu = 0.5 is the optimum mu value, at which the number of recovered arcs peaks.
2. Overall, the pink arcs only show up at certain mu values, whereas the blue arcs are not significantly affected by the mu value.

4/04/2007

ARD Net

figure parameter: 3-arc-idx = 9 , reps=8 , kk = 15
seedrange = [1:10]; murange = [.1 .5 1 2 4 8 12 16];

vbssm output for 3-arc-idx = 9, kkrange = [1:16] is at

'/home/csgrad/juanli/work/log/triplet_128'

Net Name Explanation:
'net_2_arc9_kk15_seed1.mat' means
3-arc-idx = 9, kk = 15, seed = 1, mu = 2;

* In net names, p1 stands for mu = 0.1 and p5 for mu = 0.5
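
The naming scheme can be decoded with a small parser. This is a sketch under two assumptions: the 'p' tokens appear in the same slot as the mu value, and every file follows the pattern of the one example above. The name 'net_p5_arc9_kk15_seed1.mat' used in the usage note is hypothetical.

```python
import re

NAME_RE = re.compile(r"^net_(p?\d+)_arc(\d+)_kk(\d+)_seed(\d+)\.mat$")


def parse_mu(token):
    """'2' -> 2.0, 'p1' -> 0.1, 'p5' -> 0.5 (a leading 'p' stands for '0.')."""
    if token.startswith("p"):
        return float("0." + token[1:])
    return float(token)


def parse_net_name(name):
    """Extract mu, 3-arc index, kk and seed from a net file name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("unrecognized net file name: %r" % name)
    mu_token, arc, kk, seed = m.groups()
    return {"mu": parse_mu(mu_token), "arc_idx": int(arc), "kk": int(kk), "seed": int(seed)}


info = parse_net_name("net_2_arc9_kk15_seed1.mat")
```

Calling `parse_net_name("net_p5_arc9_kk15_seed1.mat")` would give mu = 0.5 under the same assumption.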

--------------------------------------------------------------------------------------------
Juan

I need the vbssm output to be able to look at possible correlations in the CBD matrix. Please can you generate this in the first instance for the model with arc =3, reps=8 and kk = 15 say, the same values that were in the figure 3arc_info_auc that you sent me. Ideally, I would like all models k = 1:16 - these could be in separate .mat files to reduce storage space.

David
---------------------------------------------------------------------------------------------

3/30/2007

Explore mu range on "Control" of vsn-normalization

murange = [0 .1 .5 1 2 4 8 12 16 32 64 100];
10 seeds for vbssm model training

Recovered Result is shown on PDF / XLS

3/26/2007

ARD Data

1-arc: reps = 1 / 4 / 8
2-arc: reps = 1 / 4 / 8
3-arc: reps = 1 / 4 / 8

Each includes matrices trained from 10 seeds.

The data structure is organized as:
Sample | reps | arc_ind | kk | mu | auc-val

pair_map.mat gives the pair_index, and triplet_map gives the triplet_index, mapped from 12 single arcs.

see also: ARD Experiment, ARD scripts, ARD results
