Differences

This shows you the differences between two versions of the page.

data [2014/08/10 20:03]
ychen
data [2014/08/10 20:57] (current)
ychen [DANN data]
Line 1: Line 1:
 +===== DANN data =====
 +
 +The raw data can be found here: [[http://krishna.gs.washington.edu/martin/download/cadd_training/]]. The real SNV, insertion, and deletion samples total 16,627,775. We randomly sample an equal number of simulated samples (SNVs, insertions, and deletions), combine them with the real data, and obtain a dataset of 33,255,550 samples.
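The balancing step described above can be sketched as follows. This is only an illustration on toy data, not the script actually used to build the dataset: the function name `build_balanced_dataset`, the seed, and the 0/1 label convention are all assumptions made for the example.

```python
import random


def build_balanced_dataset(real, simulated, seed=0):
    """Combine all real samples with an equally sized random draw of simulated ones.

    Hypothetical helper for illustration; the labels (0 = real, 1 = simulated)
    are an arbitrary convention chosen for this sketch.
    """
    if len(simulated) < len(real):
        raise ValueError("need at least as many simulated samples as real ones")
    rng = random.Random(seed)
    # Draw without replacement so that no simulated sample appears twice.
    drawn = rng.sample(simulated, len(real))
    dataset = [(0, x) for x in real] + [(1, x) for x in drawn]
    rng.shuffle(dataset)
    return dataset


# Toy stand-ins for the real and simulated variant records.
real = [f"real_{i}" for i in range(5)]
simulated = [f"sim_{i}" for i in range(20)]
dataset = build_balanced_dataset(real, simulated)
print(len(dataset))  # twice the number of real samples
```

With the counts quoted above, the same doubling gives 2 × 16,627,775 = 33,255,550 samples.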
 +
 +This dataset is transformed into svmlight format with the script impute2svmlight.py, which was provided by Dr. Martin Kircher (the author of the CADD paper), and the Python package [[https://github.com/mblondel/svmlight-loader|svmlight-loader]]. We roughly partition the dataset into 80% for training, 10% for validation, and 10% for testing. Their svmlight files are here:
 +
 +
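For readers who want to reproduce a similar rough 80/10/10 partition, one possible approach is to assign each svmlight line independently at random to one of the three subsets. The sketch below assumes this per-line scheme; the page does not say how the original split was made, and the helper names `assign_split` and `partition_lines`, the seed, and the toy lines are hypothetical.

```python
import random


def assign_split(rng, train_frac=0.8, valid_frac=0.1):
    """Pick a subset name with probabilities 0.8 / 0.1 / 0.1 by default."""
    r = rng.random()
    if r < train_frac:
        return "train"
    if r < train_frac + valid_frac:
        return "valid"
    return "test"


def partition_lines(lines, seed=0):
    """Send every line to exactly one of the train, valid, or test subsets."""
    rng = random.Random(seed)
    splits = {"train": [], "valid": [], "test": []}
    for line in lines:
        splits[assign_split(rng)].append(line)
    return splits


# Toy svmlight-style lines of the form "<label> <index>:<value> ...".
lines = [f"{i % 2} 1:{i} 2:{i * 0.5}" for i in range(1000)]
splits = partition_lines(lines)
print({name: len(part) for name, part in splits.items()})
```

Because each line is assigned independently, the subset sizes are only approximately 80%, 10%, and 10%. For a file with tens of millions of lines, the same loop could write each line straight to one of three output files instead of holding the lists in memory.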
 +
 ===== tree-hmm sample .bam's from chr19 =====
  