Differences

This shows you the differences between two versions of the page.

data [2014/08/10 20:06]
ychen
data [2014/08/10 20:57] (current)
ychen [DANN data]
Line 1:
 ===== DANN data =====
  
-The raw data can be found here [[http://krishna.gs.washington.edu/martin/download/cadd_training/]]
+The raw data can be found here [[http://krishna.gs.washington.edu/martin/download/cadd_training/]]. The real SNV, insertion, and deletion samples total 16,627,775. We randomly sample an equal number of simulated samples (SNVs, insertions, and deletions) and combine them with the real data to obtain a dataset of 33,255,550 samples.
 + 
 +This dataset is transformed into svmlight format with the script impute2svmlight.py, which is provided by Dr. Martin Kircher (an author of the CADD paper), and the Python package [[https://github.com/mblondel/svmlight-loader|svmlight-loader]]. We partition the dataset roughly into 80% for training, 10% for validation, and 10% for testing. The svmlight files are here:
  
  
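The balancing and partitioning steps described in this revision can be sketched as follows. This is a minimal illustration, not the script actually used to build the DANN data: the function names `balance_with_simulated` and `split_80_10_10`, the fixed seed, and the in-memory lists are all assumptions made for the example. With 33,255,550 samples, the real pipeline would more likely work on indices or stream the svmlight files.

```python
import random


def balance_with_simulated(real, simulated, seed=0):
    """Combine all real samples with an equally sized random subset of simulated ones.

    Hypothetical helper: the name, signature, and source tags are illustrative only.
    """
    if len(simulated) < len(real):
        raise ValueError("need at least as many simulated samples as real ones")
    rng = random.Random(seed)
    # Sample without replacement so that no simulated sample is picked twice.
    chosen = rng.sample(simulated, len(real))
    return [(s, "real") for s in real] + [(s, "simulated") for s in chosen]


def split_80_10_10(samples, seed=0):
    """Shuffle the samples and split them into train/validation/test parts.

    Integer arithmetic is used for the cut points; the test part takes
    whatever remains after the 80% and 10% slices.
    """
    rng = random.Random(seed)
    shuffled = list(samples)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_valid = n // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    # Toy stand-ins for the real and simulated variant records.
    real = [f"real_{i}" for i in range(50)]
    simulated = [f"sim_{i}" for i in range(200)]
    combined = balance_with_simulated(real, simulated)
    train, valid, test = split_80_10_10(combined)
    print(len(combined), len(train), len(valid), len(test))
```

Shuffling before slicing keeps real and simulated samples mixed in every partition. Without the shuffle, the combined list would put all real samples first, and the test slice would contain only simulated ones.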