Differences

This shows you the differences between two versions of the page.

--- data [2014/08/10 20:30] ychen
+++ data [2014/08/10 20:33] ychen [DANN data]
@@ Line 3: / Line 3: @@
 The raw data can be found here [[http://krishna.gs.washington.edu/martin/download/cadd_training/]]. The real SNV, insertion, and deletion samples total 16,627,775. We randomly sample an equal number of simulated samples (SNV, insertion, and deletion) and combine them with the real data to obtain a dataset of 33,255,550 samples.
-This dataset is transformed into svmlight format with the script impute2svmlight.py, which is provided by Dr. Martin Kircher (the author of the CADD paper). Note that this requires the Python package [[https://github.com/mblondel/svmlight-loader|svmlight-loader]].
+This dataset is transformed into svmlight format with the script impute2svmlight.py, which is provided by Dr. Martin Kircher (the author of the CADD paper), and the Python package [[https://github.com/mblondel/svmlight-loader|svmlight-loader]]. We partition the dataset into