This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
data [2014/08/10 20:29] ychen [DANN data] |
data [2014/08/10 20:31] ychen [DANN data] |
||
---|---|---|---|
Line 3: | Line 3: | ||
The raw data can be found here [[http://krishna.gs.washington.edu/martin/download/cadd_training/]]. The real SNV, insertion and deletion samples sum up to 16,627,775. We randomly sample equal number of simutation samples (SNV, insertion and deletion), combine with the real data, and get a dataset of 33,255,550 samples. | The raw data can be found here [[http://krishna.gs.washington.edu/martin/download/cadd_training/]]. The real SNV, insertion and deletion samples sum up to 16,627,775. We randomly sample equal number of simutation samples (SNV, insertion and deletion), combine with the real data, and get a dataset of 33,255,550 samples. | ||
- | This dataset is transormed into svmlight format with script impute2svmlight.py, which is provided by Dr. Martin Kircher (the author of CADD paper). Note that to do this, the python package [[svmlight-loader|https://github.com/mblondel/svmlight-loader]] is needed | + | This dataset is transormed into svmlight format with script impute2svmlight.py, which is provided by Dr. Martin Kircher (the author of CADD paper), and the python package [[https://github.com/mblondel/svmlight-loader|svmlight-loader]] is needed |