I have downloaded the data for ICGC donor ID DO52673. It contains nine files in total: FI675848, FI666508, FI385355, FI384962, FI384957, FI364287, FI364281, FI269878, FI269874.
On the ICGC site, this donor is listed as having 2,819 mutations, but when I downloaded the donor's files using ICGC-get, the number of mutations in each file is much higher. Even after removing duplicates, the number of mutations is still high. The file names mention somatic SNVs and indels:
File name: 8ddcf0d9-312f-4055-8984-55d463face34.svcp_1-0-4.20150127.somatic.snv_mnv
File name: 8ddcf0d9-312f-4055-8984-55d463face34.broad-snowman.20151023.somatic.indel.vcf.gz
Are these somatic mutations or germline mutations? If donor ID DO52673 has 2,819 somatic mutations, why does each of its files contain more than 37,000 mutations?
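To make the per-file count concrete, here is a minimal Python sketch of the kind of count described above. It is not the exact script I ran. It assumes the files follow the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER), treats two records as duplicates when CHROM, POS, REF and ALT all match, and also tallies the FILTER column in case the files include calls that did not pass the caller's filters (whether that applies to these files is an assumption). The names open_vcf and count_variants and the example records are hypothetical.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_variants(lines):
    """Count unique (CHROM, POS, REF, ALT) records and tally their FILTER values.

    Assumes standard VCF column order; multi-allelic ALT fields are
    compared as a whole string, not split per allele.
    """
    seen = set()
    filters = Counter()
    for line in lines:
        # Skip blank lines and header/meta lines (they start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        key = (chrom, pos, ref, alt)
        if key in seen:
            continue  # duplicate record, already counted
        seen.add(key)
        filters[flt] += 1
    return len(seen), filters


# Made-up records for illustration only (not taken from the ICGC files).
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tT\t.\tPASS\t.\n",
    "1\t100\t.\tA\tT\t.\tPASS\t.\n",
    "2\t200\t.\tG\tC\t.\tLOWSUPPORT\t.\n",
]
total, by_filter = count_variants(example)
print(total, dict(by_filter))
```

For a real file, the same function can be called as count_variants(open_vcf(path)), which gives the number of unique records and how many of them carry each FILTER value.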
I now have 1,290 donors, and if I sum the mutations in all of the donors' files, the total is around 1.2E8 (roughly 93,000 per donor on average). It seems these files contain germline data. I removed COSMIC mutations, 1000 Genomes variants, and SNPs, but the number of mutations is still high. Can anyone help me understand this data? Is it possible to have 1.2E8 somatic mutations in whole-genome data? Thank you.