Question: How to understand somatic mutation of ICGC data
2.1 years ago, wangshx (China/Shanghai/ShanghaiTech University) wrote:

I am new to processing ICGC somatic mutation data. The file simple_somatic_mutation.aggregated.vcf.gz is in VCF format, and each Mutation ID in it is annotated with the number of donors affected. Is this the mutation count? When I looked for more detailed data, only .tsv files were provided. I am also confused that the same Mutation ID appears many times. That is, why is there more than one record at the same chromosomal locus within a single sample?

For example, one donor has the mutation ID MU28652212. It affects just one donor across all projects, yet the .tsv file of project LUSC-CN has 5 rows for MU28652212. When I compute mutation counts, should I treat it as 1 mutation or 5 mutations?

Please help.

somatic mutation icgc genome

This is because 5 transcripts are affected by mutation MU28652212. You need to prioritize one transcript out of the 5. One way is to use maf2maf, which will do this for you. You can use maftools to convert the ICGC simple somatic mutation format to MAF and process it further (apologies for the shameless promotion).

written 2.1 years ago by poisonAlien

2.1 years ago, solo7773 wrote:

'Donor affected' means how many donors/patients carry this mutation.

ICGC mainly provides data in tabular format (tsv).

Duplicates of the same 'Mutation ID' exist because the mutation affects multiple genes/transcripts, and each affected gene/transcript gets its own row. In your example, it should be counted as 1 mutation. You can also refer to this doc for another example.
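
To make the counting rule concrete, here is a minimal sketch that collapses the per-transcript rows so that each (donor, mutation ID) pair is counted once. The column names `icgc_donor_id`, `icgc_mutation_id` and `transcript_affected` are assumptions about the project .tsv header, and every donor, transcript and second mutation ID in the example is a made-up placeholder, so check them against your own file.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a project .tsv file. Column names are assumed;
# donor IDs, transcript IDs and MU_OTHER are placeholders, not real records.
EXAMPLE_TSV = (
    "icgc_mutation_id\ticgc_donor_id\ttranscript_affected\n"
    "MU28652212\tDONOR_A\tTX_1\n"
    "MU28652212\tDONOR_A\tTX_2\n"
    "MU28652212\tDONOR_A\tTX_3\n"
    "MU28652212\tDONOR_A\tTX_4\n"
    "MU28652212\tDONOR_A\tTX_5\n"
    "MU_OTHER\tDONOR_A\tTX_6\n"
    "MU_OTHER\tDONOR_B\tTX_6\n"
)


def count_mutations_per_donor(handle):
    """Count distinct mutation IDs per donor, ignoring per-transcript repeats."""
    reader = csv.DictReader(handle, delimiter="\t")
    seen = set()
    for row in reader:
        # One (donor, mutation) pair is one mutation, however many
        # transcript-level rows describe it.
        seen.add((row["icgc_donor_id"], row["icgc_mutation_id"]))
    return Counter(donor for donor, _ in seen)


counts = count_mutations_per_donor(io.StringIO(EXAMPLE_TSV))
print(dict(counts))
```

With a real file, the same function can be given an open file handle instead of the `io.StringIO` wrapper. The five MU28652212 rows then contribute a single mutation to that donor's total.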


Thanks! If I want to compute the mutation spectrum, should I merge the rows with the same mutation ID in a sample into 1 mutation?

written 2.1 years ago by wangshx

I think so. If you only care about the mutations within a sample, that's fine, because rows with a duplicate ID record the same chromosome, position, reference allele, and alternate allele.
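
As a rough illustration of that merge, the sketch below keeps one row per (sample, mutation ID) and then tallies single-base substitutions into the six pyrimidine-centred classes (C>A, C>G, C>T, T>A, T>C, T>G). Folding purine references onto the opposite strand is a common convention that I am assuming here, not something ICGC prescribes. The column names `icgc_sample_id`, `icgc_mutation_id`, `mutated_from_allele` and `mutated_to_allele` are also assumptions about the .tsv header, and the example rows are invented.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return a class such as 'C>T', or None if this is not a single-base change."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        return None  # indels, '-' placeholders, multi-base alleles
    if ref in ("A", "G"):
        # Report purine references on the opposite strand (G>A becomes C>T).
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(rows):
    """Tally substitution classes, counting each (sample, mutation ID) once."""
    seen = set()
    spectrum = Counter()
    for row in rows:
        key = (row["icgc_sample_id"], row["icgc_mutation_id"])
        if key in seen:
            continue  # another transcript-level row for the same mutation
        seen.add(key)
        cls = substitution_class(row["mutated_from_allele"],
                                 row["mutated_to_allele"])
        if cls is not None:
            spectrum[cls] += 1
    return spectrum


def _row(sample, mutation, ref, alt):
    # Helper for building made-up example rows with the assumed column names.
    return {"icgc_sample_id": sample, "icgc_mutation_id": mutation,
            "mutated_from_allele": ref, "mutated_to_allele": alt}


example_rows = (
    [_row("SAMPLE_1", "MU_X", "G", "A")] * 5   # one mutation, five transcript rows
    + [_row("SAMPLE_1", "MU_Y", "T", "C")]
    + [_row("SAMPLE_1", "MU_Z", "A", "-")]     # deletion, skipped
)
spectrum = mutation_spectrum(example_rows)
print(dict(spectrum))
```

Rows can come straight from `csv.DictReader(handle, delimiter="\t")`. Deduplicating before classifying is what keeps a mutation hitting five transcripts from being counted five times in the spectrum.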

written 2.1 years ago by solo7773