I have some questions about COSMIC Mutation IDs in the GRCh37 version of COSMIC v72.
Is there a reliable way to validate a COSMIC Mutation ID using the downloadable data at
One option would be to query the COSMIC website for each ID, but I would like to avoid that if possible.
COSMIC Mutation IDs in the downloadable data are not always searchable on the COSMIC website, and the data seem inconsistent.
Looking at CosmicCompleteExport.tsv.gz and VCF/CosmicCodingMuts.vcf.gz, I do not understand the following:
- some COSM IDs in VCF/CosmicCodingMuts.vcf.gz are not found in CosmicCompleteExport.tsv.gz
- some COSM IDs found in both VCF/CosmicCodingMuts.vcf.gz and CosmicCompleteExport.tsv.gz are not found on the website.
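The comparison behind the first observation could be sketched in Python roughly as follows. This is only a sketch under assumptions: the header name "Mutation ID" for the export file is a guess that should be checked against the actual header row, and the function names and paths are hypothetical.

```python
import csv
import gzip


def vcf_ids(lines):
    """Collect the ID column (third tab-separated field) from VCF data lines."""
    ids = set()
    for line in lines:
        # Skip meta lines, the #CHROM header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:
            ids.add(fields[2])
    return ids


def tsv_ids(lines, id_column="Mutation ID"):
    """Collect mutation IDs from a tab-separated export that has a header row.

    The default column name is an assumption; pass the real one if it differs.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if id_column not in (reader.fieldnames or []):
        raise KeyError(f"column {id_column!r} not found in header")
    return {row[id_column] for row in reader if row[id_column]}


def compare(vcf_path, tsv_path, id_column="Mutation ID"):
    """Return (IDs only in the VCF, IDs only in the export)."""
    with gzip.open(vcf_path, "rt") as vcf, gzip.open(tsv_path, "rt") as tsv:
        in_vcf = vcf_ids(vcf)
        in_tsv = tsv_ids(tsv, id_column)
    return in_vcf - in_tsv, in_tsv - in_vcf
```

Calling compare() with the paths of the two downloaded files would list the IDs present in only one of them. It cannot say anything about whether an ID is searchable on the website.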
Example: COSM330384 is found in both files but not on the COSMIC website:
$ zcat cosmic/grch37/cosmic/v72/CosmicMutantExport.tsv.gz | grep -P "COSM330384\t"
SLC4A11_ENST00000380059 ENST00000380059 2757 SCC-9 2296303 2161906 upper_aerodigestive_tract head_neck carcinoma squamous_cell_carcinoma y COSM330384 c.77C>G p.P26R Substitution - Missense 37 20:3218634-3218634 - y PASSENGER/OTHER Reported in another cancer sample as somatic 25275298 cell-line NS 25
... (many more records)
$ zcat cosmic/grch37/cosmic/v72/VCF/CosmicCodingMuts.vcf.gz | grep -P "COSM330384\t"
20 3218634 COSM330384 G C . . GENE=SLC4A11_ENST00000380059;STRAND=-;SNP;GENE=SLC4A11_ENST00000380059;STRAND=-;CDS=c.77C>G;AA=p.P26R;CNT=10
Some variants have multiple IDs assigned:
$ zcat cosmic/grch37/cosmic/v72/VCF/CosmicCodingMuts.vcf.gz | grep -P "108175462\t"
11 108175462 COSM3736031 G A . . GENE=ATM_ENST00000278616;STRAND=+;SNP;GENE=ATM_ENST00000278616;STRAND=+;CDS=c.5557G>A;AA=p.D1853N;CNT=2
11 108175462 COSM41596 G A . . GENE=ATM;STRAND=+;SNP;GENE=ATM;STRAND=+;CDS=c.5557G>A;AA=p.D1853N;CNT=12
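In the two records above, the position and alleles are identical and only the ID, the GENE annotation (ATM versus ATM_ENST00000278616) and CNT differ. A rough Python sketch for listing every such case in the VCF, assuming the standard VCF column order and a GENE= key in the INFO field as in the output shown (the function name is hypothetical):

```python
from collections import defaultdict


def ids_by_variant(lines):
    """Group VCF records by (chrom, pos, ref, alt).

    Returns only the variants that carry more than one ID, each mapped to a
    list of (ID, GENE annotation) pairs in file order.
    """
    groups = defaultdict(list)
    for line in lines:
        # Skip meta lines, the #CHROM header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        chrom, pos, var_id, ref, alt = fields[:5]
        # Take the first GENE= entry from INFO; empty string if absent.
        gene = next(
            (kv.split("=", 1)[1] for kv in fields[7].split(";")
             if kv.startswith("GENE=")),
            "",
        )
        groups[(chrom, pos, ref, alt)].append((var_id, gene))
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```

Passing an open text handle, for example gzip.open(path, "rt"), as lines would scan the whole file. Note that all records are held in memory while grouping.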
I would appreciate any advice.