Question: COSMIC to Ensembl mapping
Asked 2.0 years ago by banerjeeshayantan:

I have downloaded the COSMIC mutation file based on GRCh38, which gives me the COSMIC mutation ID for each mutation (e.g., COSM521, COSM520). If I paste one of these IDs into the website's search box, I get all the related information, such as its Ensembl contig. Using that Ensembl contig, I then go to the Ensembl database and extract the sequence associated with the variant. Is there a way to extract the sequences for all the COSMIC variants from Ensembl without doing this individually for each one? In other words, how can I map the COSMIC variants to their Ensembl counterparts?

(written 2.0 years ago by banerjeeshayantan • modified 2.0 years ago by Emily_Ensembl)
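For the "without doing this individually" part, a bulk route is to read the genome coordinates for every COSMIC ID straight from the downloaded file and turn them into windows in one pass. Below is a minimal sketch, assuming (this is an assumption, not something stated in the thread) that the file is tab-separated with columns named `Mutation ID` and `Mutation genome position`, the latter formatted as `chrom:start-end` with 1-based inclusive coordinates, as in older COSMIC exports. The function `cosmic_to_bed` is hypothetical; it emits BED records (0-based, half-open) widened by a chosen flank.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: column names follow older COSMIC mutation exports, where the
# genome position looks like "chrom:start-end" (1-based, inclusive).
# Check the header of your own download and adjust these two constants.
ID_COLUMN = "Mutation ID"
POSITION_COLUMN = "Mutation genome position"

BedRecord = Tuple[str, int, int, str]


def cosmic_to_bed(rows: Iterable[dict], flank: int) -> Iterator[BedRecord]:
    """Yield one (chrom, start, end, cosmic_id) BED record per COSMIC ID.

    BED coordinates are 0-based and half-open, and the window is widened by
    `flank` bp on each side so that it still contains the variant itself.
    """
    seen = set()
    for row in rows:
        cosmic_id = row[ID_COLUMN]
        position = row[POSITION_COLUMN]
        # Skip rows without a genome position, and repeats of an ID that
        # was already emitted (the same mutation can appear in many samples).
        if not position or cosmic_id in seen:
            continue
        seen.add(cosmic_id)
        chrom, span = position.split(":")
        start, end = (int(x) for x in span.split("-"))
        # 1-based inclusive -> 0-based half-open, clipped at the contig start.
        yield chrom, max(0, start - 1 - flank), end + flank, cosmic_id


if __name__ == "__main__":
    # Made-up IDs and positions, purely for illustration.
    example = (
        "Mutation ID\tMutation genome position\n"
        "COSM_EXAMPLE1\t1:100-100\n"
        "COSM_EXAMPLE1\t1:100-100\n"
        "COSM_EXAMPLE2\t2:5-7\n"
    )
    reader = csv.DictReader(io.StringIO(example), delimiter="\t")
    for record in cosmic_to_bed(reader, flank=10):
        print("\t".join(str(field) for field in record))
```

With the made-up input, this prints two tab-separated BED lines, `1 89 110 COSM_EXAMPLE1` and `2 0 17 COSM_EXAMPLE2`; the duplicate row is dropped. For a gzipped export, the file can be opened with `gzip.open(path, "rt")` and passed to `csv.DictReader` in the same way.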

Hello,

It is unclear to me what you mean by:

Using these ENSEMBL contigs, I visit the ENSEMBL database and extract the sequence associated with this variant.

Do you mean how the sequence changes because of the variant, e.g. A>G for COSM521? Isn't this information in the file you've downloaded?

fin swimmer

(reply written 2.0 years ago by finswimmer)

Sorry for the confusion. I meant the flanking sequence containing the variant position.

(reply written 2.0 years ago by banerjeeshayantan)

What do you mean by "sequence"? The flanking region? Just the base-pair change?

(reply written 2.0 years ago by Emily_Ensembl)

The flanking region containing the variant.

(reply written 2.0 years ago by banerjeeshayantan)

If you have the reference sequence (in this case GRCh38), bedtools flank (https://bedtools.readthedocs.io/en/latest/content/tools/flank.html) will give you the flank ranges, and bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) will then extract the flanking sequences from those ranges.

(reply written 2.0 years ago by cpad0112)
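One detail matters for "the flanking region containing the variant": bedtools flank creates intervals adjacent to, but not including, each input feature, so the variant base itself is left out; bedtools slop, which widens each feature instead, keeps it. The sketch below does the slop-plus-getfasta step in plain Python so the coordinate handling is visible. The helper names `read_fasta` and `flanking_sequence` are hypothetical, and loading a whole genome into a dict this way is only reasonable for small inputs; an indexed reader is the practical choice for GRCh38.

```python
import io
from typing import Dict, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Load a (small) FASTA file into {sequence name: sequence}.

    The name is the first word of the header line. Loading a whole genome
    this way works but is memory-hungry; an indexed reader is better there.
    """
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            # Upper-case so soft-masked (lowercase) bases compare cleanly.
            chunks[name].append(line.upper())
    return {seq_name: "".join(parts) for seq_name, parts in chunks.items()}


def flanking_sequence(
    genome: Dict[str, str], chrom: str, start: int, end: int, flank: int
) -> Tuple[str, str, str]:
    """Return (5' flank, reference allele, 3' flank) on the forward strand
    for a variant at 1-based inclusive [start, end].

    Flanks are clipped at the contig ends instead of raising an error.
    """
    seq = genome[chrom]
    five_prime = seq[max(0, start - 1 - flank): start - 1]
    ref_allele = seq[start - 1: end]
    three_prime = seq[end: end + flank]
    return five_prime, ref_allele, three_prime


if __name__ == "__main__":
    # Toy 18 bp contig, purely for illustration.
    toy = io.StringIO(">chrT toy contig\nACGTACGTAC\nGGTTCCAA\n")
    genome = read_fasta(toy)
    print(flanking_sequence(genome, "chrT", 5, 5, 3))
```

Returning the 5' flank, the reference allele and the 3' flank separately makes it easy to check that the reference base matches the REF allele in the COSMIC file before the pieces are joined into one window.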

Thanks for your reply. So my question is: do I need to incorporate the variant information into the reference sequence? What if the flanking region of a SNP overlaps an indel? If I don't incorporate the indel into the reference sequence, wouldn't I lose information? Or should I just use the reference sequence as it is?

(reply written 2.0 years ago by banerjeeshayantan)
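To see concretely what the question is weighing, one can take a reference window and substitute known variants into it. This is a hypothetical sketch, not an answer given in the thread: `apply_variants` and its VCF-style `(position, REF, ALT)` tuples are illustrative, and whether other variants should be incorporated at all depends on what the flanking sequence will be used for. Reference-based extraction such as bedtools getfasta returns reference bases only.

```python
from typing import List, Tuple

# (1-based position, REF allele, ALT allele), written VCF-style, so an
# indel includes its anchor base: deleting G after C is REF "CG", ALT "C".
Variant = Tuple[int, str, str]


def apply_variants(window: str, window_start: int, variants: List[Variant]) -> str:
    """Return `window` (whose first base is at 1-based `window_start`) with
    the given non-overlapping variants substituted in.

    Variants are applied right to left, so an indel never shifts the
    coordinates of a variant that has not been applied yet.
    """
    seq = window
    previous_start = None
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        offset = pos - window_start
        if offset < 0 or offset + len(ref) > len(window):
            raise ValueError(f"variant at {pos} is not fully inside the window")
        if previous_start is not None and pos + len(ref) > previous_start:
            raise ValueError(f"variant at {pos} overlaps another variant")
        if seq[offset: offset + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match the window at {pos}")
        seq = seq[:offset] + alt + seq[offset + len(ref):]
        previous_start = pos
    return seq


if __name__ == "__main__":
    # Made-up 10 bp reference window covering positions 101-110.
    window = "ACGTACGTAC"
    snv = (105, "A", "G")
    nearby_deletion = (102, "CG", "C")
    print(apply_variants(window, 101, [snv]))
    print(apply_variants(window, 101, [snv, nearby_deletion]))
```

The two calls print `ACGTGCGTAC` and `ACTGCGTAC`: once the nearby deletion is included, the window is one base shorter and the SNV sits at a different offset, which is exactly the information that is lost if only the reference is used.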
Answered 2.0 years ago by Emily_Ensembl (EMBL-EBI):

You can get the flanking regions for lists of variants in Ensembl using either BioMart or the Perl API.

BioMart is better suited to short lists of variants. There's a help video on using BioMart here. Use the somatic short variants dataset, filter by your list of IDs, and select the flanking sequence as attributes; you can specify how large a flank you need.

Alternatively, you can use the Perl API, which has methods in the Variation module to get the 5' and 3' flanking sequence for a variant.

Let me know if you need any help using either of these.

(written 2.0 years ago by Emily_Ensembl)
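As a different route from the BioMart and Perl API suggestions (swapped in here, not something proposed in the answer), the Ensembl REST service can return a reference window directly: its `sequence/region` endpoint accepts `expand_5prime` and `expand_3prime` to add flanks around a region. The sketch assumes you already have chromosome and 1-based positions (e.g., from the COSMIC file) on GRCh38, which is what the main REST server serves for human; the helper names `region_url` and `fetch_region` are hypothetical.

```python
import urllib.parse
import urllib.request

# Public Ensembl REST server; it serves the current assembly (GRCh38 for
# human). GRCh37 data lives on a separate server.
ENSEMBL_REST = "https://rest.ensembl.org"


def region_url(chrom: str, start: int, end: int, flank: int,
               species: str = "human", strand: int = 1) -> str:
    """Build a sequence/region request for 1-based inclusive [start, end],
    extended by `flank` bp on each side.

    Ensembl names chromosomes without a "chr" prefix ("7", not "chr7").
    """
    region = f"{chrom}:{start}..{end}:{strand}"
    query = urllib.parse.urlencode(
        {"expand_5prime": flank, "expand_3prime": flank,
         "content-type": "text/plain"},
        safe="/",
    )
    return f"{ENSEMBL_REST}/sequence/region/{species}/{region}?{query}"


def fetch_region(chrom: str, start: int, end: int, flank: int,
                 timeout: float = 30.0) -> str:
    """Download the reference window as plain text (one network request)."""
    url = region_url(chrom, start, end, flank)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()
```

Like BioMart's flanking sequence attributes, this returns reference bases only. Looping `fetch_region` over many thousands of COSMIC variants means one request each, so the service's rate limits apply; for large lists, extracting windows from a local GRCh38 FASTA is usually faster.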

Thanks for the suggestion. I will definitely check it out.

(reply written 2.0 years ago by banerjeeshayantan)