Get amino acid sequence and position from genomic coordinates
5.8 years ago

Hi!

I have some coordinates that fall below a given coverage in DNA-seq experiments. Something like:

chr2 212578373  212578415

I would like to obtain the genomic sequence, the protein position with the exon information (although I think I solved this: https://doi.org/doi:10.18129/B9.bioc.ensembldb), and the protein sequence.

Any ideas how to do it? Thanks! Joan


Hello Joan,

could you please describe what data you have access to? Do you have the reference fasta? Do you have an annotation file? Do you already know which gene and/or transcripts these regions overlap?

Depending on your answer there are several solutions.

fin swimmer


Hello Fin,

I have the reference fasta (GRCh37) and the annotation file (in this case refGene). Also, I know which gene the region corresponds to, but this could change between samples, so I don't know whether this should be strictly necessary.

Thanks for your help :) J


Hello Joan,

obtaining the genomic sequence is the easiest part. This can be done with bedtools:

$ bedtools getfasta -fi test.fa -bed test.bed
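One thing to check before running this: BED files use a 0-based start and an exclusive end, so coordinates taken from a 1-based report need the start shifted by one. Whether the coordinates in the question are 1-based is an assumption here, and the helper names below are made up for illustration. A minimal sketch of the conversion:

```python
def to_bed_line(chrom, start_1based, end_1based):
    """Convert a 1-based, inclusive interval into a tab-separated BED line.

    BED uses a 0-based start and an end that is not included,
    so only the start coordinate changes.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}\t{start_1based - 1}\t{end_1based}"


def interval_length(bed_line):
    """Number of bases covered by a BED line (end minus start)."""
    _chrom, start, end = bed_line.split("\t")
    return int(end) - int(start)


# Assumption: the example coordinates from the question are 1-based, inclusive.
line = to_bed_line("chr2", 212578373, 212578415)
print(line)
print(interval_length(line))
```

If the coordinates are already in BED convention, they can be written to test.bed unchanged.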

For the other things you asked for, it would be useful if you could provide an example of the desired output and show what your annotation file looks like. Because the protein sequence and exon information depend on the transcript, you might get multiple outputs.
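To illustrate why the answer depends on the transcript, here is a rough sketch of how a genomic position maps to a coding position and an amino acid number once the coding blocks of one transcript are known. The function name and the toy coordinates are hypothetical, and the blocks are assumed to be 1-based and inclusive (the UCSC refGene table stores 0-based starts, so those would need converting first):

```python
def genomic_to_protein_pos(pos, cds_blocks, strand):
    """Map a 1-based genomic position onto one transcript's coding sequence.

    cds_blocks: (start, end) pairs, 1-based inclusive, covering only the
    coding part of each exon. strand is "+" or "-".
    Returns (coding block number, CDS position, amino acid number),
    all 1-based, or None if the position is not coding in this transcript.
    """
    # Walk the blocks in transcription order: ascending on "+", descending on "-".
    blocks = sorted(cds_blocks, reverse=(strand == "-"))
    consumed = 0  # coding bases in the blocks already passed
    for number, (start, end) in enumerate(blocks, start=1):
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            cds_pos = consumed + within + 1
            aa_number = (cds_pos - 1) // 3 + 1
            return number, cds_pos, aa_number
        consumed += end - start + 1
    return None


# Toy transcript with two coding blocks (made-up coordinates).
toy_blocks = [(100, 108), (200, 205)]
print(genomic_to_protein_pos(200, toy_blocks, "+"))
print(genomic_to_protein_pos(200, toy_blocks, "-"))
print(genomic_to_protein_pos(150, toy_blocks, "+"))
```

The same position gives different results on each strand and for each transcript, so the function has to be run once per overlapping transcript. The block number counts coding blocks only, so it can differ from the exon number when the first exons are untranslated.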

fin swimmer
