Easy Way to Map CDS Coordinates to Genomic Coordinates
10.5 years ago
Alper Yilmaz ▴ 100

Suppose I have a protein domain in GeneA, and I know the coordinates of the domain within the CDS sequence of GeneA. I also know the genomic coordinates of GeneA (e.g., in GFF format), along with its mRNA and exon coordinates.

Is there an easy way to map the protein domain coordinates to genomic exon coordinates?

I looked into Bio::Coordinate::GeneMapper but was not able to figure it out.
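For reference, the mapping itself is simple enough to do by hand. Below is a minimal plain-Python sketch (not Bio::Coordinate::GeneMapper). It assumes 1-based, inclusive coordinates as in GFF, and that CDS position 1 is the first base of the start codon. It also assumes you pass the transcript's CDS segments (the GFF `CDS` lines, i.e. exons clipped to the coding region) rather than whole exons, because exon coordinates include the UTRs. `aa_to_cds` and `cds_to_genomic` are hypothetical helper names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (GFF convention)


def aa_to_cds(aa_start: int, aa_end: int) -> Interval:
    """Convert a 1-based protein range to the 1-based CDS nucleotide range encoding it."""
    return (aa_start - 1) * 3 + 1, aa_end * 3


def cds_to_genomic(cds_segments: List[Interval], strand: str,
                   cds_start: int, cds_end: int) -> List[Interval]:
    """Map a 1-based inclusive CDS range onto one or more genomic intervals.

    cds_segments are the CDS pieces of a single transcript in genomic
    coordinates, in any order. The result is listed in transcript order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Walk the segments 5'->3' along the transcript: ascending on '+', descending on '-'.
    segments = sorted(cds_segments, reverse=(strand == "-"))
    result = []
    offset = 0  # number of CDS bases that precede the current segment
    for seg_start, seg_end in segments:
        length = seg_end - seg_start + 1
        # Part of the requested range that falls in this segment, in CDS coordinates.
        lo = max(cds_start, offset + 1)
        hi = min(cds_end, offset + length)
        if lo <= hi:
            if strand == "+":
                g1 = seg_start + (lo - offset - 1)
                g2 = seg_start + (hi - offset - 1)
            else:
                # On the minus strand the segment's first CDS base is its genomic end.
                g1 = seg_end - (hi - offset - 1)
                g2 = seg_end - (lo - offset - 1)
            result.append((g1, g2))
        offset += length
    return result


# Example: a domain at amino acids 3-5 on a two-segment, plus-strand CDS.
start, end = aa_to_cds(3, 5)
print(cds_to_genomic([(100, 109), (200, 219)], "+", start, end))
```

A domain that spans an intron comes back as several intervals, one per CDS segment, which is what you would write out as BED blocks. On the minus strand the intervals are returned in transcript order, so their genomic coordinates descend.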

Tags: cds, mapping, gff, bed, coordinates
10.5 years ago
brentp 23k

If you're not tied to Perl, this is something that pygr (a Python library) does quite nicely.

E.g. this example or this one

Basically, you add an annotation to a sequence, and pygr then keeps track of its local and global positions and its strand orientation.

There's an example here where they load data from a GFF file, and there's also a class specifically for protein annotations.

I believe the workflow would be something like:

1. Add the proteins and the exons, each to its own annotationDB.
2. Query for the protein to get its global (genomic) location.
3. Use that global location to query the exon annotationDB for the exonic coordinate.
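Step 3 is essentially an interval-overlap lookup. The following is a pure-Python stand-in for that step, not pygr's API. It assumes 1-based, inclusive coordinates and that the protein feature's genomic intervals are already known. Exons are numbered 1..n in transcript order, so exon 1 is the 5'-most exon on either strand. `locate_in_exons` is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def locate_in_exons(exons: List[Interval], strand: str,
                    feature: List[Interval]) -> List[Tuple[int, int, int]]:
    """Report which exons a feature's genomic intervals fall in.

    Returns (exon_number, overlap_start, overlap_end) tuples, with exons
    numbered from 1 in transcript order (5' to 3' on the given strand).
    """
    ordered = sorted(exons, reverse=(strand == "-"))
    hits = []
    for number, (ex_start, ex_end) in enumerate(ordered, start=1):
        for f_start, f_end in feature:
            lo, hi = max(ex_start, f_start), min(ex_end, f_end)
            if lo <= hi:  # the two closed intervals overlap
                hits.append((number, lo, hi))
    return hits


# A feature split across two exons of a plus-strand transcript.
print(locate_in_exons([(50, 109), (200, 260)], "+", [(107, 109), (200, 204)]))
```

This nested loop is fine for a single gene. For genome-wide queries, an indexed structure (which is what annotation databases provide) avoids comparing every exon with every feature.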
Could you show an example of how pygr does the transformation from genomic to codon space, given an annotation of coding exons? I looked at the docs you linked to, but they seem very sparse and opaque. There's no explanation of the translation class and no example of the global-to-local transformation, so I'd be very interested to see a simple example.
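This doesn't show how pygr does it. As a library-independent illustration of the global-to-local step, here is a minimal sketch that maps a genomic position to its CDS position, codon number, and position within the codon. It assumes 1-based, inclusive coordinates, a complete CDS whose first base is the first base of the start codon (no GFF phase offsets), and that `cds_segments` holds the coding exons of one transcript. `genomic_to_cds` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def genomic_to_cds(cds_segments: List[Interval], strand: str,
                   pos: int) -> Optional[Tuple[int, int, int]]:
    """Map a genomic position to (cds_pos, codon_number, base_in_codon).

    base_in_codon is 1, 2 or 3. Returns None if pos lies outside every
    CDS segment (intron, UTR or intergenic).
    """
    offset = 0  # CDS bases in the segments already passed
    # Visit segments 5'->3' along the transcript.
    for seg_start, seg_end in sorted(cds_segments, reverse=(strand == "-")):
        if seg_start <= pos <= seg_end:
            if strand == "+":
                cds_pos = offset + (pos - seg_start) + 1
            else:
                cds_pos = offset + (seg_end - pos) + 1
            return cds_pos, (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1
        offset += seg_end - seg_start + 1
    return None


print(genomic_to_cds([(100, 109), (200, 219)], "+", 200))
```

Position 200 is the first base of the second CDS segment, so it is CDS base 11, the second base of codon 4.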

20 months ago
Shicheng Guo ★ 8.8k

You can get the genomic positions of all the conserved domains annotated in UniProt with jvarkit's mapuniprot tool, and then use bedtools to find the domains in your specific gene:

git clone https://github.com/lindenb/jvarkit.git

cd jvarkit

# build mapuniprot first, following the jvarkit README; the java command below expects dist/mapuniprot.jar

wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz

wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz

java -jar dist/mapuniprot.jar -R ~/hpc/db/hg19/hg19.fa -u uniprot_sprot.xml.gz -k knownGene.txt.gz -o uniprot_sprot.bed
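The last step (pulling out your gene's domains) is normally done with `bedtools intersect`. If you'd rather not install bedtools, the following is a minimal Python substitute. It assumes only the first three standard BED columns (chrom, 0-based start, exclusive end) and that you take the gene's region from your own GFF. `domains_in_region` is a hypothetical name, and the file and region in the usage line are placeholders.

```python
import gzip
from typing import Iterator, List


def domains_in_region(bed_path: str, chrom: str, start: int, end: int) -> Iterator[List[str]]:
    """Yield BED records that overlap chrom:start-end.

    BED is 0-based and half-open, so pass the region in the same convention:
    a 1-based inclusive GFF region a..b becomes start=a-1, end=b.
    """
    opener = gzip.open if bed_path.endswith(".gz") else open
    with opener(bed_path, "rt") as handle:
        for line in handle:
            # Skip blank lines and header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            # Half-open intervals overlap when each starts before the other ends.
            if fields[0] == chrom and int(fields[1]) < end and int(fields[2]) > start:
                yield fields


# Placeholder usage: a gene at chr1:1,000,001-1,020,000 (1-based, inclusive).
# for record in domains_in_region("uniprot_sprot.bed", "chr1", 1_000_000, 1_020_000):
#     print("\t".join(record))
```

Because of the half-open convention, a domain ending at 200 and a region starting at 200 do not overlap. Forgetting the `a-1` conversion is the usual source of off-by-one errors when mixing GFF and BED.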