Mapping between genomic coordinates to transcriptomic coordinates
4
0
Entering edit mode
7.0 years ago
John Ma ▴ 310

Suppose I have a list of genomic positions and a reference GTF/ GFF3. Are there any software tools (any language is fine) that maps the position to the cDNA/ CDS position in relate to any transcript(s) that it may overlap?

For example, based on hg38 and GENCODE version 25, I would expect an input of chr1:151406413 would give the following as output (using CDS coordinates): ENST00000271715: 2622, ENST00000368863: 2337, ENST00000392723: 2463, ENST00000409503: 2595, ENST00000491586: 2490, ENST00000529669: 822 and ENST00000531094: 2436.

I know I can just throw it into ANNOVAR or VEP; but I wonder if there's a simpler way without actually predicting the consequences.

Thanks anyone for the answer in advance!

vcf gff gtf SNP • 4.8k views
ADD COMMENT
2
Entering edit mode
7.0 years ago

In R, the GenomicFeatures package could be one choice.

ADD COMMENT
1
Entering edit mode

Eventually I got to try out this one. The important GenomicFeatures function here is mapToTranscripts:

library(GenomicFeatures)
library(rtracklayer)
gencodeTxDb <- makeTxDbFromGFF (file="gencode.annotation.gtf")
gencodeTx <- transcripts (gencodeTxDb)
names (gencodeTx) <- id2name (gencodeTxDb, "tx")
genomicLocus <- import.bed (con="genomicLocus.BED")
mappedLocus <- mapToTranscripts (x=genomicLocus, transcripts=gencodeTx)

The output is an GRanges object that is easy enough to work with.

ADD REPLY
1
Entering edit mode
6.9 years ago
John Ma ▴ 310

I got myself an answer today: for Python, we may use transVar.

ADD COMMENT
0
Entering edit mode
6.9 years ago

On the command line, one option is to download annotations and convert them to BED via convert2bed:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
    | gunzip -c- \
    | convert2bed -i gff - \
    > annotations.bed

If you just want transcripts, you could filter the result with awk:

$ awk '$8=="transcript"' annotations.bed > transcripts.bed

Convert your input to a sorted BED file with awk and sort-bed:

$ awk FS=":" '{ print $1"\t"$2"\t"($2 + 1); }' input.txt | sort-bed - > input.bed

Then query the input.bed file against annotations.bed or transcripts.bed with bedmap:

$ bedmap --echo --echo-map-id --delim '\t' input.bed transcripts.bed > answer.bed

Once you have made annotations.bed or transcripts.bed, you can repeat queries with any other input.bed file you generate down the road.

ADD COMMENT
0
Entering edit mode
6.9 years ago
cmdcolin ★ 3.8k

You could possibly use HGVS mutalyzer tools https://mutalyzer.nl/position-converter

Not sure if it explicitely uses GENCODE

ADD COMMENT

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6