Question: Mapping between genomic coordinates to transcriptomic coordinates
0
gravatar for MMa
3.4 years ago by
MMa280
MMa280 wrote:

Suppose I have a list of genomic positions and a reference GTF/ GFF3. Are there any software tools (any language is fine) that maps the position to the cDNA/ CDS position in relate to any transcript(s) that it may overlap?

For example, based on hg38 and GENCODE version 25, I would expect an input of chr1:151406413 would give the following as output (using CDS coordinates): ENST00000271715: 2622, ENST00000368863: 2337, ENST00000392723: 2463, ENST00000409503: 2595, ENST00000491586: 2490, ENST00000529669: 822 and ENST00000531094: 2436.

I know I can just throw it into ANNOVAR or VEP; but I wonder if there's a simpler way without actually predicting the consequences.

Thanks anyone for the answer in advance!

snp gff gtf vcf • 2.2k views
ADD COMMENTlink modified 3.3 years ago by cmdcolin1.4k • written 3.4 years ago by MMa280
2
gravatar for wuchangcheng66
3.4 years ago by
wuchangcheng6630 wrote:

In R, the GenomicFeatures package could be one choice.

ADD COMMENTlink written 3.4 years ago by wuchangcheng6630
1

Eventually I got to try out this one. The important GenomicFeatures function here is mapToTranscripts:

library(GenomicFeatures)
library(rtracklayer)
gencodeTxDb <- makeTxDbFromGFF (file="gencode.annotation.gtf")
gencodeTx <- transcripts (gencodeTxDb)
names (gencodeTx) <- id2name (gencodeTxDb, "tx")
genomicLocus <- import.bed (con="genomicLocus.BED")
mappedLocus <- mapToTranscripts (x=genomicLocus, transcripts=gencodeTx)

The output is an GRanges object that is easy enough to work with.

ADD REPLYlink written 2.8 years ago by MMa280
1
gravatar for MMa
3.3 years ago by
MMa280
MMa280 wrote:

I got myself an answer today: for Python, we may use transVar.

ADD COMMENTlink written 3.3 years ago by MMa280
0
gravatar for Alex Reynolds
3.3 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

On the command line, one option is to download annotations and convert them to BED via convert2bed:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
    | gunzip -c- \
    | convert2bed -i gff - \
    > annotations.bed

If you just want transcripts, you could filter the result with awk:

$ awk '$8=="transcript"' annotations.bed > transcripts.bed

Convert your input to a sorted BED file with awk and sort-bed:

$ awk FS=":" '{ print $1"\t"$2"\t"($2 + 1); }' input.txt | sort-bed - > input.bed

Then query the input.bed file against annotations.bed or transcripts.bed with bedmap:

$ bedmap --echo --echo-map-id --delim '\t' input.bed transcripts.bed > answer.bed

Once you have made annotations.bed or transcripts.bed, you can repeat queries with any other input.bed file you generate down the road.

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Alex Reynolds30k
0
gravatar for cmdcolin
3.3 years ago by
cmdcolin1.4k
United States
cmdcolin1.4k wrote:

You could possibly use HGVS mutalyzer tools https://mutalyzer.nl/position-converter

Not sure if it explicitely uses GENCODE

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by cmdcolin1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 727 users visited in the last hour