RPKMs On Transcript Level
8.1 years ago
scaspar • 0

Hi guys,

I'm quite new to bioinformatics and I was wondering about the following.

Is there a way to obtain information about which transcripts are expressed in which tissues (including RPKM values)? I already obtained this information on gene level from GTEx portal (--> Download --> "Gene RPKM"), and I basically would wish for the same file structure, but on transcript level instead of gene level.

Does anybody know how to do that?

Thank you very much in advance and best regards.

PS: I already tried downloading the "Transcript RPKM" file from GTEx, but it does not seem to provide the right information, or maybe I just don't understand it.

Transcript RPKM GTEx eQTL Expression • 2.0k views
8.1 years ago
Amitm ★ 2.3k

hi,

As of the V6 data release, the transcript isoform values are in this file: GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz. Decompressed, that file is ~15 GB.

If you do this -

$ head -n 1 GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt |sed -e 's/\t/\n/g' |head
TargetID
Gene_Symbol
Chr
Coord 
GTEX-111CU-1826-SM-5GZYN
GTEX-111FC-0226-SM-5N9B8
GTEX-111VG-2326-SM-5N9BK
GTEX-111YS-2426-SM-5GZZQ
GTEX-1122O-2026-SM-5NQ91
GTEX-1128S-2126-SM-5H12U

$ head -n 1 GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt |sed -e 's/\t/\n/g' |wc -l
8559
$

That means there are 8559 columns in the file: 4 annotation columns (TargetID, Gene_Symbol, Chr, Coord) plus 8555 sample columns. And the snippet on the GTEx site about the V6 data release says this -

2015-10-19 V6 Data Released The GTEx Portal has been updated to data release V6 (dbGaP accession phs000424.v6.p1). In this release the number of genotyped donors has increased to 450 and the number of RNA-seq samples to 8555 across 51 tissue sites and 2 cell lines, giving sufficient power to detect eQTLs in 44 tissues. Full gene and isoform expression datasets are available for download. Genotypes and RNA-seq bam files are available via dbGaP.

So, it all adds up. Hopefully this helps.
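
The column count can also be checked from Python without decompressing the whole file, since only the header line needs to be read. This is a minimal sketch, assuming the first four columns are the annotation columns shown in the header listing; the function names are illustrative and not part of any GTEx tooling.

```python
import gzip

# Assumption: the annotation columns are TargetID, Gene_Symbol, Chr, Coord,
# as in the header listing above; everything after them is one sample each.
ANNOTATION_COLUMNS = 4


def read_header(path):
    """Return the tab-separated column names from the first line of a file.

    Files ending in .gz are read through gzip, so the 15 GB table never
    has to be decompressed on disk.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n").split("\t")


def count_sample_columns(header):
    """Number of per-sample columns, i.e. total columns minus annotations."""
    return len(header) - ANNOTATION_COLUMNS


# Example (hypothetical local path):
# header = read_header("GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz")
# print(len(header), count_sample_columns(header))
```

With the header shown above, 8559 total columns would give 8555 sample columns, matching the number of RNA-seq samples in the release note.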

8.1 years ago
scaspar • 0

Hi,

Thank you very much for your response!

You're absolutely right, it does add up, so the information I'm looking for is basically contained in this file. Unfortunately, the information in this file is not organized by tissue, so I cannot work with it in its current form. What I would need is the following format:

Transcripts - Genomic Coordinate - Tissue 1 - Tissue 2 - Tissue 3...

Transcript 1 - Genomic Coordinate 1 - RPKM - RPKM - RPKM

Transcript 2 - Genomic Coordinate 2 - RPKM - RPKM - RPKM

...

To convert this file, I guess I need a Python script? Unfortunately, I have never used one before. Is there another way to do the conversion?

Thanks and best regards
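
For reference, a conversion along these lines could be sketched in Python using only the standard library. The sketch rests on several assumptions that are not stated in the thread: that a separate sample annotation table is available with one row per sample, that its sample ID and tissue columns are named `SAMPID` and `SMTSD` (check this against the file you actually download), and that the median across samples is an acceptable per-tissue summary. The function names are hypothetical. The expression table is streamed row by row, so the 15 GB file is never loaded into memory at once.

```python
import csv
import gzip
from collections import defaultdict
from statistics import median

ANNOTATION_COLUMNS = 4  # TargetID, Gene_Symbol, Chr, Coord


def open_text(path):
    """Open a plain or gzipped text file for reading."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", newline="")


def load_sample_tissues(annotation_path, sample_col="SAMPID", tissue_col="SMTSD"):
    """Map sample ID -> tissue name from a tab-separated annotation table.

    The column names are assumptions; adjust them to the real file.
    """
    with open_text(annotation_path) as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row[sample_col]: row[tissue_col] for row in reader}


def summarise_by_tissue(rpkm_path, out_path, sample_to_tissue):
    """Write one row per transcript with the median RPKM per tissue."""
    with open_text(rpkm_path) as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader)

        # Group the sample column indices by tissue; samples without an
        # annotation entry are skipped.
        columns_by_tissue = defaultdict(list)
        for index in range(ANNOTATION_COLUMNS, len(header)):
            tissue = sample_to_tissue.get(header[index].strip())
            if tissue is not None:
                columns_by_tissue[tissue].append(index)

        tissues = sorted(columns_by_tissue)
        writer.writerow(["TargetID", "Coord"] + tissues)
        for row in reader:
            medians = [
                median(float(row[i]) for i in columns_by_tissue[tissue])
                for tissue in tissues
            ]
            writer.writerow([row[0], row[3]] + medians)


# Example (hypothetical file names):
# mapping = load_sample_tissues("sample_attributes.txt")
# summarise_by_tissue(
#     "GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz",
#     "transcript_rpkm_by_tissue.txt",
#     mapping,
# )
```

The output has the layout asked for above: transcript ID, genomic coordinate, then one column per tissue. Swapping `median` for `statistics.mean` would give a per-tissue mean instead.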
