Question: RPKMs On Transcript Level
scaspar wrote (2.9 years ago):

Hi guys,

I'm quite new to bioinformatics and I was wondering about the following.

Is there a way to find out which transcripts are expressed in which tissues (including RPKM values)? I already obtained this information at the gene level from the GTEx portal (--> Download --> "Gene RPKM"), and I would like the same file structure, but at the transcript level instead of the gene level.

Does anybody know how to do that?

Thank you very much in advance and best regards.

PS: I already tried to download the file "Transcript RPKM" from GTEx, however it does not provide the right information, or maybe I just don't seem to understand it.

Amitm (UK) wrote (2.9 years ago):

Hi,

As of the V6 data release, the transcript isoform values are in the file GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz. Decompressed, that file is ~15 GB.

If you run this:

$ head -n 1 GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt |sed -e 's/\t/\n/g' |head
TargetID
Gene_Symbol
Chr
Coord 
GTEX-111CU-1826-SM-5GZYN
GTEX-111FC-0226-SM-5N9B8
GTEX-111VG-2326-SM-5N9BK
GTEX-111YS-2426-SM-5GZZQ
GTEX-1122O-2026-SM-5NQ91
GTEX-1128S-2126-SM-5H12U

$ head -n 1 GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt |sed -e 's/\t/\n/g' |wc -l
8559
$

That means there are 8559 columns in the file. The note on the GTEx site about the V6 data release says:

2015-10-19 V6 Data Released The GTEx Portal has been updated to data release V6 (dbGaP accession phs000424.v6.p1). In this release the number of genotyped donors has increased to 450 and the number of RNA-seq samples to 8555 across 51 tissue sites and 2 cell lines, giving sufficient power to detect eQTLs in 44 tissues. Full gene and isoform expression datasets are available for download. Genotypes and RNA-seq bam files are available via dbGaP.

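So it all adds up: 8559 columns minus the 4 annotation columns (TargetID, Gene_Symbol, Chr, Coord) leaves 8555 sample columns, one per RNA-seq sample. Hopefully this helps.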
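
If you would rather not decompress the whole ~15 GB file just to look at the header, the same column check can be sketched in Python by reading only the first line of the .gz file. This is a minimal sketch, not an official GTEx tool: the function names `read_header` and `split_columns` are made up for illustration, and it assumes the first four columns are the annotation columns shown in the header listing.

```python
import gzip

# Annotation columns as seen in the header listing; every column after
# these is assumed to be a sample ID.
ANNOTATION_COLUMNS = ("TargetID", "Gene_Symbol", "Chr", "Coord")


def read_header(path):
    """Return the column names from the first line of a tab-separated file.

    Files ending in .gz are read through gzip, so only the start of the
    archive is decompressed.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\r\n").split("\t")


def split_columns(header):
    """Split a header into (annotation columns, sample columns) by position."""
    n = len(ANNOTATION_COLUMNS)
    return header[:n], header[n:]
```

Calling `split_columns(read_header("GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz"))` and taking `len()` of the second element should, going by the counts in this answer, give the number of sample columns.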

scaspar wrote (2.9 years ago):

Hi,

Thank you very much for your response!

You're absolutely right, it does add up, so the information I'm looking for is basically contained in this file. Unfortunately, the information in this file is not neatly organized, so I cannot work with it in its current form. What I need is the following format:

Transcript    Genomic Coordinate    Tissue 1  Tissue 2  Tissue 3  ...
Transcript 1  Genomic Coordinate 1  RPKM      RPKM      RPKM
Transcript 2  Genomic Coordinate 2  RPKM      RPKM      RPKM
...

To convert this file, I guess I need a Python script? Unfortunately, I have never used one before. Is there another way to do the conversion?

Thanks and best regards
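
One possible way to get from one column per sample to one column per tissue is sketched below in Python. It is a rough sketch under several assumptions, not a tested GTEx pipeline: it assumes you also have a tab-separated sample annotation table that maps each sample ID to a tissue (the column names `SAMPID` and `SMTSD` are assumed defaults, so check them against your download), that the first four columns of the RPKM file are TargetID, Gene_Symbol, Chr and Coord, that every expression value is numeric, and that the median is an acceptable per-tissue summary. The function names are made up for illustration. The file is streamed row by row, so the full table never has to fit in memory.

```python
import csv
import gzip
from collections import defaultdict
from statistics import median

N_ANNOTATION = 4  # assumed layout: TargetID, Gene_Symbol, Chr, Coord


def open_text(path, mode="rt"):
    """Open a plain or gzipped text file, chosen by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode, newline="")
    return open(path, mode, newline="")


def load_sample_tissues(path, id_col="SAMPID", tissue_col="SMTSD"):
    """Map sample ID -> tissue name from a tab-separated annotation table.

    The default column names are assumptions; pass your own if they differ.
    """
    with open_text(path) as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row[id_col]: row[tissue_col] for row in reader}


def summarize_by_tissue(rpkm_path, out_path, sample_tissue):
    """Write one row per transcript with the median RPKM of each tissue."""
    with open_text(rpkm_path) as src, open_text(out_path, "wt") as dst:
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader)

        # Collect the column indices that belong to each tissue.
        # Samples missing from the annotation table are skipped.
        columns = defaultdict(list)
        for i, sample in enumerate(header[N_ANNOTATION:], start=N_ANNOTATION):
            tissue = sample_tissue.get(sample)
            if tissue is not None:
                columns[tissue].append(i)
        tissues = sorted(columns)

        # Keep the transcript ID, chromosome and coordinate columns as they are.
        writer.writerow([header[0], header[2], header[3]] + tissues)
        for row in reader:
            medians = [median(float(row[i]) for i in columns[t]) for t in tissues]
            writer.writerow([row[0], row[2], row[3]] + medians)
```

With hypothetical file names, usage would look like `summarize_by_tissue("transcript_rpkm.txt.gz", "transcript_rpkm_by_tissue.txt", load_sample_tissues("sample_annotations.txt"))`. Swapping `median` for `statistics.mean` gives per-tissue means instead.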
