Question: Visualizing hg19 and hg38 FPKMs in a single plot
2.9 years ago by
komal.rathi3.4k
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.4k wrote:

Hello everyone,

I am trying to plot NCAM1 gene expression from 5 disparate RNA sequencing datasets - all are processed in the same way (STAR -> RSEM) and quantified in terms of FPKM. The problem is that I have 4 datasets mapped to hg38 and one is mapped to hg19. These are the coordinates of NCAM1 in hg19 and in hg38 from UCSC Genome Browser:

hg19: chr11:112,831,969-113,149,158
hg38: chr11:112,961,436-113,275,489

Can I plot NCAM1 expression across these datasets (in one plot) even though they were mapped and quantified using different genome references and annotations?

rna-seq • 1.4k views
ADD COMMENTlink modified 2.9 years ago by seidel6.8k • written 2.9 years ago by komal.rathi3.4k
2.9 years ago by
Santosh Anand4.9k
Santosh Anand4.9k wrote:

My guess is that from hg19 -> hg38 the only change is in the coordinates, not in the gene structure and annotation per se. If I remember correctly, GENCODE only does a liftOver of coordinates from hg19 -> hg38. In that case you can safely plot the expression values in one plot. To be more sure, you can check whether the gene structure is the same in both annotations (the two datasets may be quantifying different splice forms because different annotations were used).
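One way to check the gene structure is to summarize the exons of each transcript in both annotation files and compare exon counts and total exon lengths (positions are not comparable across builds, but lengths are). The following is a minimal sketch, assuming GTF-formatted annotations whose attribute column carries `gene_name` and `transcript_id`; the helper names, transcript IDs, and coordinates are made up for illustration and are not the real NCAM1 models.

```python
import re
from collections import defaultdict


def exon_summary(gtf_lines, gene_name):
    """Return {transcript_id: (exon_count, total_exon_bp)} for one gene."""
    exon_lengths = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records describe the spliced structure
        # Parse key "value" pairs from the attribute column.
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get("gene_name") != gene_name:
            continue
        start, end = int(fields[3]), int(fields[4])
        exon_lengths[attrs["transcript_id"]].append(end - start + 1)  # GTF is 1-based, inclusive
    return {tx: (len(lens), sum(lens)) for tx, lens in exon_lengths.items()}


def toy_exon(start, end, transcript_id):
    """Build one hypothetical GTF exon line (toy data only)."""
    attrs = f'gene_id "G1"; transcript_id "{transcript_id}"; gene_name "NCAM1";'
    return "\t".join(["chr11", "toy", "exon", str(start), str(end), ".", "+", ".", attrs])


# Toy stand-ins for the two annotation files (coordinates are invented).
hg38_lines = [toy_exon(1000, 1199, "TX_A"), toy_exon(2000, 2299, "TX_A")]
hg19_lines = [toy_exon(500, 699, "TX_B"), toy_exon(1500, 1809, "TX_B")]

summary_hg38 = exon_summary(hg38_lines, "NCAM1")
summary_hg19 = exon_summary(hg19_lines, "NCAM1")
print(summary_hg38)
print(summary_hg19)
```

With real files you would pass an open file handle instead of the toy lists. Matching exon counts and similar total lengths suggest the same isoform is being described; a different exon count is a hint that the two pipelines may be quantifying different splice forms.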

ADD COMMENTlink written 2.9 years ago by Santosh Anand4.9k

Thanks - so the hg38-based data used GENCODE and the hg19-based data used RefSeq. I don't know if there is much difference in the gene models - looking at the UCSC Genome Browser, they appear to be just slightly different.

ADD REPLYlink written 2.9 years ago by komal.rathi3.4k

GENCODE and RefSeq can differ by small amounts at the transcript ends, probably because GENCODE is curated for accurate gene structure (both 5' and 3' ends). But if the overall gene structure is the same, a few bp here or there will not change the FPKM noticeably (note also that FPKM is normalized for transcript length).
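To make the length-normalization point concrete, here is a small sketch of the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments). All numbers are hypothetical and chosen only to show the size of the effect: the same fragment count is assigned to two transcript models that differ by 50 bp at one end.

```python
def fpkm(fragments_on_transcript, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments_on_transcript * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical values, not taken from any of the datasets in this thread.
fragments = 1200
library_size = 30_000_000

fpkm_a = fpkm(fragments, 6000, library_size)  # model from annotation A
fpkm_b = fpkm(fragments, 6050, library_size)  # same model, 50 bp longer at one end

relative_change = abs(fpkm_a - fpkm_b) / fpkm_a
print(f"FPKM A: {fpkm_a:.3f}, FPKM B: {fpkm_b:.3f}, relative change: {relative_change:.2%}")
```

In this toy case the two values differ by under 1%, which is small next to typical between-dataset variation. The picture changes if the annotations disagree on whole exons or isoforms, since that alters both the length and the fragments assigned.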

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Santosh Anand4.9k

Thanks for clarifying - can you move this to an answer so I can accept it?

ADD REPLYlink written 2.9 years ago by komal.rathi3.4k

can you move this to an answer so I can accept it?

How to do that?

ADD REPLYlink written 2.9 years ago by Santosh Anand4.9k

I'm not sure if you can move it, but I can so I did :p

ADD REPLYlink written 2.9 years ago by WouterDeCoster40k

Appreciate that very much, thank you :)

ADD REPLYlink written 2.9 years ago by Santosh Anand4.9k
2.9 years ago by
seidel6.8k
United States
seidel6.8k wrote:

If you have FPKM values, then essentially the mapped reads have already been normalized to the appropriate gene structure, as Santosh notes in his comment "Note also that FPKM is normalized for transcript length". You can put them together in the same plot, but I would comment appropriately in the legend that one is from a different source. You still face the risk that the odd one is a different isoform, but unless you can figure this out explicitly the comment is your only safeguard.
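As a sketch of how the legend comment could be generated rather than typed by hand, the snippet below builds one legend label per dataset and flags any dataset whose genome build or annotation differs from the majority. The dataset names are placeholders; the build and annotation values follow what was described in this thread (four hg38/GENCODE datasets, one hg19/RefSeq).

```python
from collections import Counter


def legend_labels(datasets):
    """Map dataset name -> legend label, flagging non-majority references."""
    sources = Counter((d["build"], d["annotation"]) for d in datasets)
    majority = sources.most_common(1)[0][0]
    labels = {}
    for d in datasets:
        label = f'{d["name"]} ({d["build"]}, {d["annotation"]})'
        if (d["build"], d["annotation"]) != majority:
            label += " *different reference"
        labels[d["name"]] = label
    return labels


# Placeholder dataset names; only the build/annotation split comes from the thread.
datasets = [
    {"name": "dataset_1", "build": "hg38", "annotation": "GENCODE"},
    {"name": "dataset_2", "build": "hg38", "annotation": "GENCODE"},
    {"name": "dataset_3", "build": "hg38", "annotation": "GENCODE"},
    {"name": "dataset_4", "build": "hg38", "annotation": "GENCODE"},
    {"name": "dataset_5", "build": "hg19", "annotation": "RefSeq"},
]

labels = legend_labels(datasets)
for name, label in labels.items():
    print(label)
```

The resulting strings can be passed as series labels to whichever plotting library you use, so the caveat travels with the figure.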

ADD COMMENTlink written 2.9 years ago by seidel6.8k