Question: Rna-Seq Tophat-Cufflinks Pipeline Output - Questionable Isoforms
Asked 6.7 years ago by renyangsu:

I have produced *.refmap and fpkm_tracking files with the Tophat2 > Cufflinks > Cuffcompare pipeline, starting from RNA-seq fastq files aligned to hg19. Some of the *.refmap files have ~230K rows in total. The number of unique start/stop locus positions matches the row count, but some gene names are duplicated up to ~57K times (counted with the table() function in R). I plotted the distribution of gene name duplicate counts (log transformed) for one of the samples (figure: Gene Name Duplicate Counts).

My question is: is this normal? My goal is to run a regression analysis on the FPKM values of consensus isoforms across the samples, and to use PCA and clustering analysis to determine population differentiation. Given that there are only around 22K RefSeq genes, I would like to know how to process these RNA-seq data.
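For anyone who wants to reproduce the duplicate count outside R, here is a minimal Python sketch of the same tally. It assumes the file is tab-delimited with a header row and that the gene name sits in a column called `ref_gene_id`; both the column name and the example rows are assumptions for illustration, not values from my data.

```python
import csv
import io
from collections import Counter


def count_gene_names(handle, column="ref_gene_id"):
    """Count how many rows carry each value of `column` in a tab-delimited table.

    `column` is an assumed header name; change it to match the actual file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# Made-up rows standing in for a real *.refmap file (hypothetical gene and transcript IDs).
example = io.StringIO(
    "ref_gene_id\tref_id\tclass_code\tcuff_id_list\n"
    "GENE_A\tTX_1\t=\tCUFF.1|CUFF.1.1\n"
    "GENE_A\tTX_2\tc\tCUFF.1|CUFF.1.2\n"
    "GENE_B\tTX_3\t=\tCUFF.2|CUFF.2.1\n"
)

counts = count_gene_names(example)

# List gene names from most to least duplicated.
for gene, n in counts.most_common():
    print(gene, n)
```

For a real file, pass an open file handle instead of the in-memory example, for instance `open(path, newline="")` with your own path.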

rnaseq gtf fpkm tophat2 • 2.5k views

