RNA-Seq TopHat-Cufflinks Pipeline Output - Questionable Isoforms
renyangsu, 10.1 years ago

I have produced *.refmap and fpkm_tracking files with a TopHat2 > Cufflinks > Cuffcompare pipeline, starting from RNA-seq FASTQ files aligned to hg19. Some of the *.refmap files have ~230K rows in total. The number of unique start/stop locus positions equals the total row count, yet some gene names are repeated up to ~57K times (counted with the table() function in R).

I plotted the distribution of duplicate counts per gene name (log-transformed) for one of the samples:

[Figure: Gene Name Duplicate Counts]

My question is: is this normal? My goal is to run regression analysis on the FPKM values of consensus isoforms across samples, and to use PCA and clustering analysis to determine population differentiation. Given that there are only around 22K RefSeq genes, I would like to know how these RNA-seq data should be processed.
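For reference, the duplicate count described above can be sketched in Python as the equivalent of calling table() on the gene-name column in R. This is a minimal sketch under assumptions: it treats the *.refmap file as tab-delimited, with one header row and the gene name in the first column. `gene_name_counts` and `log_dup_distribution` are hypothetical helper names, not part of Cufflinks or Cuffcompare.

```python
import csv
import math
from collections import Counter
from typing import Iterable, List


def gene_name_counts(lines: Iterable[str], gene_col: int = 0,
                     skip_header: bool = True) -> Counter:
    """Count rows per gene name, like R's table() on one column.

    Assumption: rows are tab-delimited and the gene name sits in column
    `gene_col`; if `skip_header` is True the first row is ignored.
    """
    counts: Counter = Counter()
    reader = csv.reader(lines, delimiter="\t")
    for i, row in enumerate(reader):
        if skip_header and i == 0:
            continue  # assumed header row
        if not row:
            continue  # ignore blank lines
        counts[row[gene_col]] += 1
    return counts


def log_dup_distribution(counts: Counter) -> List[float]:
    """Sorted log10 of each gene's row count (the values to histogram)."""
    return sorted(math.log10(n) for n in counts.values())
```

Usage, with a hypothetical file name: `with open("sample.refmap") as fh: counts = gene_name_counts(fh)`, then `counts.most_common(10)` lists the most-duplicated gene names, and `log_dup_distribution(counts)` gives the log-scale values behind a plot like the one above.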

Tags: rnaseq, tophat2, gtf, fpkm