Question: Strand Specific Tophat Output
5.9 years ago by
GPR310 wrote:

Hello, I am having a hard time working out whether a data set I just received is of poor quality. The library is paired-end and strand-specific (first strand), so I aligned it with TopHat using the option "--library-type=fr-firststrand".

My problem is that after the Cufflinks/Cuffdiff protocol, most of the gene/transcript IDs in the output files have a bad "status", either "FAIL" or "NOTEST". I have run Cufflinks and Cuffdiff with and without the --library-type option and got the same poor results. This is especially surprising because TopHat aligned about 500 million reads. Could the library complexity really be that poor?
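To put a number on "most", the status values can be tallied directly. Below is a minimal sketch in Python, assuming the Cuffdiff output is a tab-separated file with a header row that includes a "status" column (as in gene_exp.diff); the function name and the example path are hypothetical.

```python
import csv
from collections import Counter


def count_status(diff_path):
    """Tally the values in the 'status' column of a tab-separated Cuffdiff .diff file."""
    with open(diff_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["status"] for row in reader)


# Example usage (hypothetical path, adjust to your own output directory):
#   for status, n in count_status("output/gene_exp.diff").most_common():
#       print(status, n)
```

If nearly every row is NOTEST, the usual suspicion is too few reads assigned to each locus, which points back at the alignment or the annotation rather than at the test itself.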

My command lines are the following:

tophat -g 1 -G genes.gtf --library-type=fr-firststrand -o output BowtieIndex/genome input.fastq input.fastq

cufflinks -u -v -b genome.fa -g genes.gtf -o output input.bam

cuffmerge -g genes.gtf -s genome.fa cuffmerge.txt

cuffdiff -u -b genome.fa cuffmerge/merged.gtf -o output sample1.bam sample2.bam

I ran Cufflinks and Cuffdiff with and without "--library-type=fr-firststrand".

Any input on what I might be doing wrong would be appreciated. G.

Have you looked at the FPKM distribution in the GTF files themselves? What about your alignment rate? What sort of pre-processing did you do?
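For the FPKM check, here is a rough sketch in Python. It assumes the Cufflinks transcripts.gtf has "transcript" feature lines whose attribute column carries an FPKM "..." entry; the function names and the example path are made up for illustration.

```python
import re
import statistics

# Matches the FPKM attribute in the ninth GTF column, e.g. FPKM "12.345678";
FPKM_RE = re.compile(r'FPKM "([^"]+)"')


def transcript_fpkms(gtf_path):
    """Collect the FPKM value from every 'transcript' line of a Cufflinks GTF."""
    values = []
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue  # exon lines repeat the transcript's FPKM, so skip them
            match = FPKM_RE.search(fields[8])
            if match:
                values.append(float(match.group(1)))
    return values


def summarize(values):
    """Return a few summary numbers for a list of FPKM values."""
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "n_zero": sum(1 for v in values if v == 0.0),
        "median": statistics.median(values),
        "max": max(values),
    }


# Example usage (hypothetical path):
#   print(summarize(transcript_fpkms("output/transcripts.gtf")))
```

A very large share of zero or near-zero FPKMs alongside a handful of extreme values would be consistent with a low-complexity library, though that is only a hint and not a diagnosis.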

