Question

Can tophat/cufflinks output vary on different runs?

11.1 years ago

Bharat Iyengar ▴ 330

I ran exactly the same tophat command on exactly the same files on two different systems, just to compare their performance.

tophat -p xx --library-type fr-firststrand -o CT16_frfs -r 150 --mate-std-dev 75 --no-mixed --transcriptome-index ./bwtindex/Transcriptome2 ./bwtindex/Dre_nuclear_2 R1_P.fastq R2_P.fastq

What I see is that the output file sizes are different!!

Details of the run

Run1

System: Workstation Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
RAM: 24GB
OS: Fedora-20 64bit kernel 3.14.4-200
Used 7 cores.

Output file sizes:

5654399668 accepted_hits.bam
565 align_summary.txt
6790723 deletions.bed
5987915 insertions.bed
19286867 junctions.bed
186 prep_reads.info
1923179692 unmapped.bam

Run2

System: HPC with a single unit Intel(R) Xeon(R) CPU E7- 8837 @ 2.67GHz
RAM: 1000GB
OS: EL6 64bit kernel 2.6.32-279.5.1
Used 30 cores

Output file sizes:

6617365952 accepted_hits.bam
567 align_summary.txt
6804941 deletions.bed
6000226 insertions.bed
19329598 junctions.bed
186 prep_reads.info
2446203410 unmapped.bam

Finally, when I run cuffdiff after cufflinks and cuffmerge, the list of significantly differentially expressed genes is different (though not grossly): Run2 has more genes (~236) than Run1 (~207). I still haven't checked the differences that might arise during the cufflinks run (I would have to run cufflinks on the Run1 files on the HPC to see that).

I do not want to halt the analysis over this and will proceed with the larger file, but I would like to know the reason for this discrepancy.
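As a quick sanity check on the numbers, the sizes in the two listings can be compared file by file. The sketch below uses only the byte counts quoted in this post; the helper name `relative_change` is made up for illustration. It shows the two BAM files growing by roughly 17% and 27% between Run1 and Run2, while the three BED files change by only about 0.2% and `prep_reads.info` is identical. Note that the size of a compressed BAM depends on read order and compression as well as on content, so a size difference alone does not prove that the alignments differ.

```python
# Output sizes in bytes, copied from the Run1 and Run2 listings in this post.
run1 = {
    "accepted_hits.bam": 5654399668,
    "align_summary.txt": 565,
    "deletions.bed": 6790723,
    "insertions.bed": 5987915,
    "junctions.bed": 19286867,
    "prep_reads.info": 186,
    "unmapped.bam": 1923179692,
}
run2 = {
    "accepted_hits.bam": 6617365952,
    "align_summary.txt": 567,
    "deletions.bed": 6804941,
    "insertions.bed": 6000226,
    "junctions.bed": 19329598,
    "prep_reads.info": 186,
    "unmapped.bam": 2446203410,
}


def relative_change(before, after):
    """Return the fractional change going from `before` to `after`."""
    return (after - before) / before


# Fractional size change per output file (Run1 -> Run2).
changes = {name: relative_change(run1[name], run2[name]) for name in run1}

for name, change in sorted(changes.items()):
    print(f"{name:20s} {change:+.2%}")
```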

tophat cufflinks • 2.7k views

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by Bharat Iyengar ▴ 330

Ram · Answer 1 · 2014-06-09

11.1 years ago

Sam ★ 4.8k

In my experience, this sort of problem usually comes from threading. Consider re-running your analysis with only 1 thread first. If the problem persists, check your input files.

If you are working in the same directory (e.g., a cluster setting where all the input and output are stored in the same location), you might want to duplicate the input and work in separate folders, just in case.
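One way to check the input files is to compute a checksum of each FASTQ on both systems and compare the results. Below is a minimal sketch using Python's standard `hashlib`; the function name `sha256_of_file` is just an illustration, and the file is read in chunks so that a very large FASTQ does not have to fit in memory.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "digest  path" line per file given on the command line.
    for path in sys.argv[1:]:
        print(f"{sha256_of_file(path)}  {path}")
```

Running this on `R1_P.fastq` and `R2_P.fastq` on both the workstation and the HPC should print identical digests if the inputs really are the same; any mismatch would point to a copy or transfer problem rather than to tophat.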

ADD COMMENT • link updated 5.5 years ago by Ram 45k • written 11.1 years ago by Sam ★ 4.8k

Oh, if I use one core it will take an eternity to finish. The fastq file is huge.

ADD REPLY • link 11.1 years ago by Bharat Iyengar ▴ 330

You can try it on a small subset. Since you have already done the alignment, try to identify a gene where the two files differ in read counts. Then extract all the reads aligned to that gene from both files and run the whole analysis with those reads.

Or, more simply, if you are working with human data, supply only the chrY reference (or any part of the reference small enough to run quickly with one thread).
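A rough sketch of making such a small test set is below. Note that it swaps in a simpler subset than the gene-based extraction described above: it just copies the first N read pairs. It assumes uncompressed 4-line FASTQ records with the mates in the same order in both files; the function names and the output prefix are made up for illustration.

```python
from itertools import islice


def take_first_reads(src_path, dst_path, n_reads):
    """Copy the first n_reads FASTQ records (4 lines each) from src to dst.

    Returns the number of complete records actually written.
    """
    n_lines = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            n_lines += 1
    return n_lines // 4


def subsample_pair(r1_path, r2_path, out_prefix, n_reads):
    """Write the first n_reads pairs of a paired-end run to two new files."""
    out1 = f"{out_prefix}_R1.fastq"
    out2 = f"{out_prefix}_R2.fastq"
    n1 = take_first_reads(r1_path, out1, n_reads)
    n2 = take_first_reads(r2_path, out2, n_reads)
    if n1 != n2:
        # Mates must stay in sync, otherwise the pairing is broken.
        raise ValueError(f"read counts differ: {n1} vs {n2}")
    return out1, out2
```

For example, `subsample_pair("R1_P.fastq", "R2_P.fastq", "small", 100000)` would write `small_R1.fastq` and `small_R2.fastq`, which can then be run through the same tophat command with `-p 1` on both systems. The first reads of a file are not a random sample, so this only tests reproducibility, not expression estimates.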

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 11.1 years ago by Sam ★ 4.8k