Can tophat/cufflinks output vary on different runs?
7.5 years ago

I ran exactly the same tophat command on exactly the same files on two different systems, just to compare their performance.

tophat -p xx --library-type fr-firststrand -o CT16_frfs -r 150 --mate-std-dev 75 --no-mixed --transcriptome-index ./bwtindex/Transcriptome2 ./bwtindex/Dre_nuclear_2 R1_P.fastq R2_P.fastq


What I see is that the output file sizes are different!!

Details of the run

Run1

• System: Workstation Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
• RAM: 24GB
• OS: Fedora-20 64bit kernel 3.14.4-200
• Used 7 cores.

Output file sizes:

5654399668 accepted_hits.bam
565 align_summary.txt
6790723 deletions.bed
5987915 insertions.bed
19286867 junctions.bed
186  prep_reads.info
1923179692 unmapped.bam


Run2

• System: HPC with a single unit Intel(R) Xeon(R) CPU E7- 8837 @ 2.67GHz
• RAM: 1000GB
• OS: EL6 64bit kernel 2.6.32-279.5.1
• Used 30 cores

Output file sizes:

6617365952 accepted_hits.bam
567 align_summary.txt
6804941 deletions.bed
6000226 insertions.bed
19329598 junctions.bed
186 prep_reads.info
2446203410 unmapped.bam


Finally, when I run cuffdiff after cufflinks and cuffmerge, the list of significantly differentially expressed genes is different, though not grossly: Run2 has more genes (~236) than Run1 (~207). I still haven't checked the differences that might arise during the cufflinks run (I would have to run cufflinks on the Run1 files on the HPC to see that).

I do not want to stop the analysis over this and would proceed with the larger files, but I would like to know the reason for this discrepancy.
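To put numbers on the discrepancy, here is a minimal Python sketch that computes the relative size change for each output file between the two runs. The byte counts are copied from the listings above; the function name `percent_change` is only illustrative.

```python
# File sizes in bytes, copied from the Run1 and Run2 listings above.
run1 = {
    "accepted_hits.bam": 5654399668,
    "align_summary.txt": 565,
    "deletions.bed": 6790723,
    "insertions.bed": 5987915,
    "junctions.bed": 19286867,
    "prep_reads.info": 186,
    "unmapped.bam": 1923179692,
}
run2 = {
    "accepted_hits.bam": 6617365952,
    "align_summary.txt": 567,
    "deletions.bed": 6804941,
    "insertions.bed": 6000226,
    "junctions.bed": 19329598,
    "prep_reads.info": 186,
    "unmapped.bam": 2446203410,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: percent_change(run1[name], run2[name]) for name in run1}

# Print the files with the largest relative change first.
for name, pct in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:20s} {pct:+7.2f}%")
```

Both BAM files are larger in Run2, while the BED files differ by well under 1%. File size alone does not show which reads differ, so comparing the read counts reported in the two align_summary.txt files is probably a more direct first check.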

7.5 years ago
Sam ★ 4.0k

In my experience, this sort of problem usually comes from threading. Consider re-running your analysis with only 1 thread first. If the problem persists, check your input files.

If you are working in the same directory (for example, a cluster setting where all the input and output are stored in the same location), you might want to duplicate the input and work in separate folders, just in case.


Oh, if I use one core it will take an eternity to finish. The fastq file is huge.


You can try it on a small subset. Since you have already done the alignment, try to identify a gene where the two files differ in read counts. Then extract all the reads aligned to that gene from both files and rerun the whole analysis with only those reads.
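One rough way to find candidate loci is to diff the two junctions.bed files. The sketch below assumes standard tab-separated BED lines with at least six columns and keys each junction on (chrom, start, end, strand); the function names and the example lines are hypothetical, not taken from the actual runs.

```python
def read_junctions(lines):
    """Collect (chrom, start, end, strand) keys from BED-formatted lines."""
    junctions = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC-style header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # not enough columns to read a strand
        junctions.add((fields[0], int(fields[1]), int(fields[2]), fields[5]))
    return junctions


def compare_junctions(lines_a, lines_b):
    """Return (only_in_a, only_in_b) as sorted lists of junction keys."""
    a = read_junctions(lines_a)
    b = read_junctions(lines_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical example lines, for illustration only.
run1_lines = [
    'track name=junctions description="TopHat junctions"',
    "chr1\t100\t500\tJUNC00000001\t12\t+",
    "chr1\t900\t1500\tJUNC00000002\t3\t-",
]
run2_lines = [
    'track name=junctions description="TopHat junctions"',
    "chr1\t100\t500\tJUNC00000001\t14\t+",
    "chr2\t50\t300\tJUNC00000002\t5\t+",
]

only_run1, only_run2 = compare_junctions(run1_lines, run2_lines)
print("only in run 1:", only_run1)
print("only in run 2:", only_run2)
```

With real files you would pass open file handles instead of lists. One caveat: as far as I know, the interval TopHat writes includes the flanking blocks, so the same intron can show up under slightly different coordinates if the supporting reads differ. Treat the output as a list of places to look, not as a definitive count of differing junctions.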

Or, more simply, if you are using human, supply only the chrY reference (or some part of the reference small enough to run quickly with one thread).
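Shrinking the input is another way to make a one-thread test feasible. Here is a minimal sketch that copies the first N records of a FASTQ file, assuming plain (uncompressed) four-line records. The function name and the output file names are hypothetical. Using the same N for both mate files keeps the pairs in sync only if both files list the reads in the same order.

```python
from itertools import islice


def take_fastq_records(src_path, dst_path, n_records):
    """Copy the first n_records FASTQ records (4 lines each) to dst_path.

    Returns the number of complete records written.
    """
    written_lines = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice stops early without reading the rest of a huge file.
        for line in islice(src, 4 * n_records):
            dst.write(line)
            written_lines += 1
    return written_lines // 4


# Example usage (output names are placeholders):
# take_fastq_records("R1_P.fastq", "R1_P.subset.fastq", 1_000_000)
# take_fastq_records("R2_P.fastq", "R2_P.subset.fastq", 1_000_000)
```

Running tophat on the subset with one thread on both systems, and then with several threads, should show whether the difference follows the thread count or the machine.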
