Question: Different versions of TopHat give me different numbers of mapped reads
BioDH wrote (13 months ago):

I have paired-end RNA-seq files (Illumina, FASTQ).

I trimmed the reads with Trimmomatic:

java -jar <path>/trimmomatic-0.36.jar PE -threads 4 -phred33 1.fastq 2.fastq 1_trim1.fastq 1_unpaired1.fastq 2_trim2.fastq 2_unpaired2.fastq ILLUMINACLIP:<path>/TruSeq3-PE-2.fa:3:30:10 SLIDINGWINDOW:5:20 MINLEN:20

Around 14M read pairs survived.

I aligned the trimmed FASTQ files to the genome with TopHat (old version = 2.1.0, new version = 2.1.1), using exactly the same arguments.

# This is old version (2.1.0)  
 tophat --num-threads 4 --read-mismatches 1 --read-edit-dist 2 --read-realign-edit-dist 1000 -a 8 -m 0 -i 30 -I 1000 -g 1 --min-segment-intron 30 --max-segment-intron 1000 --segment-mismatches 1 --segment-length 25 --library-type fr-secondstrand --max-insertion-length 3 --max-deletion-length 3 --no-coverage-search -r 100 --mate-std-dev 20 -o ./local_tophat_old_alignments <genome> 1_trim1.fastq 2_trim2.fastq

 # This is new version (2.1.1)

~/script/tophat-2.1.1/tophat --num-threads 4 --read-mismatches 1 --read-edit-dist 2 --read-realign-edit-dist 1000 -a 8 -m 0 -i 30 -I 1000 -g 1 --min-segment-intron 30 --max-segment-intron 1000 --segment-mismatches 1 --segment-length 25 --library-type fr-secondstrand --max-insertion-length 3 --max-deletion-length 3 --no-coverage-search -r 100 --mate-std-dev 20 -o ./local_tophat_new_alignments <genome> 1_trim1.fastq 2_trim2.fastq

I checked the number of reads in each output:

samtools view -c accepted_hits.bam

The old version gave me 6,910,198.

The new version gave me 27,645,322.

I don't know why the numbers of reads are so different.

Just in case, here is align_summary.txt from each run:

#old version
Left reads:
          Input     :   3540517
           Mapped   :   3449572 (97.4% of input)
Right reads:
          Input     :   3540517
           Mapped   :   3460626 (97.7% of input)
97.6% overall read mapping rate.

Aligned pairs:   3380776
                 1025738 (30.3%) are discordant alignments
66.5% concordant pair alignment rate.

#new version
Left reads:
          Input     :  14189364
           Mapped   :  13781744 (97.1% of input)
Right reads:
          Input     :  14189364
           Mapped   :  13863578 (97.7% of input)
97.4% overall read mapping rate.

Aligned pairs:  13503616
                 4100383 (30.4%) are discordant alignments
66.3% concordant pair alignment rate.

Could anyone please explain it?


All versions of TopHat are the old version. It should not be used any more - the authors themselves state this.

(comment written 13 months ago by Joe)
swbarnes2 wrote (13 months ago):

Obviously the "old" run omitted a lot of your reads: its align_summary.txt reports only 3,540,517 input reads per mate, against 14,189,364 for the "new" run. Figure out why. I'd start by just rerunning it, in case it got randomly stopped mid-run.
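The numbers posted in the question can be cross-checked with a little arithmetic. This is only a sketch using the figures quoted above; `fraction` is a hypothetical helper name, not part of any tool. The `samtools view -c` totals equal the left plus right "Mapped" counts in each summary (presumably because `-g 1` limits each read to one reported alignment), so the BAM files agree with the summaries and the whole discrepancy sits in the "Input" lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a / b to four decimal places.
fraction() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", a / b }'
}

# Left + right "Mapped" counts from each align_summary.txt.
# These sums match the samtools view -c totals reported in the question.
echo "old mapped total: $(( 3449572 + 3460626 ))"
echo "new mapped total: $(( 13781744 + 13863578 ))"

# Fraction of the input that the old run saw, relative to the new run.
echo "old/new input:    $(fraction 3540517 14189364)"
```

The last line comes out at roughly one quarter, which suggests the 2.1.0 run processed only about a quarter of each FASTQ file while mapping what it did read at the same rate.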


swbarnes2's answer is the correct answer to your question: somehow the run with Tophat 2.1.0 didn't read the entire fastq files.

However, jrj.healey's comment above is the correct answer to your needs. From the TopHat2 page:

Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.

(comment written 13 months ago by h.mon)
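One way to test the diagnosis above (that the 2.1.0 run did not read the entire FASTQ files) is to count the records in the trimmed FASTQ files and compare them with the "Input" lines of each align_summary.txt. The following is a minimal sketch: the function names `count_fastq_reads` and `summary_inputs` are made up for illustration, and the read count assumes plain, uncompressed FASTQ with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Count FASTQ records, assuming 4 lines per record (no wrapped sequence lines).
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Print the "Input" counts (left reads, then right reads)
# from a TopHat align_summary.txt.
summary_inputs() {
    awk '$1 == "Input" { print $NF }' "$1"
}

# Example usage with the file names from the question:
#   count_fastq_reads 1_trim1.fastq
#   count_fastq_reads 2_trim2.fastq
#   summary_inputs local_tophat_old_alignments/align_summary.txt
#   summary_inputs local_tophat_new_alignments/align_summary.txt
```

If the FASTQ counts match the new run's "Input" values but not the old run's, the old run was truncated or fed incomplete input, and rerunning it is the next step.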