Low percentage of mapped reads in TopHat: is there any setting to improve it?
7.7 years ago

Hi, I would like a suggestion, please.

I mapped 100 bp paired-end Illumina reads with TopHat, but only 5% of the reads mapped. Is there any TopHat parameter that would give a higher percentage of mapped reads? Could there be some other problem as well?

Tophat • 3.6k views

What settings did you use? Did you adapter/quality trim? What species were the reads from and what species did you map against?

how could it be?

Can you share your TopHat command here?

Have you tried aligning with another program such as STAR? It's possible your data is contaminated or in some other way faulty.
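
One quick way to test the contamination idea is to pull a small sample of reads out as FASTA and BLAST them by hand. The sketch below is only an illustration: `fastq_head_to_fasta` and the file names are hypothetical, and it assumes standard 4-line FASTQ records.

```shell
# Hypothetical helper: print the first N reads of a FASTQ file as FASTA.
# Assumes 4-line FASTQ records (sequence lines are not wrapped).
fastq_head_to_fasta() {
    fastq_file=$1
    n_reads=$2
    head -n "$((n_reads * 4))" "$fastq_file" |
        awk 'NR % 4 == 1 { print ">" substr($0, 2) }  # header line: swap @ for >
             NR % 4 == 2 { print }'                    # sequence line
}

# Example usage (placeholder file names):
# fastq_head_to_fasta reads_1.fastq 100 > reads_1_sample.fasta
```

If most of the sampled reads hit a different organism, or rRNA, no aligner setting will rescue the mapping rate.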

Hi all, thanks for the help.

Initially I used this command:

tophat2 -p 3 -o noadapter__thout --library-type=fr-unstranded genome A2_1_noadapter.fastq A2_2_noadapter.fastq


Later, I removed the adapters and applied quality filters, using the software tools Trimmomatic or FASTX.

I obtained only 11.5 percent mapped reads, both before and after adapter trimming and quality filtering:

Commands:

The following command keeps reads that have a quality score of at least 20 in at least 50% of their bases.

fastq_quality_filter -Q33 -q20 -p 50 -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>.quality_filter.fastq


The following command trims nucleotides with quality scores lower than 20 from the end of each read. Trimmed reads shorter than 50 nucleotides are then discarded altogether:

fastq_quality_trimmer -Q33 -t 20 -l 50 -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>.clean.fastq


To remove biased per-base sequence content and GC content at the ends of reads, the following command was used. It is meant to remove 15 nucleotides from the end of each read; note that -f and -l give the first and last base to keep.

fastx_trimmer -Q33 -f 1 -l 335 -i <SAMPLE_NAME>.clean.fastq -o <SAMPLE_NAME>.fastx_trimmer.fastq


After this step, the read length distribution changed minimally, with the majority of reads retaining their full length. In addition around 25% of the reads were discarded completely.
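
A length histogram makes it easy to check what a trimming step really did. This is a minimal sketch, assuming 4-line FASTQ records; `read_length_hist` is a hypothetical helper name, not part of the FASTX toolkit.

```shell
# Hypothetical helper: tabulate read lengths in a FASTQ file.
# Prints "length<TAB>count", sorted by length.
read_length_hist() {
    awk 'NR % 4 == 2 { count[length($0)]++ }   # line 2 of each record is the sequence
         END { for (len in count) print len "\t" count[len] }' "$1" |
        sort -n
}

# Example usage, with <SAMPLE_NAME> as in the commands above:
# read_length_hist <SAMPLE_NAME>.fastx_trimmer.fastq
```

For 100 bp input, a fixed 15-base trim should move the main peak to 85. Since -l names the last base to keep, a value larger than the read length would leave the reads unchanged.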

In order to remove identical sequences, the fastx_collapser tool was used:

fastx_collapser -v -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>_collapsed.fasta


This tool removes a few million reads from each file while keeping the count of each collapsed sequence, and it writes its output in FASTA format.

La

1. Do not use the fastx toolkit with paired end datasets. Trimmomatic is fine, but any tool that processes the two fastq files for each sample separately should never be used.
2. There is absolutely no reason to use fastx_collapser unless you plan on using the resulting collapsed reads for assembly. Aligners will be perfectly happy with multiple essentially identical read pairs and the downstream statistics are more annoying to do if you collapse the reads.
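
The first point can be checked directly: after filtering each file on its own, mate 1 and mate 2 usually no longer contain the same reads in the same order. A rough sketch of such a check follows. The helper names are hypothetical, and it assumes 4-line FASTQ records whose IDs differ at most by a trailing /1 or /2 or by text after the first space.

```shell
# Hypothetical helper: print normalised read IDs, one per line.
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Hypothetical helper: exit status 0 only if both files list the same
# read IDs in the same order, i.e. the pairs are still in sync.
pairs_in_sync() {
    ids_1=$(mktemp)
    ids_2=$(mktemp)
    read_ids "$1" > "$ids_1"
    read_ids "$2" > "$ids_2"
    if cmp -s "$ids_1" "$ids_2"; then status=0; else status=1; fi
    rm -f "$ids_1" "$ids_2"
    return "$status"
}

# Example usage (placeholder file names):
# pairs_in_sync sample_1.fastq sample_2.fastq && echo "pairs OK" || echo "pairs out of sync"
```

If this reports the files as out of sync, rerun the trimming with a tool that handles both mates together before aligning.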

Thanks Devon! :D