Question: Low mapping rate for Tophat2
23 months ago, fadhil.abubaker wrote:

Hi,

I'm somewhat new to bioinformatics, so please bear with me. I'm running tophat2 on some fastq files using hg38 as the reference. This is the command that I ran:

tophat2 --b2-sensitive -G /home/fadhil/hg38_ref/lib/hg38.refGene.gtf -p 16 -o /home/data/mcf10_tophat_output /home/fadhil/Bowtie2Index/genome ./SRR925720_mcf10a.fastq

It takes about 8 hours, but in the end the mapping rate is almost 0%: it maps 3997 out of 31898079 reads. I don't understand why this is happening, although tophat repeatedly emitted the following warning while it was running:

Warning: Encountered reference sequence with only gaps

Ignoring any potential errors with the fastq files themselves, what could possibly be the problem here?
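For reference, the mapping rate implied by those counts can be worked out directly. This is only a small sketch of the arithmetic; `mapping_rate` is a hypothetical helper name and not part of tophat2.

```shell
# Mapping rate as a percentage: mapped reads / total reads * 100.
# mapping_rate is a hypothetical helper, not a tophat2 feature.
mapping_rate() {
    awk -v mapped="$1" -v total="$2" 'BEGIN {
        if (total + 0 <= 0) exit 1          # avoid division by zero
        printf "%.4f\n", 100 * mapped / total
    }'
}

# Counts reported in the question above.
mapping_rate 3997 31898079    # prints 0.0125 (percent)
```

So roughly 0.01% of the reads aligned, which points to a problem with the inputs rather than a tuning issue.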


You should know that the old 'Tuxedo' pipeline of Tophat and Cufflinks is no longer the recommended tool set for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.

written 23 months ago by WouterDeCoster
23 months ago, Carlo Yague (Belgium) wrote:
Warning: Encountered reference sequence with only gaps

Have you checked whether your reference genome is correct? Locate the fasta file with:

ls /home/fadhil/Bowtie2Index/genome

Let's say it's called "genome.fa". Then check that it looks good:

head /home/fadhil/Bowtie2Index/genome.fa
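Since head only shows the first few lines, a scan of the whole file may be more telling. The warning suggests that some reference records contain nothing but gap characters, so the sketch below lists FASTA records whose sequence is empty or made up only of N, n or "-". `all_gap_records` is a hypothetical helper name, and treating exactly those characters as gaps is an assumption.

```shell
# List FASTA records whose sequence is empty or consists only of N/n/'-'.
# all_gap_records is a hypothetical helper name used for illustration.
all_gap_records() {
    awk '
        function report() { if (seen && !has_base) print name }
        { sub(/\r$/, "") }                   # tolerate Windows line endings
        /^>/ { report(); seen = 1; name = substr($0, 2); has_base = 0; next }
        /[^Nn-]/ { has_base = 1 }            # any character that is not a gap
        END { report() }
    ' "$1"
}

# Usage (path taken from the example above):
#   all_gap_records /home/fadhil/Bowtie2Index/genome.fa
```

A handful of all-N records is probably harmless, but if most records show up here the fasta itself is likely broken. If only the .bt2 files are available, bowtie2-inspect run on the index basename should, as far as I know, print the indexed sequences back out as fasta, which could then be checked the same way.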

If it is OK, you can try to rebuild the index:

bowtie2-build /home/fadhil/Bowtie2Index/genome.fa /home/fadhil/Bowtie2Index/genome

Hope this helps.


The files in /home/fadhil/Bowtie2Index/genome are all .bt2 files and not fasta. I'm not too sure if this makes any difference.

written 23 months ago by fadhil.abubaker

Was the bowtie index built from fasta files? If not, where did you get the index?

written 23 months ago by WouterDeCoster

I would suggest that you download the reference in fasta format and rebuild the index with the above command. Your reference and indexes seem to be corrupted.

written 23 months ago by Carlo Yague

In the end, rebuilding the indices helped a bit, but I was still getting a low mapping rate. It turned out that the fastq files were corrupted, and re-downloading them fixed it. Thanks to everyone for the help!
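For anyone hitting the same problem, a quick structural check of the fastq file might catch this kind of corruption before an 8-hour run. The sketch below assumes an uncompressed fastq with the usual four lines per record; `check_fastq` is a hypothetical helper name, and it only reports the first problem it finds.

```shell
# Basic structural check of an uncompressed FASTQ file: 4 lines per record,
# '@' header, '+' separator, sequence and quality strings of equal length.
# check_fastq is a hypothetical helper; it stops at the first problem found.
check_fastq() {
    awk '
        { sub(/\r$/, "") }                   # tolerate Windows line endings
        NR % 4 == 1 && !/^@/ { print "line " NR ": header does not start with @"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && !/^\+/ { print "line " NR ": separator does not start with +"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad = 1; exit }
        END {
            if (!bad && NR % 4 != 0) { print "truncated: " NR " lines is not a multiple of 4"; bad = 1 }
            if (!bad) print "ok: " (NR / 4) " reads"
            exit bad + 0                     # non-zero exit status on failure
        }
    ' "$1"
}

# Usage (file name taken from the original command):
#   check_fastq ./SRR925720_mcf10a.fastq
```

This will not detect every kind of damage. For compressed downloads, gzip -t on the .gz file, or comparing a checksum against one published by the archive (if available), is a cheaper first test.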

written 23 months ago by fadhil.abubaker