Question: Low mapping rate for Tophat2
2.7 years ago, fadhil.abubaker wrote:

Hi,

I'm somewhat new to bioinformatics, so please bear with me. I'm running tophat2 on some FASTQ files using hg38 as the reference. This is the command I ran:

tophat2 --b2-sensitive -G /home/fadhil/hg38_ref/lib/hg38.refGene.gtf -p 16 -o /home/data/mcf10_tophat_output /home/fadhil/Bowtie2Index/genome ./SRR925720_mcf10a.fastq

It takes about 8 hours, but in the end the mapping rate is almost 0%: it maps 3,997 out of 31,898,079 reads (about 0.013%). I'm not sure why this is happening, but tophat2 printed the following warning repeatedly while it was running:

Warning: Encountered reference sequence with only gaps

Setting aside any potential problems with the FASTQ files themselves, what could be causing this?


You should know that the old "Tuxedo" pipeline of TopHat and Cufflinks is no longer the recommended approach for RNA-seq analysis. The software is deprecated or only minimally maintained and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown (Pertea et al., Nature Protocols, 2016). (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, such as alignment with STAR or BBMap, or pseudo-alignment with Salmon.

(written 2.7 years ago by WouterDeCoster)
2.7 years ago, Carlo Yague (Canada) wrote:
Warning: Encountered reference sequence with only gaps

Have you checked whether your reference genome is correct? Locate the FASTA file next to the index files:

ls /home/fadhil/Bowtie2Index/

Let's say it's called "genome.fa". Then check that it looks sensible:

head /home/fadhil/Bowtie2Index/genome.fa
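head only shows the first few lines. Because the warning mentions reference sequences "with only gaps", a rough whole-file check could list every record and flag those that contain no A/C/G/T at all. This is a minimal sketch, assuming an uncompressed FASTA file; check_fasta is a hypothetical helper name, and treating all-N or empty records as the trigger for that warning is an assumption.

```shell
# check_fasta (hypothetical helper): prints one tab-separated line per record:
# name, total length, number of A/C/G/T bases, and a flag that marks records
# with no A/C/G/T at all (for example sequences made entirely of N, or empty).
check_fasta() {
  awk '
    # Print the summary for the record that has just ended.
    function report() {
      flag = (acgt == 0) ? "ONLY_GAPS_OR_N" : "ok"
      printf "%s\t%d\t%d\t%s\n", name, len, acgt, flag
    }
    /^>/ {
      if (started) report()
      started = 1
      name = substr($1, 2); len = 0; acgt = 0
      next
    }
    {
      len += length($0)
      # gsub returns the number of replacements, i.e. the count of unambiguous bases
      acgt += gsub(/[ACGTacgt]/, "")
    }
    END { if (started) report() }
  ' "$1"
}

# Usage: show only suspicious records
# check_fasta /home/fadhil/Bowtie2Index/genome.fa | awk -F '\t' '$4 != "ok"'
```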

If it looks OK, you can try rebuilding the index:

bowtie2-build /home/fadhil/Bowtie2Index/genome.fa /home/fadhil/Bowtie2Index/genome
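Once bowtie2-build finishes, it may be worth confirming that the index is complete before starting another long tophat2 run. A minimal sketch, assuming the standard Bowtie 2 layout of six index files per prefix (.bt2 for small indexes, .bt2l for large ones); check_bt2_index is a hypothetical helper, and it only checks that the files exist and are non-empty, not that their contents are valid.

```shell
# check_bt2_index (hypothetical helper): succeeds if all six files of a
# Bowtie 2 index with the given prefix exist and are non-empty, accepting
# either the small-index (.bt2) or large-index (.bt2l) suffix.
check_bt2_index() {
  local prefix=$1 ext part ok
  for ext in bt2 bt2l; do
    ok=1
    for part in 1 2 3 4 rev.1 rev.2; do
      # -s: file exists and has a size greater than zero
      [ -s "$prefix.$part.$ext" ] || { ok=0; break; }
    done
    if [ "$ok" -eq 1 ]; then
      echo "complete .$ext index: $prefix"
      return 0
    fi
  done
  echo "incomplete or missing index: $prefix" >&2
  return 1
}

# Usage, after bowtie2-build has finished:
# check_bt2_index /home/fadhil/Bowtie2Index/genome
```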

Hope this helps.


The files in /home/fadhil/Bowtie2Index/genome are all .bt2 files, not FASTA. I'm not sure whether that makes any difference.

(written 2.7 years ago by fadhil.abubaker)

Was the Bowtie index built from FASTA files, or where did you get the index?

(written 2.7 years ago by WouterDeCoster)

I suggest downloading the reference in FASTA format and rebuilding the index with the command above. Your reference and index files seem to be corrupted.

(written 2.7 years ago by Carlo Yague)

In the end, rebuilding the index helped a bit, but I was still getting a low mapping rate. It turned out the FASTQ files were corrupted, and re-downloading them helped. Thanks for everyone's help!

(written 2.6 years ago by fadhil.abubaker)
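Since the root cause turned out to be corrupted FASTQ files, a quick structural check before an 8-hour alignment could save time. A minimal sketch, assuming an uncompressed FASTQ file with exactly four lines per read (no wrapped sequence lines); check_fastq is a hypothetical helper, and passing it does not prove the reads are intact, only that the file is not obviously truncated or malformed. For gzip-compressed files, gzip -t additionally tests the integrity of the compressed stream.

```shell
# check_fastq (hypothetical helper): structural sanity check for an
# uncompressed FASTQ file with exactly four lines per read. Stops at the
# first problem, prints it, and returns a non-zero status.
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { msg = "header line does not start with @"; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { msg = "separator line does not start with +"; exit }
    NR % 4 == 0 && length($0) != seqlen { msg = "quality and sequence lengths differ"; exit }
    END {
      # NR still holds the number of the last line read
      if (msg != "") { printf "line %d: %s\n", NR, msg; exit 1 }
      if (NR % 4 != 0) { printf "truncated: %d lines is not a multiple of 4\n", NR; exit 1 }
      printf "%d records look structurally OK\n", NR / 4
    }
  ' "$1"
}

# Usage:
# check_fastq ./SRR925720_mcf10a.fastq
```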