Question: TopHat runs without errors but no reads are mapped
Asked 20 months ago by jrxu.bioinf (United States):

Hello,

I am a new user of TopHat. The version in use is v2.1.1 (tophat2).

I used the default parameters except for --no-coverage-search, which I added to save time. The read length is 102. The run output looks fine (shown below), but 0% of the reads are mapped to the genome.

BTW, this exact same setting successfully mapped another set of RNA-seq data (read length = 30).

Thanks!

[2016-06-26 14:44:07] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-06-26 14:44:07] Checking for Bowtie
                  Bowtie version:        2.2.9.0
[2016-06-26 14:44:07] Checking for Bowtie index files (genome)..
[2016-06-26 14:44:07] Checking for reference FASTA file
[2016-06-26 14:44:07] Generating SAM header for ~/data/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index/genome
[2016-06-26 14:44:58] Reading known junctions from GTF file
[2016-06-26 14:45:01] Preparing reads
         left reads: min. length=102, max. length=102, 36967894 kept reads (26707 discarded)
[2016-06-26 14:58:17] Building transcriptome data files ./tophat_out/tmp/genes
[2016-06-26 14:58:55] Building Bowtie index from genes.fa
[2016-06-26 15:05:33] Mapping left_kept_reads to transcriptome genes with Bowtie2
[2016-06-26 15:28:27] Resuming TopHat pipeline with unmapped reads
[2016-06-26 15:28:27] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2
[2016-06-26 16:14:47] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
[2016-06-26 16:34:49] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
[2016-06-26 16:50:05] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
[2016-06-26 17:00:49] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
[2016-06-26 17:18:38] Searching for junctions via segment mapping
[2016-06-26 17:28:01] Retrieving sequences for splices
[2016-06-26 17:29:11] Indexing splices
Building a SMALL index
[2016-06-26 17:30:18] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
[2016-06-26 17:37:09] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2016-06-26 17:44:09] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2016-06-26 17:50:15] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2016-06-26 17:58:30] Joining segment hits
[2016-06-26 18:05:44] Reporting output tracks
-----------------------------------------------
[2016-06-26 18:18:49] A summary of the alignment counts can be found in ./tophat_out/align_summary.txt
[2016-06-26 18:18:49] Run complete: 03:34:42 elapsed

The alignment summary (align_summary.txt) is below:

Reads:
          Input     :  36994601
           Mapped   :      5067 ( 0.0% of input)
            of these:      1534 (30.3%) have multiple alignments (3 have >20)
 0.0% overall read mapping rate.
modified 20 months ago by GenoMax (42k) • written 20 months ago by jrxu.bioinf (20)
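The summary rounds the mapping rate to 0.0%, which hides how few reads actually aligned. As a rough sketch, the exact rate can be recomputed from the two counts. The function name mapping_rate is made up for illustration, and the parsing assumes the single-end summary layout shown above; paired-end summaries have more sections and would need extra handling.

```python
import re


def mapping_rate(summary_text):
    """Return (input_reads, mapped_reads, percent_mapped).

    Assumes a single-end align_summary.txt with one 'Input' line and one
    'Mapped' line, as in the summary posted in the question.
    """
    input_reads = int(re.search(r"Input\s*:\s*(\d+)", summary_text).group(1))
    mapped_reads = int(re.search(r"Mapped\s*:\s*(\d+)", summary_text).group(1))
    return input_reads, mapped_reads, 100.0 * mapped_reads / input_reads


# The summary text copied from the question.
summary = """Reads:
          Input     :  36994601
           Mapped   :      5067 ( 0.0% of input)
            of these:      1534 (30.3%) have multiple alignments (3 have >20)
 0.0% overall read mapping rate."""

total, mapped, pct = mapping_rate(summary)
print(f"{mapped} of {total} reads mapped ({pct:.4f}%)")
```

With the numbers above this comes to roughly 0.014% of the input, so a handful of reads do align, which is itself a hint that the index and reference are probably fine and the problem lies in the reads.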

Any time you see no alignment, or less than expected, the first thing to try is to take a random sample of reads (10-15) and BLAST them at NCBI. If the top hits are not from the genome you expect, you will have to start figuring out what went wrong. If the BLAST hits are partial, it is possible that you have adapter contamination in your data (did you look at the data with FastQC before aligning?), and you would need to trim the reads before alignment.

written 20 months ago by GenoMax (42k)
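For the random sample suggested above, one possible approach is reservoir sampling, which picks reads uniformly without loading the whole FASTQ file into memory. This is only a sketch: the function names are invented for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines). The output is FASTA text that can be pasted into the NCBI BLAST web form.

```python
import random


def read_fastq(lines):
    """Yield (name, sequence) from an iterable of FASTQ lines (4 lines per read)."""
    it = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        yield header[1:], seq  # drop the leading '@'


def sample_fastq_as_fasta(lines, n=10, seed=0):
    """Reservoir-sample up to n reads and return them as FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(read_fastq(lines)):
        if i < n:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)


# Tiny in-memory demo; with real data, pass an open file handle instead.
demo = []
for k in range(50):
    demo += [f"@read{k}\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"]
print(sample_fastq_as_fasta(demo, n=3))
```

With a real file the call would look like sample_fastq_as_fasta(open("reads.fastq"), n=15), where reads.fastq is a placeholder name.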

I should have used the "--split-files" option when converting SRA to FASTQ. Without this parameter, the two reads of each pair are merged into a single read, and that caused the problem!

written 20 months ago by jrxu.bioinf (20)
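Re-converting the SRA file with the split option is the cleaner fix. If that is not practical, the merged reads described above could in principle be split after the fact. The sketch below cuts each record at its midpoint, which assumes (this is an assumption, not something the thread confirms) that both mates have the same length, so a 102-base record would become 51 + 51. The function name is made up for illustration.

```python
def split_concatenated_fastq(lines):
    """Split each 4-line FASTQ record at its midpoint into two mates.

    Assumes both mates have the same length. Mate 2 is left exactly as it
    appears in the merged record (it is not reverse-complemented).
    Returns two lists of FASTQ lines (without trailing newlines).
    """
    it = (line.rstrip("\n") for line in lines)
    mate1, mate2 = [], []
    for header, seq, _plus, qual in zip(it, it, it, it):
        if len(seq) % 2 or len(seq) != len(qual):
            raise ValueError(f"cannot split record {header!r} evenly")
        half = len(seq) // 2
        name = header.split()[0]  # keep '@id', drop any description
        mate1 += [f"{name}/1", seq[:half], "+", qual[:half]]
        mate2 += [f"{name}/2", seq[half:], "+", qual[half:]]
    return mate1, mate2


# Toy record: an 8-base "merged" read made of two 4-base mates.
record = ["@read1 length=8\n", "ACGTTTGA\n", "+\n", "IIIIHHHH\n"]
left, right = split_concatenated_fastq(record)
print("\n".join(left))
print("\n".join(right))
```

The two lists would then be written to separate files and passed to the aligner as a read pair.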

Use Kraken to screen the reads; it is faster than BLAST and lets you screen the whole dataset.

written 20 months ago by pld (4.6k)

The default Kraken database only has bacterial, archaeal, and viral data, so it would not always provide a useful answer. Surely BLASTing 10-15 sequences (in this case, where almost no reads are aligning) would be much faster than Kraken.

written 20 months ago by GenoMax (42k)

You can add sequences to Kraken databases or alter them. Sure, BLASTing a few reads is quick, but it won't give you an idea of the degree of contamination.

written 20 months ago by pld (4.6k)

I tested several reads. Each read maps perfectly to the correct transcript, BUT the first half of the read maps to the forward strand and the second half to the reverse strand. How should I handle this read format? Thanks.

written 20 months ago by jrxu.bioinf (20)
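The observation above (first half forward, second half reverse) is what two mates of a pair look like when they are glued into one read. A small sketch to check this on a handful of reads follows; it uses exact substring matching against a transcript sequence, which is a simplification (real reads may carry mismatches), and the function names and toy sequences are invented for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_of_halves(read, transcript):
    """Report which strand each half of a read matches exactly.

    Returns a pair of '+', '-', or None (no exact match) for the first
    and second half of the read.
    """
    def strand(fragment):
        if fragment in transcript:
            return "+"
        if reverse_complement(fragment) in transcript:
            return "-"
        return None

    half = len(read) // 2
    return strand(read[:half]), strand(read[half:])


# Toy transcript and a simulated merged pair: mate 1 is read from the
# forward strand, mate 2 from the opposite strand at the other end.
transcript = "ATGGCCAAGTTGACCGTAGCTAGGCTTACGATCGGATCCA"
mate1 = transcript[:10]
mate2 = reverse_complement(transcript[-10:])
merged = mate1 + mate2
print(strand_of_halves(merged, transcript))
```

If most sampled reads show a forward first half and a reverse second half, the file most likely contains merged pairs rather than long single-end reads.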

Did you run FastQC on these?

written 20 months ago by pld (4.6k)

That most likely indicates that you have short inserts (and thus read-through/contamination with Illumina adapters). You would need to trim these reads to get them to align.

modified 20 months ago • written 20 months ago by GenoMax (42k)
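A quick way to test the adapter read-through idea suggested above, before running a full FastQC report, is to count how many reads contain the start of the adapter. The sketch below assumes the library used Illumina TruSeq-style adapters, whose sequence begins with AGATCGGAAGAGC; other kits use different adapters. The function name and the toy reads are made up for illustration.

```python
# Common prefix of Illumina TruSeq adapters (an assumption about the library).
TRUSEQ_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(sequences, adapter=TRUSEQ_PREFIX):
    """Return the fraction of sequences that contain the adapter exactly."""
    total = hits = 0
    for seq in sequences:
        total += 1
        if adapter in seq.upper():
            hits += 1
    return hits / total if total else 0.0


# Toy reads: the first and third contain the adapter prefix.
reads = [
    "ACGTACGTAGATCGGAAGAGCACACG",
    "ACGTACGTACGTACGTACGT",
    "TTTTAGATCGGAAGAGC",
    "GGGGCCCCAAAATTTT",
]
print(f"{adapter_fraction(reads):.0%} of reads contain the adapter prefix")
```

A high fraction would support trimming before alignment; a fraction near zero points to another cause, such as the merged pairs reported elsewhere in this thread.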