Question

Mapping to the transcriptome using Bowtie - am I doing it right?

0

Entering edit mode

8.1 years ago

A. Domingues ★ 2.7k

I am trying to reproduce some results from a paper, in which smallRNA sequencing reads are mapped to the transcriptome using bowtie. Since there is no code, these are the steps I did:

Created a Transcriptome reference index with tophat -G Danio_rerio.gtf --transcriptome-index=TophatBowtieIndexTranscriptome/chr --bowtie1 BowtieIndexChr/chr;
Used the created index to map using bowtie.

Looking at the mapped results something seemingly strange is going on - each gene in the GTF appears to become a "contig" for the mapping:

samtools view -H input.bam | head

@HD VN:1.0 SO:coordinate @SQ SN:0 LN:282 @SQ SN:1 LN:2870 @SQ SN:2 LN:2514 @SQ SN:3 LN:1809 @SQ SN:4 LN:1702 @SQ SN:5 LN:1590 @SQ SN:6 LN:1690 @SQ SN:7 LN:907 @SQ SN:8 LN:708

And the alignments look like this:

samtools view input.bam | head

HWI-ST558:373:C8897ACXX:2:1304:8723:58897 16 22 909 255 43M * 0 0 CCACTGAGAATGAGGTTCCAGTAGGCCACCGCCATCTCCAGAT JIHJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJIHHHHHFFFF XA:i:0 MD:Z:43 NM:i:0 HWI-ST558:373:C8897ACXX:2:1307:8956:47009 0 35 1956 255 23M * 0 0 AGACTGTTAGTTTCGAGAAAGAT FFFDHHDHHIFHGIIIJIIJIJJ XA:i:0 MD:Z:23 NM:i:0 HWI-ST558:373:C8897ACXX:3:1212:14852:61943 0 35 3267 255 22M * 0 0 AGATGATGATTGAGACTCAGGC FFFFHHHHHJJJJJJJJJJJJG XA:i:0 MD:Z:22 NM:i:0

Is this is the expected result, or was the index not created appropriately? As someone else done anything similar?

bowtie gtf transcriptome zebrafish RNA-Seq • 5.5k views

ADD COMMENT • link updated 8.1 years ago by mastal511 ★ 2.1k • written 8.1 years ago by A. Domingues ★ 2.7k

score 3 · Accepted Answer · 2016-03-10

3

Entering edit mode

8.1 years ago

michael.ante ★ 3.8k

If you align against the transcriptome, this is the expected outcome. The numbering from 0 to #transcripts comes from the fact, how the fasta file for the transcript-sequences was generated. The fasta header looks somewhat like

>0 ENST00000456328 1+ 11869-12227,12613-12721,13221-14409

Bowtie uses in such a case only the first part of the fasta identifier and discards the rest. So, from the numbering in your alignment, you can map the transcript identifier from the fasta file.

Cheers,

Michael

ADD COMMENT • link 8.1 years ago by michael.ante ★ 3.8k

1

Entering edit mode

Cheers. Later, I dug up a little more in the generated index and also reached that conclusion.

ADD REPLY • link 8.1 years ago by A. Domingues ★ 2.7k

score 2 · Accepted Answer · 2016-03-10

2

Entering edit mode

8.1 years ago

mastal511 ★ 2.1k

If you are mapping to the transcriptome, rather than the genome, then I think that is what you would expect.

ADD COMMENT • link 8.1 years ago by mastal511 ★ 2.1k