Question: Mapping to the transcriptome using Bowtie - am I doing it right?
0
gravatar for A. Domingues
2.5 years ago by
A. Domingues1.6k
Mainz, Germany
A. Domingues1.6k wrote:

I am trying to reproduce some results from a paper, in which smallRNA sequencing reads are mapped to the transcriptome using bowtie. Since there is no code, these are the steps I did:

  1. Created a Transcriptome reference index with tophat -G Danio_rerio.gtf --transcriptome-index=TophatBowtieIndexTranscriptome/chr --bowtie1 BowtieIndexChr/chr;
  2. Used the created index to map using bowtie.

Looking at the mapped results something seemingly strange is going on - each gene in the GTF appears to become a "contig" for the mapping:

samtools view -H input.bam | head

@HD VN:1.0 SO:coordinate @SQ SN:0 LN:282 @SQ SN:1 LN:2870 @SQ SN:2 LN:2514 @SQ SN:3 LN:1809 @SQ SN:4 LN:1702 @SQ SN:5 LN:1590 @SQ SN:6 LN:1690 @SQ SN:7 LN:907 @SQ SN:8 LN:708

And the alignments look like this:

samtools view input.bam | head

HWI-ST558:373:C8897ACXX:2:1304:8723:58897 16 22 909 255 43M * 0 0 CCACTGAGAATGAGGTTCCAGTAGGCCACCGCCATCTCCAGAT JIHJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJIHHHHHFFFF XA:i:0 MD:Z:43 NM:i:0 HWI-ST558:373:C8897ACXX:2:1307:8956:47009 0 35 1956 255 23M * 0 0 AGACTGTTAGTTTCGAGAAAGAT FFFDHHDHHIFHGIIIJIIJIJJ XA:i:0 MD:Z:23 NM:i:0 HWI-ST558:373:C8897ACXX:3:1212:14852:61943 0 35 3267 255 22M * 0 0 AGATGATGATTGAGACTCAGGC FFFFHHHHHJJJJJJJJJJJJG XA:i:0 MD:Z:22 NM:i:0

Is this is the expected result, or was the index not created appropriately? As someone else done anything similar?

ADD COMMENTlink modified 2.5 years ago by mastal5112.0k • written 2.5 years ago by A. Domingues1.6k
3
gravatar for michael.ante
2.5 years ago by
michael.ante2.6k
Austria/Vienna
michael.ante2.6k wrote:

If you align against the transcriptome, this is the expected outcome. The numbering from 0 to #transcripts comes from the fact, how the fasta file for the transcript-sequences was generated. The fasta header looks somewhat like

>0 ENST00000456328 1+ 11869-12227,12613-12721,13221-14409

Bowtie uses in such a case only the first part of the fasta identifier and discards the rest. So, from the numbering in your alignment, you can map the transcript identifier from the fasta file.

Cheers,

Michael

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by michael.ante2.6k
1

Cheers. Later, I dug up a little more in the generated index and also reached that conclusion.

ADD REPLYlink written 2.5 years ago by A. Domingues1.6k
2
gravatar for mastal511
2.5 years ago by
mastal5112.0k
mastal5112.0k wrote:

If you are mapping to the transcriptome, rather than the genome, then I think that is what you would expect.

ADD COMMENTlink written 2.5 years ago by mastal5112.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1678 users visited in the last hour