Question: How to view reads or parts of reads that lie outside the reference in a BAM file
5.1 years ago, tobias221188 wrote:

Hi all,

I have spent the last few days trying to figure out how to get the sequence information from all the reads in my .bam file that flank my reference sequences. I have many, but very short, reference sequences (120 bp each, each equivalent to one RNA bait from the sequence capture process). I mapped my reads to those reference sequences (clc_mapper) and am now trying to work with the BAM file, which I can view in e.g. Tablet and which contains many reads for each locus. The problem is that there is no way to see what lies beyond the at most 120 bp of each read that were mapped against the reference sequence: in Tablet you only see a 120 bp window, and all reads are cut to match that window. The read length from the Illumina sequencing is 300 bp, so the reads are much longer than 120 bp and should overlap considerably on both sides of the short reference. Those flanking regions are actually the most interesting ones in my case. I'm wondering whether the information about the flanking regions is in the BAM file at all, or whether the BAM format is limited in the sense that it cuts each read at the end of the reference sequence.

Does anybody have an idea about

1. How to visualize the flanking regions and

2. How to create a consensus sequence that extends as far as possible beyond the ends of the reference sequence.

Thank you very much, your help is really appreciated!




There's a bit of ambiguity in your question. Were the 300bp reads clipped by CLC during the mapping process? Or alternatively, were the whole reads mapped but CLC clipped the resulting alignments to just show what's overlapping some sort of bait region?

In either case, have you used samtools to just see if the BAM file even contains the full-length reads?
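A minimal sketch of such a check, assuming the text (SAM) output of `samtools view` is available: for each alignment line, compare the length of the stored sequence (SAM field 10, SEQ) with the number of read bases the CIGAR string (field 6) marks as aligned or clipped. The helper names `cigar_ops` and `summarize` and the example record are made up for illustration; only the field positions and CIGAR operation letters come from the SAM format.

```python
import re

# One CIGAR element: a length followed by an operation letter (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_ops(cigar):
    """Return [(length, op), ...] for a CIGAR string; '*' means no CIGAR."""
    if cigar == "*":
        return []
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

def summarize(sam_line):
    """Summarize one SAM alignment line (hypothetical helper).

    seq_len      : length of the sequence actually stored in the record
    aligned      : read bases inside the alignment (M, I, =, X)
    soft_clipped : bases kept in SEQ but excluded from the alignment (S)
    hard_clipped : bases removed from SEQ entirely (H)
    """
    fields = sam_line.rstrip("\n").split("\t")
    name, cigar, seq = fields[0], fields[5], fields[9]
    ops = cigar_ops(cigar)
    return {
        "name": name,
        "seq_len": 0 if seq == "*" else len(seq),
        "aligned": sum(n for n, op in ops if op in "MI=X"),
        "soft_clipped": sum(n for n, op in ops if op == "S"),
        "hard_clipped": sum(n for n, op in ops if op == "H"),
    }

# Made-up record: a 300 bp read with 120 bp aligned to a bait and
# 90 bp soft-clipped on each side.
example = "\t".join(
    ["read1", "0", "bait_001", "1", "60", "90S120M90S", "*", "0", "0", "A" * 300, "*"]
)
print(summarize(example))
```

Each line printed by `samtools view` on the BAM file can be passed to `summarize`. If `seq_len` is close to the full read length while `aligned` is at most 120, the flanking bases are still in the file as soft clips. If `hard_clipped` is non-zero, or `seq_len` is itself about 120, the overhanging bases were not stored in that record.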

written 5.1 years ago by Devon Ryan

That is part of my problem: I can't figure out a good way to see or display what's in my BAM file. Which samtools command do you use, view or tview? I'm not that familiar with samtools yet, and I don't find it easy to figure this out. Thanks for your help!

written 5.1 years ago by tobias221188

Update: I just tried the samtools "view" command on the command line. Manually checking some of the reads it prints shows that they are of varying length, with the majority between 200 and 300 bp.

If I use the tview command, it seems that the reads are cut so that they are 120 bp long and match the reference sequence.

I really don't understand this format and what is in my bam file and what not...
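One possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis of this file: if the mapper soft-clipped the overhanging bases (CIGAR operation `S`), the full read is still stored in the SEQ field, which is what `samtools view` prints, while alignment viewers typically draw only the aligned part. Under that assumption, the flanks can be cut out of each read using the leading and trailing soft clips. The function name `split_flanks` and the example read below are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation letter (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def split_flanks(seq, cigar):
    """Split a read into (left_flank, aligned_part, right_flank).

    Assumes the overhangs are soft-clipped (S). Hard clips (H) are
    ignored because hard-clipped bases are not present in SEQ.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]
    # Soft clips can only appear at the two ends of the alignment.
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    end = len(seq) - right
    return seq[:left], seq[left:end], seq[end:]

# Made-up read: 5 clipped bases, 10 aligned bases, 3 clipped bases.
left, aligned, right = split_flanks("AAAAA" + "CCCCCCCCCC" + "GGG", "5S10M3S")
print(left, aligned, right)
```

For reads mapped to the reverse strand, SEQ is stored reverse-complemented in SAM, so the "left" flank here is always the one on the left side of the reference, regardless of read orientation.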

modified 5.1 years ago • written 5.1 years ago by tobias221188