I have spent the last few days trying to figure out how to get the sequence information from all the reads in my .bam file that flank my reference sequences. I have many, but very short, reference sequences (120 bp each, the equivalent of one RNA bait from the sequence capture process). I mapped my reads to those reference sequences (clc_mapper) and am now trying to work with the resulting BAM file, which I can view in e.g. Tablet and which contains many reads for each locus. The problem is that there is no way to see what lies beyond the at most 120 bp of each read that were mapped against the reference sequence: in Tablet you only see a window 120 bp long, with all reads cut to match that window. The read length from the Illumina sequencing is 300 bp, so the reads are much longer than 120 bp and should extend considerably beyond both ends of the short reference. Those flanking regions are actually the most interesting ones in my case. I'm wondering whether the information about the flanking regions is in the BAM file at all, or whether the BAM format is limited in the sense that it cuts each read at the ends of the reference sequence.
Does anybody have an idea about
1. How to visualize the flanking regions, and
2. How to create a consensus sequence that extends as far as possible beyond the ends of the reference sequence?
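As far as I understand the SAM/BAM format, the answer to whether the flanks are stored depends on how the mapper clipped the reads: soft-clipped bases (CIGAR operation S) stay in the SEQ field, while hard-clipped bases (H) are dropped from the file. Below is a minimal sketch for checking this on text SAM lines, e.g. as printed by samtools view. It assumes clc_mapper records the overhang as soft clips, which I have not verified; the function name soft_clipped_flanks is only illustrative.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_flanks(sam_line):
    """Return (left_flank, right_flank) for one SAM alignment line.

    Soft-clipped bases (S) are present in SEQ and can be recovered.
    Hard-clipped bases (H) are not stored, so they are ignored here.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return "", ""  # unmapped read or sequence not stored

    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips only occur at the outer ends and consume no SEQ bases.
    ops = [(n, op) for n, op in ops if op != "H"]

    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return seq[:left], seq[len(seq) - right:]


if __name__ == "__main__":
    import sys

    # Usage (assumption): samtools view mapping.bam | python this_script.py
    for line in sys.stdin:
        if line.startswith("@"):
            continue  # skip header lines
        name = line.split("\t", 1)[0]
        left, right = soft_clipped_flanks(line)
        print(name, len(left), len(right), sep="\t")
```

If the printed flank lengths are all zero and the SEQ field of each read is no longer than 120 bp, the overhang was presumably hard-clipped or trimmed before writing, and the flanks would have to be recovered from the original reads instead.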
Thank you very much, your help is really appreciated!