Question: How to find an insertion sequence in a .bam file?
4.8 years ago by
kimpole10170 wrote:

I aligned my fastq files to a reference and obtained an aligned bam file. As the next step, I want to find an insertion sequence in a specific region of the bam file. I first checked the site with PCR, and there is an insertion sequence there. However, when I viewed the bam file with bamview, there was no insertion in that region. What is wrong with my steps? I suspect that the default settings of bwa threw the insertion sequence away...

steps:
1. bwa index ref.fa
2. bwa aln ref.fa read1.fq > r1.sai
3. bwa aln ref.fa read2.fq > r2.sai
4. bwa sampe ref.fa r1.sai r2.sai read1.fq read2.fq | samtools view -bSho out.bam
5. opened out.bam in bamview and compared it to ref.fa

bwa bioinfomatics bam is • 2.7k views
modified 4.8 years ago by Jorge Amigo • written 4.8 years ago by kimpole1017

Are you talking about a small (few bp) or large insertion (>50bp)? What is inserted?

written 4.8 years ago by WouterDeCoster

The insertions will be larger than 1000 bp; they are insertion sequences, in other words transposons. In the meantime I found a tool called "pindel" and am working with it. Thank you for your interest; I am always open to helpful information.

written 4.8 years ago by kimpole1017
4.8 years ago by
Jorge Amigo (Santiago de Compostela, Spain) wrote:

If you read about NGS you will notice that there is always the same adjective next to "reads" and "indels": short. The reason is that NGS reads are short (from a few tens to a few hundred bases, depending on the technology chosen), and therefore the indels you can detect with them will also be short.
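To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes that a read needs some minimum stretch of reference-matching bases on each side of an insertion before it can be placed; the function name and the 20-base anchor are hypothetical values chosen for illustration, not a property of bwa or any other aligner.

```python
def max_insertion_in_read(read_length: int, min_anchor: int = 20) -> int:
    """Largest insertion a single read could contain while keeping
    `min_anchor` reference-matching bases on both flanks (0 if none fits).

    `min_anchor` is an illustrative assumption, not an aligner setting.
    """
    return max(0, read_length - 2 * min_anchor)


# Compare a few typical short-read lengths.
for read_length in (36, 100, 250):
    print(read_length, max_insertion_in_read(read_length))
```

Under these assumed numbers, even a 250-base read has room for roughly 200 inserted bases at most, far short of a 1000 bp insertion. Real aligners are usually stricter than this simple model.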

If you want to find an insertion (that is, a sequence not contained in the reference genome) by sequencing and mapping, you need to make sure that your sequencing reads cover both the insertion and an anchor point in the reference genome, or else the reads won't be able to map at all. Reads that cover only the insertion contain a sequence that is not in the reference, so they won't map. Even if the anchor sequence is substantial, when the insertion is too long the mapping process can still have trouble dealing with too many mismatches, so those reads may not map either. Depending on the size of the insertion and the length of your sequencing reads, it is therefore quite possible that you will not find your insertion in the aligned bam file at all: you may well have sequenced it (it would be among the unaligned reads), but it is difficult to map.
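One way to see where such reads ended up is to sort alignment records into a few buckets. The sketch below works on plain SAM text lines (for example the output of samtools view) and relies only on the SAM format itself: FLAG bit 0x4 marks an unmapped read, and the CIGAR operations I and S mark inserted and soft-clipped bases. The function name, the category labels and the 10-base clipping threshold are hypothetical, and whether breakpoint reads show up as soft-clipped at all depends on the aligner and its settings.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_sam_record(line: str, min_clip: int = 10) -> str:
    """Label a SAM alignment line as 'unmapped', 'insertion', 'clipped' or 'plain'.

    `min_clip` (an illustrative threshold) is the shortest soft clip
    that counts as a possible breakpoint signal.
    """
    fields = line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return "unmapped"  # read sequenced but not placed on the reference
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if any(op == "I" for _, op in ops):
        return "insertion"  # short insertion contained within the read
    if any(op == "S" and n >= min_clip for n, op in ops):
        return "clipped"  # part of the read did not match: candidate breakpoint
    return "plain"


# Made-up records for illustration (SEQ and QUAL left as '*').
records = [
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t100\t60\t40M5I55M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t150\t60\t60M40S\t*\t0\t0\t*\t*",
    "r4\t0\tchr1\t200\t60\t100M\t*\t0\t0\t*\t*",
]
for rec in records:
    print(rec.split("\t")[0], classify_sam_record(rec))
```

For an insertion longer than the reads, you would not expect the "insertion" bucket to fill up. The more useful signals are many "clipped" reads ending at the same reference position, plus "unmapped" reads that may carry the inserted sequence itself.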

written 4.8 years ago by Jorge Amigo

As I mentioned above, it will be over 1000 bp. Thank you for your help :)

written 4.8 years ago by kimpole1017