How to find an insertion sequence in a .bam file?
7.9 years ago

I aligned my FASTQ files against a reference and obtained an aligned BAM file. As the next step, I want to find an insertion sequence in a specific region of the BAM file. I first checked the site by PCR, which confirmed that an insertion is present. However, when I inspected the BAM file with bamview, no insertion was visible in that region. What is wrong with my steps? I suspect that the default settings of bwa discarded the insertion sequence.

Steps:

  1. bwa index ref.fa;
  2. bwa aln ref.fa read1.fq > r1.sai;
  3. bwa aln ref.fa read2.fq > r2.sai;
  4. bwa sampe ref.fa r1.sai r2.sai read1.fq read2.fq | samtools view -bSho out.bam
  5. open out.bam in bamview and compare it with ref.fa
bam bwa • 4.3k views

Are you talking about a small insertion (a few bp) or a large one (>50 bp)? What is inserted?


The insertions will be larger than 1000 bp; they are insertion sequences, that is, transposons. By the way, I found a tool called "pindel" and am working with it now. Thank you for your interest; I am always open to any helpful information.

7.9 years ago

If you read about NGS you will notice that one adjective always accompanies both reads and indels: short. The reason is that NGS reads are short (a few tens to a few hundred bases, depending on the technology chosen), and therefore the indels you can detect with them must also be short.
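As a rough illustration of that size limit, here is a minimal sketch of the arithmetic. It assumes a read can only show a complete insertion if it holds the whole inserted sequence plus some reference-matching bases on each side. The function name and the 20-base flank are hypothetical values picked for the example, not parameters of bwa or any other tool.

```python
def can_span_insertion(read_len: int, insertion_len: int, min_flank: int = 20) -> bool:
    """Return True if a single read could contain the whole insertion
    plus min_flank reference-matching bases on each side.

    min_flank is an illustrative assumption, not an aligner setting.
    """
    return read_len >= insertion_len + 2 * min_flank


# Compare a few read lengths against small and large insertions.
for read_len in (100, 150, 250):
    for insertion_len in (5, 50, 1000):
        verdict = can_span_insertion(read_len, insertion_len)
        print(f"read={read_len} bp, insertion={insertion_len} bp, spans={verdict}")
```

Under these assumptions, no read of a few hundred bases can contain an insertion of 1000 bp or more, so evidence for such an insertion has to come from reads at its edges or from unmapped reads.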

If you want to find an insertion (a sequence not contained in the reference genome) by sequencing and mapping, your reads must cover both part of the insertion and an anchor point in the reference genome; otherwise they cannot map at all. Reads that lie entirely within the insertion contain only sequence that is absent from the reference, so they will not map. Even when the anchor sequence is substantial, a long inserted portion can look to the aligner like too many mismatches, so those reads may still fail to map. Depending on the size of the insertion and the length of your reads, it is therefore quite possible that you will not find the insertion in your aligned BAM file at all: you may well have sequenced it (it ends up among the unaligned reads), but it is difficult to map.
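One way to look for the indirect evidence described above is to scan the alignments near the PCR-confirmed site for mapped reads whose mate is unmapped, or whose ends are soft-clipped. The sketch below works on plain SAM text (for example the output of `samtools view -h out.bam`) and uses only the standard SAM flag bits and CIGAR `S` operation. The function name, the region coordinates, the 10-base clip threshold and the sample records are all hypothetical. Soft clips are mostly reported by local aligners such as `bwa mem`, so they may be rare in `bwa aln`/`sampe` output.

```python
import re

READ_UNMAPPED = 0x4  # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8  # SAM flag bit: the mate of this read is unmapped


def insertion_hints(sam_lines, chrom, start, end, min_clip=10):
    """Yield (read_name, pos, reason) for mapped reads whose leftmost
    position lies in chrom:start-end and that hint at an insertion.

    min_clip is an illustrative threshold, not a standard value.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & READ_UNMAPPED or rname != chrom or not (start <= pos <= end):
            continue
        if flag & MATE_UNMAPPED:
            yield name, pos, "mate unmapped"
        # Lengths of all soft-clipped segments in the CIGAR string.
        clips = [int(n) for n in re.findall(r"(\d+)S", cigar)]
        if clips and max(clips) >= min_clip:
            yield name, pos, "soft-clipped"


# Made-up SAM records for illustration only.
sample = [
    "@SQ\tSN:chr1\tLN:10000",
    "r1\t99\tchr1\t1000\t60\t100M\t=\t1200\t300\t*\t*",
    "r2\t73\tchr1\t1950\t60\t100M\t=\t1950\t0\t*\t*",
    "r3\t0\tchr1\t1980\t60\t70M30S\t*\t0\t0\t*\t*",
]
hints = list(insertion_hints(sample, "chr1", 1900, 2100))
for hint in hints:
    print(hint)
```

A cluster of such reads around one position suggests a breakpoint there. The unmapped mates are the ones likely to carry the inserted sequence, which is the kind of signal dedicated tools such as pindel are built to exploit.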


As I mentioned above, the insertion will be over 1000 bp. Thank you for your concern :)
