Split Read in Samtools
1
2
Entering edit mode
10.4 years ago

Say I want to parse out all split reads. These are reads with a clipped position and a secondary mapping.

How do I identify these reads in samtools? Is there a flag to find reads that are split with a secondary mapping.

Once I have these reads, how I can I tell which chromosome they are mapped to?

Here's an example of what I'm looking for:

The blue read should have a clipped signal, I want to parse out these reads that have the alternate mapping position on the same chromosome.

Thanks!

samtools wgs ngs split read clipped read • 14k views
ADD COMMENT
0
Entering edit mode

It largely depends on the mapper. Most mappers won't produce such alignments.

ADD REPLY
0
Entering edit mode

That's true. We have bam files that were mapped using BWA-MEM to allow for split reads. We ran lumpy which requires these reads.

ADD REPLY
0
Entering edit mode

In that case: samtools view aln.bam|grep SA:Z:

ADD REPLY
0
Entering edit mode

That did not work

ADD REPLY
0
Entering edit mode

Then you are using a wrong version of bwa. Update to the latest.

ADD REPLY
5
Entering edit mode
10.4 years ago
samtools view -f 256 Input.bam | awk '$6 ~/S/ && $7 == "=" {print $0}' > Secondary_clipped.sam

Please debug and use it.

PS: post modified.

ADD COMMENT
0
Entering edit mode

Close... I use the awk part of the command but the flag filter in samtools is not right.

ADD REPLY
1
Entering edit mode

My bad. It should be 256. I have changed it now.

ADD REPLY
0
Entering edit mode

Thanks this is much closer to what I expect.

I can work with this. I do have two more follow up questions.

  1. How do you know what flags to filter on... I can't get my head wrapped around the bitwise system. Any resources or advice is welcomed. Maybe its a biologist vs. computer scientist problem :p
  2. Is column 7 in your awk command is the reference to where the mate maps to? Or the split read.

Thanks a ton!

ADD REPLY
1
Entering edit mode

Ok so there are many things going on. Actually I re-read the question and comments that were later posted by you and Dr. Li and realized that you used BWA-mem. First, secondary alignments (0x100) are different from supplementary alignments (0x800). Supplementary alignments can be considered as secondary alignments with split read. I think you are interested in supplementary alignments produced by BWA mem. Did you use -M option when you ran BWA-mem. If yes, then 0x800 flag was replaced by 0x100 in the output sam file. I am not sure whats your motive behind extracting these reads for DELLY. Do you want to save running time for DELLY? DELLY can extract these reads on its own. This is what I read on DELLY's github page [Delly expects two alignment records in the bam file for every paired-end, one for the first and one for the second read. Multiple split-read alignment records of a given read are allowed if and only if one of them (e.g. the longest split alignment) is a primary alignment whereas all others are marked as secondary or supplementary (flag 0x0100 or flag 0x0800).] I am little bit confused with what you exactly want. Also, please elaborate if you used a paired end data or is it a single end read.

To answer your questions above:

  1. http://broadinstitute.github.io/picard/explain-flags.html but also see this post: Can You Make A Bam File Picard Compatible? I Forgot -M With Bwa Mem
  2. Column 7 refers to where mate mapped. You can't get information about which chromosome the clipped read would map because that portion of read was not used in mapping at all. This is not a RNA-seq alignment which will use "N" CIGAR to show a gap and mapping position of the clipped read. All you know is that read was clipped and the remaining read was mapped to a particular chromosome.
ADD REPLY
0
Entering edit mode

Hi, in the awk filtering, shouldn't $6~/N/ (N - skipped region from the reference) be used instead of $6~/S/ (S - soft clipping )?

ADD REPLY

Login before adding your answer.

Traffic: 3298 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6