Question: Split Read in Samtools
1
gravatar for QVINTVS_FABIVS_MAXIMVS
5.3 years ago by
USA SoCal
QVINTVS_FABIVS_MAXIMVS2.4k wrote:

Say I want to parse out all split reads. These are reads with a clipped position and a secondary mapping.

How do I identify these reads in samtools? Is there a flag to find reads that are split with a secondary mapping.

Once I have these reads, how I can I tell which chromosome they are mapped to?

Here's an example of what I'm looking for:

The blue read should have a clipped signal, I want to parse out these reads that have the alternate mapping position on the same chromosome.

Thanks!

ADD COMMENTlink modified 5.3 years ago • written 5.3 years ago by QVINTVS_FABIVS_MAXIMVS2.4k

It largely depends on the mapper. Most mappers won't produce such alignments.

ADD REPLYlink written 5.3 years ago by lh332k

That's true. We have bam files that were mapped using BWA-MEM to allow for split reads. We ran lumpy which requires these reads.

ADD REPLYlink written 5.3 years ago by QVINTVS_FABIVS_MAXIMVS2.4k

In that case: samtools view aln.bam|grep SA:Z:

ADD REPLYlink written 5.3 years ago by lh332k

That did not work

ADD REPLYlink written 5.3 years ago by QVINTVS_FABIVS_MAXIMVS2.4k

Then you are using a wrong version of bwa. Update to the latest.

ADD REPLYlink written 5.3 years ago by lh332k
5
gravatar for Ashutosh Pandey
5.3 years ago by
Philadelphia
Ashutosh Pandey12k wrote:
samtools view -f 256 Input.bam | awk '$6 ~/S/ && $7 == "=" {print $0}' > Secondary_clipped.sam

Please debug and use it.

 

PS: post modified. 

ADD COMMENTlink modified 5.3 years ago • written 5.3 years ago by Ashutosh Pandey12k

close... I use the awk part of the command but the flag filter in samtools is not right.

 

ADD REPLYlink written 5.3 years ago by QVINTVS_FABIVS_MAXIMVS2.4k
1

My bad. It should be 256. I have changed it now. 

ADD REPLYlink written 5.3 years ago by Ashutosh Pandey12k

Thanks this is much closer to what I expect.

 

I can work with this. I do have two more follow up questions.

 

1. How do you know what flags to filter on... I can't get my head wrapped around the bitwise system. Any resources or advice is welcomed. Maybe its a biologist vs. computer scientist problem :p

2. Is column 7 in your awk command is the reference to where the mate maps to? Or the split read.

 

Thanks a ton!

ADD REPLYlink written 5.3 years ago by QVINTVS_FABIVS_MAXIMVS2.4k

Ok so there are many things going on. Actually I re-read the question and comments that were later posted by you and Dr. Li and realized that you used BWA-mem. First, secondary alignments (0x100) are different from supplementary alignments (0x800). Supplementary alignments can be considered as secondary alignments with split read. I think you are interested in supplementary alignments produced by BWA mem. Did you use "-M" option when you ran BWA-mem. If yes, then 0x800 flag was replaced by 0x100 in the output sam file. I am not sure whats your motive behind extracting these reads for DELLY. Do you want to save running time for DELLY? DELLY can extract these reads on its own. This is what I read on DELLY's github page [Delly expects two alignment records in the bam file for every paired-end, one for the first and one for the second read. Multiple split-read alignment records of a given read are allowed if and only if one of them (e.g. the longest split alignment) is a primary alignment whereas all others are marked as secondary or supplementary (flag 0x0100 or flag 0x0800).] I am little bit confused with what you exactly want. Also, please elaborate if you used a paired end data or is it a single end read. 

To answer your questions above:

1) http://broadinstitute.github.io/picard/explain-flags.html but also see this post: Can You Make A Bam File Picard Compatible? I Forgot -M With Bwa Mem

2) column 7 refers to where mate mapped. You can't get information about which chromosome the clipped read would map because that portion of read was not used in mapping at all. This is not a RNA-seq alignment which will use "N" CIGAR to show a gap and mapping position of the clipped read.  All you know is that read was clipped and the remaining read was mapped to a particular chromosome. 

ADD REPLYlink modified 5.3 years ago • written 5.3 years ago by Ashutosh Pandey12k

Hi, in the awk filtering, shouldn't $6~/N/ (N - skipped region from the reference) be used instead of $6~/S/ (S - soft clipping )?

ADD REPLYlink written 2.8 years ago by jea.anna20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1682 users visited in the last hour