Question: Extract read pairs from a BAM file where both mates map properly and neither mate is split
2.3 years ago by
European Union
rimjhim.roy.ch wrote:

Hi,

I want to extract, from a BAM file and within a given region, only those reads where both mates of the pair are properly mapped and neither mate is split.

Currently I am doing:

samtools view -h -f 2 -F 4 input.bam "chr:start-stop" | awk '$6 !~ /S/ || $1 ~ /@/' | samtools view -bS - > output.proper.nosplit.bam

Here awk '$6 !~ /S/ || $1 ~ /@/' removes the split (soft-clipped) reads while keeping the header lines. However, I also want the properly paired mate of each split read to be removed.

I tried running samtools view -h -f 2 -F 4 output.proper.nosplit.bam > output2.sam, but this still does not remove the mate of the split read.

Any idea?

Thanks in advance

2.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:

Create a list of the read names that have a record that is not properly paired, is unmapped, or contains a soft clip or a hard clip:

samtools view  -F 2  in.bam | cut -f 1 > out.1
samtools view  -f4  in.bam | cut -f 1 >> out.1
samtools view -F4 in.bam |  awk '$6 ~ /S/ || $6 ~ /H/' | cut -f1 >> out.1

Sort and deduplicate the list (sort | uniq), sort your BAM by query name, and pass the list to Picard FilterSamReads (https://broadinstitute.github.io/picard/command-line-overview.html#FilterSamReads) with FILTER=excludeReadList.
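For a quick check on a small region, the same idea can be sketched without Picard. This is only a minimal sketch under assumptions, not the exact method above: it assumes the region has first been dumped to SAM text, and `filter_clean_pairs` is a hypothetical helper name. It reads the file twice. The first pass collects every read name that has a record that is not properly paired, is unmapped, or has S or H in its CIGAR; the second pass prints the header plus all records whose name was not collected, so the mate of a clipped read is dropped too.

```shell
# Hypothetical helper (not part of samtools or Picard).
# Input:  path to a SAM text file, with or without header lines.
# Output: filtered SAM on stdout.
filter_clean_pairs() {
    sam="$1"
    awk -F '\t' '
        # Pass 1: remember names with at least one disqualifying record.
        NR == FNR {
            if ($0 ~ /^@/) next                 # skip header lines
            flag = $2 + 0
            proper   = int(flag / 2) % 2        # FLAG bit 0x2: properly paired
            unmapped = int(flag / 4) % 2        # FLAG bit 0x4: read unmapped
            if (!proper || unmapped || $6 ~ /[SH]/) bad[$1] = 1
            next
        }
        # Pass 2: keep header lines and records whose name was never flagged.
        /^@/ { print; next }
        !($1 in bad)
    ' "$sam" "$sam"
}
```

Assumed usage: samtools view -h in.bam "chr:start-stop" > region.sam, then filter_clean_pairs region.sam | samtools view -b -o out.bam -. Note that a mate lying outside the requested region is never seen by the script, so it cannot disqualify its partner.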


Can we use the 2048 flag (supplementary alignment) and/or the 256 flag (secondary alignment) to filter out split reads in the region of interest?

written 2.3 years ago by cpad0112

It depends on which flag your aligner sets on split reads. To remove just the secondary/supplementary alignment records, I would filter on the relevant SAM flag; to remove all records that belong to split reads, I would check for the presence of the SA tag.
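The difference between the two options can be sketched roughly as follows. The flag route is a plain record-level filter (256 + 2048 = 2304), shown as a comment. The SA-tag route below is only a sketch under assumptions: `drop_sa_pairs` is a hypothetical helper name, it works on SAM text, and it goes one step further than checking single records by dropping every record that shares a read name with an SA-tagged record, so the mate of a split read goes as well.

```shell
# Flag route: drops only the secondary (0x100) and supplementary (0x800)
# records; the primary record of a split read and its mate are kept.
#   samtools view -h -F 2304 in.bam

# SA-tag route. Hypothetical helper (not part of samtools).
# Input:  path to a SAM text file. Output: filtered SAM on stdout.
drop_sa_pairs() {
    sam="$1"
    awk -F '\t' '
        # Pass 1: collect read names that carry an SA:Z: tag on any record.
        NR == FNR {
            if ($0 ~ /^@/) next                 # skip header lines
            for (i = 12; i <= NF; i++)          # optional tags start at column 12
                if ($i ~ /^SA:Z:/) { split_name[$1] = 1; break }
            next
        }
        # Pass 2: keep header lines and records of names without any SA tag.
        /^@/ { print; next }
        !($1 in split_name)
    ' "$sam" "$sam"
}
```

Assumed usage: samtools view -h in.bam "chr:start-stop" > region.sam, then drop_sa_pairs region.sam | samtools view -b -o out.bam -. This relies on the aligner writing the SA tag for chimeric alignments, which is worth verifying on your own data first.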

written 2.3 years ago by d-cameron

thanks @d-cameron

written 2.3 years ago by cpad0112