Question: Filter bam files using a bed file: Why is the mate missing?
komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA) wrote 21 months ago:

Hi everyone,

This is a coordinate-sorted BAM file, and both mates of this read pair are present in it:

samtools view 7316-161-T_Aligned.out.sorted.bam | grep 'FCC78FRACXX:1:1101:6639:75204'

FCC78FRACXX:1:1101:6639:75204#  163 chr2    74156623    255 23M2445N77M =   74159237    8677    CGCCTATCAATCAGATTAAACTCCTGAACAAAGAAAATAAAGTGCTTAAAGGAGGTGTTGAGGTGGGCCTCCTCTTGCAGCTGCATCACAGAATCAAAGT    bbbeeeeegggggiiiiihiiiihhiihiiiihiihiifhf]Yafaccfggfcgh^ceghhhi_ceghggggeeceedddcbccbccccc_bbcccc`c]    NH:i:1  HI:i:1  AS:i:199    NM:i:0  MD:Z:100

FCC78FRACXX:1:1101:6639:75204#  83  chr2    74159237    255 18M5963N82M =   74156623    -8677   TTTTGGGAAATGGGACACCAATCTTAGAAGGAAAAAGAGTTTCATCATCAAGCTGATCTTGAACCCAAGTCATCAAATAGTCAATGTATTTTGGTGCAGA    ^cccddddb`deeeedbbdggggagiiihiiihhiiiiihhhgchihhiihhhhhfiihihfciieiiiiihhiehihiiihiiiiigggggeeeeebbb    NH:i:1  HI:i:1  AS:i:199    NM:i:0  MD:Z:100

I extracted the reads mapped to a particular region like this:

head genes_10000Flanks.bed
chr2    74109441        74156992        ACTG2

samtools view -b -L genes_10000Flanks.bed 7316-161-T_Aligned.out.sorted.bam -o 7316-161-T_Aligned.out.filtered_new.bam

But when I look at the filtered BAM file, I only get one mate:

samtools view 7316-161-T_Aligned.out.filtered_new.bam | grep 'FCC78FRACXX:1:1101:6639:75204'

FCC78FRACXX:1:1101:6639:75204#  163 chr2    74156623    255 23M2445N77M =   74159237    8677    CGCCTATCAATCAGATTAAACTCCTGAACAAAGAAAATAAAGTGCTTAAAGGAGGTGTTGAGGTGGGCCTCCTCTTGCAGCTGCATCACAGAATCAAAGT    bbbeeeeegggggiiiiihiiiihhiihiiiihiihiifhf]Yafaccfggfcgh^ceghhhi_ceghggggeeceedddcbccbccccc_bbcccc`c]    NH:i:1  HI:i:1  AS:i:199    NM:i:0  MD:Z:100

Why is the other mate not kept by the filter?

Tags: rna-seq, samtools
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 21 months ago:

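Because position 74159237, where the second mate starts, lies outside the BED interval 74109441-74156992, and the -L option doesn't work the way you expect: it outputs only the alignments that themselves overlap the BED regions and does not pull in their mates. If you want both mates, either extend your BED interval or add a second step that retrieves all the reads by name: https://www.google.com/search?q=site%3Abiostars.org+read+names+bam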
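To make the overlap argument concrete, here is a minimal sketch in plain sh and awk that checks each mate against the BED interval. It assumes the region filter tests the alignment's own reference span (POS plus the reference-consuming CIGAR operations) against the 0-based, half-open BED interval. The helper names `ref_span`, `overlaps_bed` and `report` are hypothetical and are not part of samtools.

```shell
# ref_span CIGAR: print how many reference bases an alignment covers.
# Only M, D, N, = and X consume the reference (per the SAM specification).
ref_span() {
  printf '%s\n' "$1" | awk '{
    span = 0; cigar = $0
    while (match(cigar, /^[0-9]+[MIDNSHPX=]/)) {
      len = substr(cigar, 1, RLENGTH - 1) + 0   # numeric part of the operation
      op = substr(cigar, RLENGTH, 1)            # operation letter
      if (op ~ /[MDNX=]/) span += len
      cigar = substr(cigar, RLENGTH + 1)
    }
    print span
  }'
}

# overlaps_bed POS CIGAR BED_START BED_END: succeed if the alignment overlaps.
# SAM POS is 1-based; BED coordinates are 0-based and half-open.
overlaps_bed() {
  start0=$(( $1 - 1 ))                       # 0-based alignment start
  end0=$(( start0 + $(ref_span "$2") ))      # 0-based, exclusive alignment end
  [ "$start0" -lt "$4" ] && [ "$end0" -gt "$3" ]
}

# report LABEL POS CIGAR: say whether the mate would survive the region filter.
report() {
  if overlaps_bed "$2" "$3" 74109441 74156992; then
    echo "$1: overlaps the interval, kept"
  else
    echo "$1: outside the interval, dropped"
  fi
}

report "mate at 74156623" 74156623 23M2445N77M
report "mate at 74159237" 74159237 18M5963N82M
```

With the coordinates from the question, the first mate starts inside the interval and is kept, while the second mate starts 2,245 bases past the interval end and is dropped, which matches the filtered output shown above.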


Thanks, I just thought it would be simpler than this. I was hoping something like a partial match would work, so that a pair is kept when only one of its mates falls in the region.

Reply written 21 months ago by komal.rathi
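The "partial match" behaviour asked about here, keeping a pair whenever either mate overlaps the region, is roughly what the name-based second step from the answer gives you. Below is a minimal sketch of that step on SAM text, using toy records and hypothetical file names. With real data the two inputs would come from `samtools view -h` on the filtered and the original BAM, and the result would be converted back with `samtools view -b`.

```shell
# Work in a scratch directory; every file name below is hypothetical.
tmp=$(mktemp -d)

# Toy SAM records (QNAME, FLAG, RNAME, POS only; real records have 11+ columns).
printf 'readA\t163\tchr2\t74156623\n'  > "$tmp/all.sam"
printf 'readA\t83\tchr2\t74159237\n'  >> "$tmp/all.sam"
printf 'readB\t99\tchr2\t80000000\n'  >> "$tmp/all.sam"

# Stand-in for step 1: the region filter kept only the first mate of readA.
head -n 1 "$tmp/all.sam" > "$tmp/filtered.sam"

# Step 2: the first pass stores every read name seen in the filtered file;
# the second pass prints header lines plus every record with a stored name.
# Note: this two-file idiom assumes the filtered file is not empty.
awk -F '\t' '
  NR == FNR { if ($0 !~ /^@/) keep[$1] = 1; next }
  /^@/ || ($1 in keep)
' "$tmp/filtered.sam" "$tmp/all.sam" > "$tmp/rescued.sam"

cat "$tmp/rescued.sam"
```

Both readA records come back, including the mate outside the interval, while readB is still excluded. Newer samtools releases also appear to provide a `-N`/`--qname-file` option to `samtools view` for selecting records by read name; check `samtools view --help` on your version before relying on it.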