Question: How To Extract Reads With Wrong Insert Size With Samtools
0
gravatar for Assa Yeroslaviz
6.1 years ago by
Assa Yeroslaviz1.1k
Munich
Assa Yeroslaviz1.1k wrote:

Hi all,

I would like to know how to extract paired reads from a bam file with the "wrong" insert size. I am extracting the reads with the "right" size with this command:

samtools view G_AGTCAA.bam | gawk ' and($2,0x0002) && and($2,0x0040)' | awk '{print $1,"\t", $2, "\t", $4,"\t" $8,"\t" $9}' |sed 's/-//g' > G_insert_size.txt

This command gives me the reads with the "correct" insert size

HWIST863:138:D0WT7ACXX:5:1101:1110:56612      15090     15326     320
HWIST863:138:D0WT7ACXX:5:1101:1168:81336      5294     5061     334
HWIST863:138:D0WT7ACXX:5:1101:1169:91247      15799     15625     275
HWIST863:138:D0WT7ACXX:5:1101:1207:97805      8616     8704     189

But what I would like to get are the reads which have the FLAGS 97 or 145, like these two:

 HWI-ST863:138:D0WT7ACXX:5:2105:15878:99962    145    MT1    4599    37    101M    =    16312    11612    CAATCTCACTTCTATGAAATAAAACTCCAGCAATACTAACTATAATCTCACTGATATTACTATCCCTAGGAGGCCTTCCACCACTAACAGGATTCTTACCA    @CDCCACAEEAFEFFFEDHHHHGHIHJIGFGCA@/HCIHJJJIIJJJIHF?FDGD?HG?IHFCJJJJJJJJJIJJIHEGAGIFGHHGJHHHHHFFFDDCC@    XT:A:U    NM:i:0    SM:i:37    AM:i:0    X0:i:1    X1:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:101
 HWI-ST863:138:D0WT7ACXX:5:2105:15878:99962    97    MT1    16312    0    101M    =    4599      -11612    TAATAACAAAGCAAAGCACTGAAAATGCTTAGATGGATAATTGTATCCCATAAACACAAAGGTTTGGTCCTGGCCTTATAATTAATTAGAGGTAAAATTAC    CCCFFFFFHHHHHIJJJJJJJJJJJJJIJJJJJJJJJJJJJJJFHIHIJIJJIJJIHIJJJIFGIIIGGIIGIHHEHHGECEDEEFEDECEDCAC>AD@C@    XT:A:R    NM:i:0    SM:i:0    AM:i:0    X0:i:2    X1:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:101    XA:Z:MT1,+13,101M,0;

this pair has an insert size of 11612.

I would like to know what hex strings I need to use to find this reads?

Thanks

Assa

bam samtools awk • 6.2k views
ADD COMMENTlink modified 6.1 years ago by swbarnes24.5k • written 6.1 years ago by Assa Yeroslaviz1.1k
2
gravatar for matted
6.1 years ago by
matted6.9k
Boston, United States
matted6.9k wrote:

I'm not exactly sure what class of reads you're looking for, but spend some quality time with the SAMtools explain flags page.

Piping through gawk and manually testing the flags is crazy and totally misses the entire point of them. It's slow and error-prone. You should be doing something like:

samtools view -f66 in.bam | cut -f 1,2,4,8,9 | tr -d "-" > out.txt

This matches your original output (up to whitespace differences) and uses the samtools flag filtering.

Once you figure out exactly what you want, use the -f and -F flags to samtools view to extract the correct reads.

For example, if you want reads that were paired and both mapped, but not properly, you'd do:

samtools view -f1 -F14 in.bam.

Reads the samtools docs and the flag explainer page to verify this snippet.

ADD COMMENTlink written 6.1 years ago by matted6.9k

I thought proper pair means the the insert size is within the "correct" size. But I find it very confusing to choose the right parameters for the correct FLAG. But i'll look into it. thanks

ADD REPLYlink written 6.1 years ago by Assa Yeroslaviz1.1k
0
gravatar for swbarnes2
6.1 years ago by
swbarnes24.5k
United States
swbarnes24.5k wrote:

The other way to do this is to use awk to screen the contents of the insert size column, and only keep the reads where the absolute value of the insert size is outside your desired parameters.

ADD COMMENTlink written 6.1 years ago by swbarnes24.5k

and how do i do that?

ADD REPLYlink written 6.1 years ago by Assa Yeroslaviz1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1734 users visited in the last hour