Bwa-Mem And Sam-Flag "Read Paired In Proper Pair"
1
1
Entering edit mode
11.0 years ago

I'm playing with BWA-MEM.

I've extracted a pair of reads properly paired (FLAG=2) with bwa-mem:

bwa mem human_g1k_v37.fasta NA12878_1.fastq.gz NA12878_2.fastq.gz | samtools view -S -f 3 -

SRR098401.2297993    99    20    123171    60    76M    =    123252    157    TGGCTGAGCCCAACCCCACCTAAGAAGGACAATAAAGATCTGTGTTCAGAGTCATACTGAATAGAGACTTCTGGAC    BEDFDGBEEFDBDDDGDEDGF@EFC?F?BCEGDDBEFEBFGHCH?BC@FCEADECDHFJC:B@D?HDG=C6A9==8    NM:i:0    AS:i:76    XS:i:0
SRR098401.2297993    147    20    123252    60    76M    =    123171    -157    AGAACCCACTGCCTCCTGATGAAGTCCCTACTGTTCACCCTTGCAGTTTTTATGCTCCTGGCCCAATTGGTCTCAG    <5>09>F:EBGDFDD?DDDEGDFDDF:HDCH8ACCGB:?EF>HGGGEF?DCCEHGC<GC@CCFFAAB>DC@D@EB>    NM:i:0    AS:i:76    XS:i:0

and I put the two reads in two fastq files:

@SRR098401.2297993
TGGCTGAGCCCAACCCCACCTAAGAAGGACAATAAAGATCTGTGTTCAGAGTCATACTGAATAGAGACTTCTGGAC
+
BEDFDGBEEFDBDDDGDEDGF@EFC?F?BCEGDDBEFEBFGHCH?BC@FCEADECDHFJC:B@D?HDG=C6A9==8

and

@SRR098401.2297993
AGAACCCACTGCCTCCTGATGAAGTCCCTACTGTTCACCCTTGCAGTTTTTATGCTCCTGGCCCAATTGGTCTCAG
+
<5>09>F:EBGDFDD?DDDEGDFDDF:HDCH8ACCGB:?EF>HGGGEF?DCCEHGC<GC@CCFFAAB>DC@D@EB>

now, If I re-align those reads with the ~same bwa-mem command

bwa mem  human_g1k_v37.fasta tmp1.fq tmp2.fq | grep -v -E "^@SQ" 2> /dev/null 

SRR098401.2297993    65    20    123171    60    76M    =    123252    82    TGGCTGAGCCCAACCCCACCTAAGAAGGACAATAAAGATCTGTGTTCAGAGTCATACTGAATAGAGACTTCTGGAC    BEDFDGBEEFDBDDDGDEDGF@EFC?F?BCEGDDBEFEBFGHCH?BC@FCEADECDHFJC:B@D?HDG=C6A9==8    NM:i:0    AS:i:76    XS:i:0
SRR098401.2297993    129    20    123252    60    76M    =    123171    -82    AGAACCCACTGCCTCCTGATGAAGTCCCTACTGTTCACCCTTGCAGTTTTTATGCTCCTGGCCCAATTGGTCTCAG    <5>09>F:EBGDFDD?DDDEGDFDDF:HDCH8ACCGB:?EF>HGGGEF?DCCEHGC<GC@CCFFAAB>DC@D@EB>    NM:i:0    AS:i:76    XS:i:0

The pair is no more flagged as "mapped in proper pair". Does bwa-mem needs a large number of pairs to set this flag ? how can I run some tests with a small amount of sequences ?

bwa • 7.6k views
ADD COMMENT
6
Entering edit mode
11.0 years ago
lh3 33k

bwa-mem needs to infer the insert size distribution from data. You have to mix the read pair with at least tens of other pairs. In addition, you cannot copy-paste the sequences in the SAM output to a fastq as you need to reverse complement one of them.

ADD COMMENT
0
Entering edit mode

Yes I also tried with the revcomp sequences. Thank you for your answer Heng.

ADD REPLY

Login before adding your answer.

Traffic: 2104 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6