Question: bbmap paired-end alignment problem
0
gravatar for mark.rose
2.7 years ago by
mark.rose40
United States
mark.rose40 wrote:

Hi All

I'm having a problem with aligning Illumina PE reads with BBMap version 35.92.

Execution of

bbmap.sh overwrite=t ref=Z.fasta in=R1_001_val_1.100.fq in2=R2_001_val_2.100.fq minid=0.25 mappedonly=true out=out.sam  outu=unaligned_reads.fa maxindel=100000

yields out.sam which contains only alignments of read1 even though the log info seems to suggest read2 reads were also aligned

Pairing data:           pct reads       num reads       pct bases          num bases

mated pairs:            100.0000%             100       107.4763%              59558
bad pairs:                0.0000%               0         0.0000%                  0
insert size avg:          412.82


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%             100       100.0000%              29779
unambiguous:            100.0000%             100       100.0000%              29779
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:        35.0000%              35        35.1758%              10475
rescued:                  0.0000%               0

Match Rate:                   NA               NA        88.8176%              26449
Error Rate:              65.0000%              65         4.4763%               1333
Sub Rate:                65.0000%              65         0.3627%                108
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                49.0000%              49         4.1136%               1225
N Rate:                 100.0000%             100         6.7061%               1997


Read 2 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%             100       100.0000%              25636
unambiguous:            100.0000%             100       100.0000%              25636
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:        26.0000%              26        25.3745%               6505
rescued:                  2.0000%               2

Match Rate:                   NA               NA        86.8232%              22258
Error Rate:              74.0000%              74         4.9852%               1278
Sub Rate:                74.0000%              74         0.8894%                228
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                42.0000%              42         4.0958%               1050
N Rate:                 100.0000%             100         8.1916%               2100

As a test I then tried

bbmap.sh overwrite=t ref=Z.fasta in=R2_001_val_2.100.fq minid=0.25 mappedonly=true out=out.sam  outu=unaligned_reads.fa maxindel=100000

Here read2 reads were aligned.

Any ideas what is wrong with the first command line for the paired-end reads?

Thanks

Mark

align bbmap paired-end • 1.4k views
ADD COMMENTlink written 2.7 years ago by mark.rose40

What is the output of:

samtools view -f 0x0040 out.sam | wc -l

and

samtools view -f 0x0080 out.sam | wc -l
ADD REPLYlink written 2.7 years ago by h.mon31k
[rosema1@demeter trunk]$ samtools view -f 0x0040 out.sam | wc -l
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "out.sam".
0
[rosema1@demeter trunk]$ samtools view -f 0x0080 out.sam | wc -l
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "out.sam".
0

I also tried

samtools view -o out.bam out.sam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "out.sam".

I just noticed that the sam file has as many lines as it should given paired-end reads and that there are 2 lines for every read ID but they are both labeled, for instance, "M00645:16:000000000-B666D:1:1101:18619:5889 1:N:0:48", though they are different sequences. Going back to the fastq files I find the sequence for read1 in the read1 fastq file ( with, of course the ID above). When I look for read2 in the read2 file, I find the base ID above and the sequence is as it is in the sam file except it is reverse complemented.

ADD REPLYlink modified 2.7 years ago by genomax92k • written 2.7 years ago by mark.rose40

It is as I suspected, bbmap removes the trailing /1 and /2, but keeps track of first read and second read using sam flags. However, I don't know why the samtools command is failing, which version of samtools are you using?

ADD REPLYlink written 2.7 years ago by h.mon31k

I checked the FLAG and you are indeed correct, one is read1 and the other read2 despite their QNAMEs. So I guess all is fine, if unexpected, in regard to the representation of paired-ends. Thanks for your help.

As for samtools, I'm using version 0.1.19-44428cd.

ADD REPLYlink written 2.7 years ago by mark.rose40

You really need to upgrade to the latest samtools (now v.1.7). That version is ancient and can only lead to other troubles.

ADD REPLYlink written 2.7 years ago by genomax92k

mark.rose : If you have samtools available in your $PATH you can directly create bam files by specifying out=myfile.bam when running bbmap.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by genomax92k

Thanks for the tip

ADD REPLYlink written 2.7 years ago by mark.rose40

Not that it matters for the original question, but with .sam files, it should be samtools view -S -f 0x0040 out.sam, or has that changed in a recent version?

ADD REPLYlink written 2.7 years ago by cschu1812.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1032 users visited in the last hour