Question

Mapping reads to contigs with BBMap - output statistics question (metagenome)

3.1 years ago

bioknown • 0

Hi,

I am struggling to interpret the output from BBMap when mapping my reads back to my contigs from a metagenome assembly with MEGAHIT. Prior to assembly, my QC process was to quality filter (Q20) and remove adaptors from paired-end Illumina reads (2 × 150 bp), followed by removal of human reads with BBMap (removehuman). Following this QC process I assembled the remaining reads into contigs using MEGAHIT, giving 59,051 contigs (min 200 bp, max 40,333 bp, avg 496 bp, N50 460 bp).

The sample I show below has the following reads from these QC steps:

Raw R1: 13,385,066 R2: 13,385,066
Post Quality and Adapter Trimming R1: 13,340,199 R2: 13,340,199
Post human contaminant removal R1: 5,917,329 R2: 5,917,329

Next I wanted to map my reads back to the contigs to determine coverage, and so used BBMap with the final.contigs.fa file as reference.

Set threads to 20
Retaining first best site only for ambiguous mappings.
Set genome to 1

Loaded Reference:       0.394 seconds.
Loading index for chunk 1-1, build 1
Generated Index:        0.868 seconds.
Analyzed Index:         2.398 seconds.
Started output stream:  0.046 seconds.
Cleared Memory:         0.597 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 20 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19

   ------------------   Results   ------------------   

Genome:                 1
Key Length:             13
Max Indel:              80
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             11834658        (1591132584 bases)

Mapping:                117.737 seconds.
Reads/sec:              100517.41
kBases/sec:             13514.25


Pairing data:           pct pairs       num pairs       pct bases          num bases

mated pairs:             10.2855%          608629        10.1746%          161891954
bad pairs:                1.1129%           65853         1.1961%           19031352
insert size avg:          187.19


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  13.0861%          774347        13.0521%          104148353
unambiguous:             12.0305%          711884        12.0943%           96506377
ambiguous:                1.0556%           62463         0.9577%            7641976
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        4.5870%          271430         4.5313%           36157212
semiperfect site:         4.8657%          287922         4.8237%           38490319
rescued:                  0.4807%           28443

Match Rate:                   NA               NA        95.6079%          100247767
Error Rate:              55.6636%          491289         3.5799%            3753684
Sub Rate:                55.2589%          487717         2.6549%            2783713
Del Rate:                 7.1384%           63004         0.6721%             704715
Ins Rate:                 5.9416%           52441         0.2530%             265256
N Rate:                   4.5790%           40414         0.8122%             851617


Read 2 data:            pct reads       num reads       pct bases          num bases

mapped:                  12.7039%          751731        12.6430%          100282958
unambiguous:             11.6445%          689043        11.6918%           92737494
ambiguous:                1.0594%           62688         0.9513%            7545464
low-Q discards:           0.0000%               1         0.0000%                148

perfect best site:        4.2838%          253485         4.2259%           33519037
semiperfect site:         4.5412%          268718         4.4939%           35645203
rescued:                  0.5159%           30528

Match Rate:                   NA               NA        95.4120%           96351226
Error Rate:              56.5154%          487302         3.7461%            3783025
Sub Rate:                56.1329%          484004         2.7885%            2815942
Del Rate:                 7.1383%           61550         0.6945%             701386
Ins Rate:                 5.8917%           50801         0.2631%             265697
N Rate:                   4.7009%           40533         0.8418%             850093

Total time:             122.119 seconds.

I am confused about how to interpret the BBMap output here. Is BBMap telling me that only 10% of my reads have a pair in the R2 file? And that only approx. 13% of the reads from each paired file are being used in the assembly? Also, why is the match rate NA for both reads?

Thanks in advance for any help with this.

Bio

Assembly alignment Metagenome • 992 views

Yes, that seems to be accurate.

The problem is your very fragmented assembly. An N50 of 460 bp is very low (think half a typical bacterial gene). Long-read assemblies (e.g. PromethION) might better suit your research goals.

Try fitting a read pair onto the average contig: 150 bp of R1, a gap of about 100 bp (NNNN × 100), and 150 bp of R2, so about 400 bp in total onto a 450 bp contig. It will be challenging.
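To put rough numbers on that fit, here is a minimal sketch. It assumes a very simplified model in which a read or a whole fragment must lie entirely inside the contig to map, and it uses the 150 bp read length, the roughly 100 bp gap, and the 450 bp contig from the paragraph above. The helper name `placements` is hypothetical, and real aligners (clipping, variable insert sizes, the full contig length distribution) will behave differently.

```python
def placements(contig_len: int, span: int) -> int:
    """Count start positions where a span of `span` bp lies fully inside the contig."""
    return max(0, contig_len - span + 1)

contig_len = 450                 # approximate typical contig length from the thread
read_len = 150                   # read length from the question (2 x 150 bp)
gap = 100                        # assumed unsequenced gap between the two mates
fragment = 2 * read_len + gap    # 400 bp covered by a full pair

single = placements(contig_len, read_len)   # positions where one read fits
paired = placements(contig_len, fragment)   # positions where the whole pair fits

print(single, paired, round(paired / single, 3))
# prints: 301 51 0.169
```

Under these assumptions a whole pair has about one sixth as many valid placements as a single read, which is one way to see why proper pairs would be much rarer than single mapped reads on contigs this short.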

You can try another aligner, e.g. bwa mem, and run samtools stats on the resulting BAM for a second opinion.

You could also try mapping the R1 reads only to see if your hit rate increases, out of interest.

3.1 years ago by colindaven 6.4k

I concur with colindaven's suggestion of trying to align the reads independently. You have a small number of reads that can be aligned as proper pairs.
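As a sanity check on what the percentages refer to, the counts pasted in the question can be recombined with a few lines of Python. This is only arithmetic on the numbers in the log and the QC table; the variable names and the `pct` helper are made up for the example. It suggests that "Reads Used" is simply R1 plus R2, that "mated pairs" is expressed relative to the number of input pairs, and that the "mapped" percentages are relative to the reads in each file.

```python
# Counts copied from the question (QC table and BBMap log).
pairs_in = 5_917_329       # read pairs after human-read removal (R1 == R2)
reads_used = 11_834_658    # "Reads Used" in the BBMap log
mated_pairs = 608_629      # "mated pairs", num pairs
r1_mapped = 774_347        # Read 1 "mapped", num reads
r2_mapped = 751_731        # Read 2 "mapped", num reads

def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to four decimals, like the BBMap report."""
    return round(100 * numerator / denominator, 4)

# Every input read is counted: Reads Used equals twice the number of pairs.
assert reads_used == 2 * pairs_in

print("mated pairs / input pairs:", pct(mated_pairs, pairs_in))
print("R1 mapped / R1 reads:     ", pct(r1_mapped, pairs_in))
print("R2 mapped / R2 reads:     ", pct(r2_mapped, pairs_in))
print("all mapped / all reads:   ", pct(r1_mapped + r2_mapped, reads_used))
```

Read this way, the 10% figure describes pairs in which both mates mapped together as a pair, not reads that lack a partner in the R2 file, and roughly 13% of all reads map back to the contigs at all.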

3.1 years ago by GenoMax 141k

Thanks for your quick reply, I will give this a try and see what the results are!

3.1 years ago by bioknown • 0