Question: BBMap Statistics Evaluation
Ada10 wrote:

Hello, I would like help understanding the results below:

  1. What is "%unambiguousReads" a fraction of? Essentially, how is the fraction calculated, and what determines the denominator? In addition, if the value is 0.87, does this mean 87% or 0.87%?

  2. What is "%ambiguousReads" a fraction of? Again, how is the fraction calculated, and what determines the denominator?

  3. What does assignedReads mean?

  4. What does assignedBases mean?

  5. What does MB mean in unambiguousMB/ambiguousMB?

    name    %unambiguousReads   unambiguousMB   %ambiguousReads ambiguousMB unambiguousReads    ambiguousReads  assignedReads   assignedBases
    NC_009801.1 Escherichia coli O139:H28 str. E24377A, complete sequence   0.87367 7.1124  4.07912 33.2073 47416   221382  111291  16693650
    NZ_GG773290.1 Escherichia coli MS 78-1 Scfld327, whole genome shotgun sequence  0.34082 2.77455 0.05644 0.45945 18497   3063    18747   2812050
    
modified 5 weeks ago by Istvan Albert ♦♦ 84k • written 5 weeks ago by Ada10

Since you are looking at two E. coli genomes, it is not surprising that the percentage of unambiguous reads is very small. No aligner is going to be able to distinguish between very similar genomes of the same species, especially when short reads are used. I am curious where the remaining ~95% of reads are, since these two lines do not account for them.

written 5 weeks ago by genomax87k
Istvan Albert ♦♦ 84k wrote:

I don't know what BBMap does specifically, but typically the denominator is the total number of reads, or the total number of mapped reads, depending on the circumstance.

In this case, it seems that the total number of reads was not reported in the statistics, hence we can't check that assumption.

I would expect that assigned means reads that could be mapped (assigned to a location).

I would expect that unambiguous means that a read maps to a single location.

I would expect that ambiguous means that a read maps equally well to more than one location.

MB means megabases (millions of bases).
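
As a rough sanity check on the denominator question, the posted rows can be back-calculated. This is a minimal sketch, assuming the % columns are already percentages (so 0.87367 means 0.87367%) and that the denominator is a single total read count. The variable and function names are made up for illustration, and the numbers are copied from the table in the question.

```python
# Values copied from the refstats table posted in the question.
# Tuple layout (illustrative):
# (name, pct_unambig, unambig_mb, pct_ambig, ambig_mb, unambig_reads, ambig_reads)
rows = [
    ("NC_009801.1", 0.87367, 7.1124, 4.07912, 33.2073, 47416, 221382),
    ("NZ_GG773290.1", 0.34082, 2.77455, 0.05644, 0.45945, 18497, 3063),
]

def implied_total(count, percent):
    """Total that would make `count` equal to `percent` percent of it."""
    return count / (percent / 100.0)

def bases_per_read(megabases, reads):
    """Average read length implied by an MB column and its read count."""
    return megabases * 1e6 / reads

implied_totals = []
read_lengths = []
for name, pct_u, mb_u, pct_a, mb_a, n_u, n_a in rows:
    # Assumption: percent = 100 * count / total, with one shared total.
    implied_totals += [implied_total(n_u, pct_u), implied_total(n_a, pct_a)]
    read_lengths += [bases_per_read(mb_u, n_u), bases_per_read(mb_a, n_a)]

for total in implied_totals:
    print(f"implied total reads: {total:,.0f}")
for length in read_lengths:
    print(f"implied bases per read: {length:.1f}")
```

If those assumptions hold, all four count/percent pairs point to a total of roughly 5.43 million reads, and each MB value equals its read count times 150 bases, divided by one million.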

ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by Istvan Albert ♦♦ 84k
genomax87k wrote:

The total number of reads is reported in the BBMap/BBSplit stats; the OP has not included that information. A typical result looks like this:

Genome:                 1
Key Length:             13
Max Indel:              20
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             53236   (7985400 bases)

Mapping:                46.016 seconds.
Reads/sec:              1156.89
kBases/sec:             173.53


Pairing data:           pct pairs       num pairs       pct bases          num bases

mated pairs:             86.2236%           22951        86.2236%            6885300
bad pairs:                2.1527%             573         2.1527%             171900
insert size avg:          435.68


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  91.7499%           24422        91.7499%            3663300
unambiguous:             87.2943%           23236        87.2943%            3485400
ambiguous:                4.4556%            1186         4.4556%             177900
low-Q discards:           0.0000%               0         0.0000%                  0

In fact, the output posted by the original poster comes from the bbsplit.sh refstats option. Those results need to be read together with the main output of the bbsplit.sh run, which looks like the bbmap.sh output I posted above (bbsplit.sh uses bbmap.sh under the covers to do the read binning). An example of the refstats output looks like this:

#name   %unambiguousReads    unambiguousMB   %ambiguousReads ambiguousMB     unambiguousReads ambiguousReads   assignedReads   assignedBases
human   88.33496              7.053900           0.30806     0.024600         47026             164             47026          7053900
mouse   6.07108               0.484800           0.30806      0.024600         3232            164              3396            509400
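
The two example outputs above appear to fit together: dividing each read count by the "Reads Used" value (53236) reproduces the percentage columns, which suggests the denominator is the total number of reads used and that the values are already percentages. A small sketch of that check, with illustrative variable names and numbers copied from the outputs above:

```python
# "Reads Used" from the main mapping output above.
reads_used = 53236

# Per-reference values copied from the refstats example above:
# name -> (unambiguousReads, ambiguousReads, %unambiguousReads, %ambiguousReads)
refstats = {
    "human": (47026, 164, 88.33496, 0.30806),
    "mouse": (3232, 164, 6.07108, 0.30806),
}

def percent_of_total(count, total):
    """Count expressed as a percentage of the total."""
    return 100.0 * count / total

recomputed = {}
for name, (n_unambig, n_ambig, pct_unambig, pct_ambig) in refstats.items():
    recomputed[name] = (
        percent_of_total(n_unambig, reads_used),
        percent_of_total(n_ambig, reads_used),
    )
    print(name, "reported:", pct_unambig, pct_ambig,
          "recomputed:", recomputed[name])

# The MB columns look like plain base counts divided by one million,
# e.g. human assignedBases 7053900 vs unambiguousMB 7.053900.
human_mb = 7053900 / 1e6
print("human MB from assignedBases:", human_mb)
```

By the same logic, 0.87367 in the original question would read as 0.87367% of all reads used, not 87%.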
modified 5 weeks ago • written 5 weeks ago by genomax87k