Question

Very low average coverage obtained

9 months ago

mail2steff ▴ 70

Dear All,

I assembled a viral genome using metaviralSPAdes. Since this is a relatively new genome, I don't have a suitably close reference, so I mapped my reads back to the contigs with BBMap to estimate the coverage and got the following output:

java -ea -Xmx21819m -Xms21819m -cp /Tools/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/7_new/contigs.fasta in=200_forward_paired_vulgatus_clean.fastq in2=200_reverse_paired_vulgatus_clean.fastq covstats=constats_AvS7_all.txt covhist=covhist_AvS7_all.txt basecov=basecov_AvS7_all.txt bincov=bincov_AvS7_all.txt t=200
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/7_new/contigs.fasta, in=200_forward_paired_vulgatus_clean.fastq, in2=200_reverse_paired_vulgatus_clean.fastq, covstats=constats_AvS7_all.txt, covhist=covhist_AvS7_all.txt, basecov=basecov_AvS7_all.txt, bincov=bincov_AvS7_all.txt, t=200]
Version 39.01

Set threads to 200
Retaining first best site only for ambiguous mappings.
No output file.
NOTE:   Deleting contents of ref/genome/1 because reference is specified and overwrite=true
NOTE:   Deleting contents of ref/index/1 because reference is specified and overwrite=true
Writing reference.
Executing dna.FastaToChromArrays2 [/7_new/contigs.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1

Loaded Reference:       0.004 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: 7/ref/index/1/chr1_index_k13_c13_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index:        1.765 seconds.
Analyzed Index:         2.695 seconds.
Cleared Memory:         0.347 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 200 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199

   ------------------   Results   ------------------

Genome:                 1
Key Length:             13
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             9360574 (1409409714 bases)

Mapping:                46.475 seconds.
Reads/sec:              201409.01
kBases/sec:             30325.90


Pairing data:           pct pairs       num pairs       pct bases          num bases

mated pairs:              5.3164%          248823         5.3267%           75074960
bad pairs:                0.0528%            2471         0.0526%             740736
insert size avg:          305.92


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                   5.4125%          253319         5.4169%           38209925
unambiguous:              5.4124%          253316         5.4169%           38209476
ambiguous:                0.0001%               3         0.0001%                449
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        4.2656%          199644         4.2721%           30134879
semiperfect site:         4.2671%          199711         4.2735%           30144721
rescued:                  0.1074%            5025

Match Rate:                   NA               NA        98.1200%           37791543
Error Rate:              20.7722%           52620         1.8701%             720264
Sub Rate:                20.6234%           52243         0.7863%             302859
Del Rate:                 1.6079%            4073         0.7938%             305730
Ins Rate:                 3.2378%            8202         0.2899%             111675
N Rate:                   0.6245%            1582         0.0100%               3848


Read 2 data:            pct reads       num reads       pct bases          num bases

mapped:                   5.3925%          252385         5.4043%           38047930
unambiguous:              5.3925%          252383         5.4043%           38047631
ambiguous:                0.0000%               2         0.0000%                299
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        3.7263%          174402         3.7388%           26322053
semiperfect site:         3.7277%          174465         3.7400%           26331016
rescued:                  0.1078%            5046

Match Rate:                   NA               NA        92.4335%           37564145
Error Rate:              30.7558%           77623         7.5557%            3070576
Sub Rate:                30.5129%           77010         0.9457%             384336
Del Rate:                 1.6744%            4226         6.3761%            2591184
Ins Rate:                 2.9796%            7520         0.2339%              95056
N Rate:                   0.3598%             908         0.0108%               4393

Reads:                                  9360574
Mapped reads:                           505703
Mapped bases:                           76763410
Ref scaffolds:                          3
Ref bases:                              124837

Percent mapped:                         5.402
Percent proper pairs:                   5.316
Average coverage:                       614.909
Average coverage with deletions:        632.237
Standard deviation:                     952.759
Percent scaffolds with any coverage:    100.00
Percent of reference bases covered:     100.00

Total time:             51.659 seconds.

The overall mapping rate is only about 5%, but the average coverage is 614x. For my other samples I also got a very low percent mapped together with high coverage. Does this mean the assembly is not good, or how can I improve the results? Thank you in advance.

viral-genomes denovo-assembly spades • 800 views

score 4 · Accepted Answer · 2023-07-17

9 months ago

colindaven 6.4k

A coverage of 614x is very high; I would have thought you'd be happy with that?

The rest of the reads are likely from the host organism in your viral culture. You can test this by mapping to the host as well, or to both the virus and the host together after combining the reference FASTA files with cat x y > both.fa

You can create a BAM file and visualize it with a tool like IGV to check that the de novo virus assembly is completely covered. Is the virus in multiple contigs or one? Are all important genes present in the assembly?
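The two figures in the BBMap summary do not contradict each other, and a quick calculation shows why. The sketch below copies the counts from the log in the question and assumes that the reported average coverage is simply mapped bases divided by reference bases. That formula is an assumption, but it reproduces the reported value. The function name average_coverage is made up for illustration.

```python
# Counts copied from the BBMap summary in the question.
TOTAL_READS = 9_360_574
TOTAL_BASES = 1_409_409_714
MAPPED_READS = 505_703
MAPPED_BASES = 76_763_410
REF_BASES = 124_837  # total length of the three contigs


def average_coverage(bases, ref_bases):
    """Mean depth, assuming coverage = aligned bases / reference length."""
    return bases / ref_bases


percent_mapped = 100 * MAPPED_READS / TOTAL_READS
avg_cov = average_coverage(MAPPED_BASES, REF_BASES)
# Upper bound: depth if every sequenced base had mapped to the contigs.
max_cov = average_coverage(TOTAL_BASES, REF_BASES)

print(f"percent mapped:            {percent_mapped:.3f}%")
print(f"average coverage:          {avg_cov:.3f}x")
print(f"coverage if all had mapped: {max_cov:.0f}x")
```

Because the reference is only about 125 kb while the library holds about 1.4 Gb of sequence, even a small mapped fraction gives a deep pileup. The mapping percentage measures how much of the library the contigs explain, and the coverage measures how deeply the contigs are sampled.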

Thank you for the reply. I was sceptical because only 5.402% of the reads mapped even though the coverage is high. I filtered out the host genome before assembly, so host reads should not be present.

It has three contigs.

Contig     Read mapping (%)   Average coverage
Contig1    5.402              614.909
Contig2    0.199              58.451
Contig3    0.192              68.108

But our study used a single viral isolate, and we expected only one viral species. Is it acceptable to have a low read-mapping percentage together with high coverage? What could be the reasons for the very low read mapping? In the above sample, can I consider Contig1 the true positive?
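Per-contig depth can be read from the covstats file that the BBMap command above writes. The sketch below is a minimal parser, assuming the file is tab-separated with header columns named #ID, Avg_fold and Length; check the header of your own file before relying on it. The function name and the two example rows are made up for illustration and are not values from this sample.

```python
import csv
import io


def weighted_mean_coverage(handle):
    """Return ({contig: (avg_fold, length)}, length-weighted mean depth).

    Assumes a tab-separated table with '#ID', 'Avg_fold' and 'Length' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    per_contig = {}
    total_len = 0
    total_bases = 0.0
    for row in reader:
        avg_fold = float(row["Avg_fold"])
        length = int(row["Length"])
        per_contig[row["#ID"]] = (avg_fold, length)
        total_len += length
        total_bases += avg_fold * length  # approximate bases aligned to this contig
    overall = total_bases / total_len if total_len else 0.0
    return per_contig, overall


# Hypothetical rows, only to show the calculation.
example = (
    "#ID\tAvg_fold\tLength\n"
    "contig_A\t600.0\t100000\n"
    "contig_B\t60.0\t20000\n"
)
per_contig, overall = weighted_mean_coverage(io.StringIO(example))
for name, (fold, length) in per_contig.items():
    print(f"{name}: {fold:.1f}x over {length} bp")
print(f"length-weighted mean: {overall:.1f}x")
```

With a real file, pass open("constats_AvS7_all.txt") instead of the StringIO object. A long, deeply covered contig dominates the overall average, so the per-contig rows are more informative than the single summary figure when deciding which contig is the isolate.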

ADD REPLY • link 9 months ago by mail2steff ▴ 70

Contig 1: possibly, even probably. But you're probably a virologist, so you know much better than us bioinformaticians which contigs and genes are necessary for your virus.

As I said, I'd assume the rest of the reads are some form of contamination (your isolate is not pure). Have a look at metagenomics tools for classifying reads; Centrifuge or Kraken are easy enough to use.

Alternatively, your de novo assembly might not be complete (e.g. only 10% of the genome has been assembled).
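The incomplete-assembly explanation can be sanity-checked with rough arithmetic. The sketch below assumes that every read comes from the virus and that the genome is sequenced evenly, so the fraction of reads that map is about the fraction of the genome present in the contigs. Both assumptions are simplifications, and the function name implied_genome_size is made up for illustration.

```python
# Counts copied from the BBMap summary in the question.
ASSEMBLY_BP = 124_837
MAPPED_READS = 505_703
TOTAL_READS = 9_360_574


def implied_genome_size(assembly_bp, fraction_mapped):
    """Genome size needed for the unmapped reads to be unassembled virus.

    Assumes all reads are viral and coverage is uniform across the genome.
    """
    return assembly_bp / fraction_mapped


fraction_mapped = MAPPED_READS / TOTAL_READS
implied = implied_genome_size(ASSEMBLY_BP, fraction_mapped)
print(f"fraction of reads mapped: {fraction_mapped:.4f}")
print(f"implied genome size:      {implied / 1e6:.2f} Mbp")
```

Under these assumptions the genome would have to be about 2.3 Mbp, which would be very large for a virus, so checking the unmapped reads for contamination is probably the more productive first step.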