Bam file depth of coverage and statstics
2
4
Entering edit mode
2.5 years ago
daewowo ▴ 80

I am looking for a tool that can provide a high level summary of coverage statistics of genomes used in an alignment.

I concatenated GCF_000002285.5_Dog10K_Boxer_Tasha_genomic.fna, GCF_000001405.39_GRCh38.p13_genomic.fna, and GCF_000001635.27_GRCm39_genomic.fna. Each of these animal genomes contains hundreds of chromosome sequences.

I have aligned RNASeq data to this combined dog-human-mouse reference and now I would like to calculate the total number of reads, coverage of each animal genome and depth of coverage. I am only interested in animal level summary, not chromosome coverage statstics.

Tools such as mosdepth, depth-cover, qualimap (V2.2.1) only provide coverage at a chromosome level, not a summary at full genome level (eg coverage 1x, 4x, 10x depths).

Update after a couple of days: Pehaps what I can do is get all the chromosome accessions for each of the genomes. Then extract from the .bam (forst convert to sam) result 3 seperate files containing only the results relevant to each genome. Then find a tool that will calculate summary statistics on each of the 3 sam files (total coverage at depth n, total number of reads mapped).

Bam • 2.7k views
ADD COMMENT
0
Entering edit mode

Why do you say that the Qualimap does not give you statistics at the genome level?

qualimap bamqc -c -bam input.bam

The above command gives you a multi-section report and the "Coverage" section shows you depth at the genome level

ADD REPLY
2
Entering edit mode

Maybe I am missing something - are you talking about that .html report output? The coverage section in that generates a single value for coverage

Coverage
Mean    0.0661
Standard Deviation  133.6669

plus a long table of chromosome stats and summary plots. Looking at the 'geneome_results.txt' file generates, I get also a 'Coverage' section, also with same single mean and std dev as above and a long list like this

There is a 0.03% of reference with a coverageData >= 1X

However it does not give me results per genome eg like this:

#Genome    DATA(%)   Avg depth      Median   Coverage%    Cov 4x %   Cov 10x %   Cov 30x %  Cov 100x %
human     90.00    28.41         23.0      99.42       96.91       83.98       34.05        2.58
mouse     5.00     28.41         23.0      99.42       96.91       83.98       34.05        2.58
dog   5.00     28.41         23.0      99.42       96.91       83.98       34.05        2.58
ADD REPLY
0
Entering edit mode

For multi-sample bam file you can use qualimap2 but unfortunately, I do not have information about the multi-reference bam file

ADD REPLY
0
Entering edit mode

I am using qualimap 2. I may be able to do this if I can generate GTF files for each genome and supply that to qualimap 2. Am trying to work out how to get GTF files

ADD REPLY
1
Entering edit mode

Why not use pre-prepared GTF files (per genome) on the NCBI?

For example, you can find that for dog assembled genome here

ADD REPLY
0
Entering edit mode

Pysam looks to be the way to go, counts per chromosome is easy but I need some time to spend to work out coverage per genome. Then just total counts for the full genome. No need to mess with GTF files

ADD REPLY
1
Entering edit mode
2.4 years ago
daewowo ▴ 80

Will use Pysam to write code to do this

ADD COMMENT
3
Entering edit mode
2.5 years ago
cfos4698 ★ 1.1k

Check out bedtools genomecov: https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html

As per their example:

bedtools genomecov -i A.bed -g my.genome
chr1   0  980  1000  0.98
chr1   1  20   1000  0.02
chr2   1  500  500   1
genome 0  980  1500  0.653333
genome 1  520  1500  0.346667
ADD COMMENT
0
Entering edit mode

I have 3 large genomes I am simultaneously aligning to: human, mouse, dog. The output of bedtools gives me alignments to 840,000 genomes. I am looking for a tool that can give me the summary at a mouse/human/dog level only, without showing alignment scores to individual chromosomes.

Both bamdst and bamstats can give %coverage at various read depths for human/mouse/dog but I cant use these as per above

ADD REPLY

Login before adding your answer.

Traffic: 3051 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6