Question

Read count vs Depth

0

Entering edit mode

11 months ago

Mary • 0

Hi!

I have been RNA seq short read sequencing data for a 112 dengue samples. I need to know by what percentage transcriptome is covered by our sequencing reads?

I found Bedtools as an appropriate tool for this. however, i am unable to understand two different outputs from this tool.. are these outputs comparable?

A command that gives depth wise coverage

bedtools coverage -split -reference.gtf -b sample_sorted.bam -s -hist gives output in following cols, understood by documentation.

chr; 2. source; 3. genomic feature; 4. start; 5. end; 6. score; 7. strand; 8. frame; 9. annotation; 10. depth; 11. mapped_bases; 12. total_bases (length of the genomic feature); 13. coverage %

here, i am seeing each gene having 0 depth and corresponding coverage... (not understood completely but i guess, in such particular rows, it is inferring the uncovered bases for that gene). is it so?

Another command that gives readcount wise coverage

bedtools coverage -split -reference.gtf -b sample_sorted.bam -s gives output in following cols, understood by documentation.

chr; 2. source; 3. genomic feature; 4. start; 5. end; 6. score; 7. strand; 8. frame; 9. annotation; 10. depth; 11. mapped_bases; 12. total_bases (length of the genomic feature); 13. coverage %

query1. Please refer image 1. #image_query1

For a particular gene in a sampleA, i can only see coverage at 0, 1, 2 depths. coverage of this gene would be coverage at depth 1 plus coverage at depth 2 (a total of ~46%). But i have got 6 read counts in read count wise coverage output making that same coverage (~46%).

But i am unable to justify this. How come 6 read counts correspond to depth 2 (maximum)?. both commands have been taken into account: strand specificity, split function (to avoid counting overlapping read counts).

query2. by samtools, qualimap and feature counts subread, total reads are same for my samples. Referring to an article, feature counts takes all the overlapping reads also. That's why to get rid of overlapping read counts and for the genes for which coverage is being calculated, bedtools was used (above command 2nd). The values obtained from all tools differ from what obtained from bedtools. Can bedtools experts comment.. why so?

Regards

Bedtools • 1.4k views

ADD COMMENT • link updated 11 months ago by Brian Bushnell 20k • written 11 months ago by Mary • 0

0

Entering edit mode

Do not use spreadsheet software to view plain text files. Use a plain text utility such as BBEdit/TextWrangler/NotePad++/gedit or simple command line utilities.

ADD REPLY • link 11 months ago by Ram 44k

0

Entering edit mode

noted what about the query

ADD REPLY • link 11 months ago by Mary • 0

score 1 · Answer 1 · 2023-12-07

1

Entering edit mode

11 months ago

Brian Bushnell 20k

Read count is the number of reads that map to a sequence. Depth is the average number of reads covering each reference base, often calculated by (#bp mapped to sequence)/(sequence length). There are a number of ways to calculate depth depending on things like what you do with indels and reads hanging off the ends of the reference but they normally end up similar. Percent coverage is a different matter (and also has varying definitions). Not to mention that you might want to know "what percent of my transcripts have any reads hitting them" versus "what percent of my transcriptome bases are covered by at least one read", which could both be loosely termed transcriptome coverage (as could average depth).

ADD COMMENT • link 11 months ago by Brian Bushnell 20k

0

Entering edit mode

Thanks Brain Bushnell

Referring to my query1: So u mean that for a particular gene, if total read count is 6, then depth may be just 1 or 2? How?