Hi!
I have RNA-seq short-read sequencing data for 112 dengue samples. I need to know what percentage of the transcriptome is covered by our sequencing reads.
I found bedtools to be an appropriate tool for this. However, I am unable to understand two different outputs from this tool. Are these outputs comparable?
The first command gives depth-wise coverage:
bedtools coverage -split -a reference.gtf -b sample_sorted.bam -s -hist
It gives output with the following columns, as I understand from the documentation:
1. chr; 2. source; 3. genomic feature; 4. start; 5. end; 6. score; 7. strand; 8. frame; 9. annotation; 10. depth; 11. mapped_bases; 12. total_bases (length of the genomic feature); 13. coverage %
Here, I see a row with depth 0 and a corresponding coverage value for each gene. I have not understood this completely, but I guess that such rows report the uncovered bases of that gene. Is that so?
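A minimal sketch of how the `-hist` rows could be collapsed into one covered fraction per feature, assuming the column layout listed above (9 GTF columns, then depth, bases at that depth, feature length, fraction at that depth) and the trailing `all` summary rows that `-hist` appends. The function name `breadth_from_hist` and the example rows are hypothetical and not taken from the real data. Under this reading, the depth-0 row holds the uncovered bases, and the covered fraction is the sum over all rows with depth of 1 or more.

```python
from collections import defaultdict

def breadth_from_hist(lines):
    """Collapse `bedtools coverage -hist` rows into one covered fraction per feature.

    Assumes tab-separated rows: the 9 GTF columns followed by depth,
    bases at that depth, feature length, and fraction at that depth.
    """
    covered = defaultdict(int)   # bases with depth >= 1, per feature
    length = {}                  # feature length, per feature
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the trailing "all" summary rows and anything malformed.
        if fields[0] == "all" or len(fields) < 13:
            continue
        # Identify a feature by chr, start, end, strand and annotation.
        key = (fields[0], fields[3], fields[4], fields[6], fields[8])
        depth, n_bases, feat_len = int(fields[9]), int(fields[10]), int(fields[11])
        length[key] = feat_len
        if depth > 0:
            covered[key] += n_bases
    return {key: covered[key] / length[key] for key in length}

# Made-up rows for one hypothetical 100-base gene (not real data).
gtf = 'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";'
rows = [
    gtf + "\t0\t54\t100\t0.5400000",
    gtf + "\t1\t31\t100\t0.3100000",
    gtf + "\t2\t15\t100\t0.1500000",
    "all\t0\t54\t100\t0.5400000",
]
result = breadth_from_hist(rows)
print(result)
```

In this made-up case the gene has 31 + 15 = 46 covered bases out of 100, so the depth-1 and depth-2 rows together account for the covered fraction and the depth-0 row for the rest.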
The second command gives read-count-wise coverage:
bedtools coverage -split -a reference.gtf -b sample_sorted.bam -s
It gives output with the following columns, as I understand from the documentation:
1. chr; 2. source; 3. genomic feature; 4. start; 5. end; 6. score; 7. strand; 8. frame; 9. annotation; 10. read count; 11. mapped_bases; 12. total_bases (length of the genomic feature); 13. coverage %
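A rough sketch of how a single transcriptome-wide percentage could be derived from this second (non `-hist`) output. It assumes that column 11 holds the number of bases with non-zero coverage and column 12 the feature length, as the bedtools documentation describes for the default output. It also assumes that only one feature type is summed and that those features do not overlap each other, otherwise bases would be counted twice. The function name `percent_covered`, the `feature_type` parameter and the example rows are hypothetical.

```python
def percent_covered(lines, feature_type="gene"):
    """Sum covered bases and feature lengths over one GTF feature type.

    Assumes tab-separated rows: the 9 GTF columns followed by read count,
    covered bases, feature length, and covered fraction.
    """
    covered = 0
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed rows of the requested feature type.
        if len(fields) < 13 or fields[2] != feature_type:
            continue
        covered += int(fields[10])
        total += int(fields[11])
    return 100.0 * covered / total if total else 0.0

# Made-up rows for two hypothetical 100-base genes (not real data).
example = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\t6\t46\t100\t0.4600000',
    'chr1\tsrc\tgene\t201\t300\t.\t+\t.\tgene_id "g2";\t0\t0\t100\t0.0000000',
    'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g1";\t3\t20\t50\t0.4000000',
]
overall = percent_covered(example)
print(overall)
```

Summing bases rather than averaging the per-gene fractions keeps long and short features weighted by their length.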
Query 1. Please refer to image 1.
For a particular gene in sample A, I can only see coverage at depths 0, 1 and 2. The coverage of this gene would be the coverage at depth 1 plus the coverage at depth 2 (a total of ~46%). But in the read-count-wise output I get a read count of 6 for the same coverage (~46%).
I am unable to reconcile these two results. How can 6 reads correspond to a maximum depth of 2? Both commands take into account strand specificity and the split option (to avoid counting overlapping reads).
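As a toy illustration of how a read count and a maximum depth can differ, here is a small sketch that stacks six hypothetical reads on a hypothetical 100-base gene. The read coordinates (0-based, end-exclusive) are invented so that the numbers resemble the case above. They are not taken from the real data, and this is not how bedtools computes coverage internally. Depth is counted per base, so six reads only reach depth 6 if all of them pile up on the same position.

```python
from collections import Counter

def depth_histogram(feature_len, reads):
    """Count how many bases of a feature are covered at each depth."""
    depth = [0] * feature_len
    for start, end in reads:
        # Clip each read to the feature and add 1 to every base it covers.
        for pos in range(max(start, 0), min(end, feature_len)):
            depth[pos] += 1
    return Counter(depth)

# Six invented reads forming three partially overlapping pairs.
reads = [(0, 10), (5, 16), (30, 40), (35, 45), (60, 70), (65, 75)]
hist = depth_histogram(100, reads)

read_count = len(reads)
max_depth = max(hist)
covered_bases = sum(n for d, n in hist.items() if d > 0)
print(read_count, max_depth, covered_bases)
```

In this toy case the six reads cover 46 of 100 bases, with 31 bases at depth 1 and 15 bases at depth 2, so a read count of 6 and a maximum depth of 2 describe the same alignments.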
Query 2. With samtools, Qualimap and featureCounts (Subread), the total read counts are the same for my samples. According to an article, featureCounts also counts all overlapping reads. That is why bedtools (the second command above) was used: to exclude overlapping reads and to count only reads for the genes whose coverage is being calculated. The values obtained from the other tools differ from those obtained from bedtools. Can bedtools experts comment on why this is so?
Regards
Do not use spreadsheet software to view plain text files. Use a plain text utility such as BBEdit/TextWrangler/NotePad++/gedit or simple command line utilities.
Noted. What about the query?