Hi
bedtools is quite impressive. I could generate a data with 13 columns, when giving my gtf and sample.bam. I do get the coverage breadth, depth of reads, coordinates of the gene/other feature against which the coverage was calculated..... though i also wanted to get the region or i mean coordinates of read maximum covered.
Bedtools as per its documentation, gives the coverage by considering all the reads covered for a specific region of length Y and then taking the maximum read length (X), calculating Y/X. but it doesn't give us which region from that Y got abc coverage, ... which could be determined if coordinates were also given of that particular read, X.
Thanks
Please add some sort of toy data and expected output. Its's hard to guess what you need just from text.
sure
thanks for showing interest
this is how the data looks
here i can see, ....say for gene "DDX11L1" with gene_id ENSG00000223972, the total bases (the actual length of this gene) is 1661. Coordinates (start and end positions) are also given for this ref gene in col D and E. out of this 1661 bp, 1661 bps got covered, making sense that the longest read was also of same length as ref gene with same coordinates, thus giving an exact coverage of 1 or 100%.
But for genes/other features where mapped bases are less than the total bases, .... it is nearly impossible to say which part of ref gene got covered by our sequencing reads... (see second image please)
hope y query is clear now
in short, if i could get the coordinates of longest read that mapped to the ref gene (or any other region), then i would be able to know what exact region of ref gene or anything got 50 or 66 or 12 percent coverage breadth.