Question

bedtools: coordinates of read covered

0

Entering edit mode

6 months ago

Smriti ▴ 20

Hi

bedtools is quite impressive. I could generate a data with 13 columns, when giving my gtf and sample.bam. I do get the coverage breadth, depth of reads, coordinates of the gene/other feature against which the coverage was calculated..... though i also wanted to get the region or i mean coordinates of read maximum covered.

Bedtools as per its documentation, gives the coverage by considering all the reads covered for a specific region of length Y and then taking the maximum read length (X), calculating Y/X. but it doesn't give us which region from that Y got abc coverage, ... which could be determined if coordinates were also given of that particular read, X.

Thanks

bedtools • 590 views

ADD COMMENT • link updated 6 months ago by gonzlezb ▴ 10 • written 6 months ago by Smriti ▴ 20

1

Entering edit mode

Please add some sort of toy data and expected output. Its's hard to guess what you need just from text.

ADD REPLY • link 6 months ago by ATpoint 82k

0

Entering edit mode

sure

thanks for showing interest

this is how the data looks

bedtools_coverage

here i can see, ....say for gene "DDX11L1" with gene_id ENSG00000223972, the total bases (the actual length of this gene) is 1661. Coordinates (start and end positions) are also given for this ref gene in col D and E. out of this 1661 bp, 1661 bps got covered, making sense that the longest read was also of same length as ref gene with same coordinates, thus giving an exact coverage of 1 or 100%.

But for genes/other features where mapped bases are less than the total bases, .... it is nearly impossible to say which part of ref gene got covered by our sequencing reads... (see second image please)

coverage less than 1

hope y query is clear now

in short, if i could get the coordinates of longest read that mapped to the ref gene (or any other region), then i would be able to know what exact region of ref gene or anything got 50 or 66 or 12 percent coverage breadth.

ADD REPLY • link 6 months ago by Smriti ▴ 20

score 0 · Answer 1 · 2023-10-12

0

Entering edit mode

6 months ago

gonzlezb ▴ 10

If you want to know the exact nucleotides mapped against every single base in your ref genome, then you need to use samtools mpileup. You need: BAM file from alignment. Indexed FASTA file from your reference genome.

ADD COMMENT • link 6 months ago by gonzlezb ▴ 10