Question: bedtools genomecover fails to find coverage for the specific chr region
0
gravatar for IrK
2.1 years ago by
IrK30
Australia
IrK30 wrote:

Dear community,

I am trying to find coverage for the chr19 as "per-base". Instead I get coverage "per-base" for the whole genome, which is a time consuming operation. I have paired-end data which was filtered to proper aligned reads. The organism is mm10.

My command is as following: bedtools genomecov -d -ibam infile.bam -g only_chr19.genome > output_file.txt bedtools Version: v2.26.0

I found chr19 length for mm10 by using the command: curl -s ftp://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz | gunzip -c Content of the only_chr19.genome file:

chr19    61431566

Content of BAM input file, which is sorted by position and also filtered to "read mapped in proper pair" with the following command
samtools view -b -F 256 -o out.filt.bam infile.bam

NS500:6295    99    chr1    3047349    1    16M    =    3047349    -16    AGGCACTGAAGAATTA    AAAAAEEEAEEEEEEE    AS:i:0    XS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:16    YS:i:0    YT:Z:CP
NS500:6295    147    chr1    3047349    1    16M    =    3047349    -16    AGGCACTGAAGAATTA    EEEEEEEEEEEAAAAA    AS:i:0    XS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:16    YS:i:0    YT:Z:CP
NS500:12119    99    chr1    3086977    32    22M    =    3086977    -22    AAGAGTGGATGCCTGAGCTTTC    AAAAAEEEEEEEEEEEEAEEEE    AS:i:0    XS:i:-5    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:22    YS:i:0    YT:Z:CP
NS500:18909    99    chr1    3086977    18    22M    =    3086977    -22    AAGAGAGGATGCCTGAGCTTTC    AAAAA//E/EEEE/E/EEE/EE    AS:i:-3    XS:i:-8    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:5T16    YS:i:0    YT:Z:CP
NS500:12119    147    chr1    3086977    32    22M    =    3086977    -22    AAGAGTGGATGCCTGAGCTTTC    //AE/EEEEEE6AEEAE66AAA    AS:i:0    XS:i:-4    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:22    YS:i:0    YT:Z:CP

I can't understand how bedtools can produce coverage for the whole genome if the genome file contains only one line of the required information (chr19)? Another question I am having is, the manual says "-pc Calculates coverage for paired-end reads, coverage is calculated as the number of fragments covering each base pair" what is understood by fragments? Does it mean "first in pair R1" and "second in pair R2" read? Or does it take each read (R1 and R2) and count coverage for each of them, and not as pair-end reads? How could I found coverage considering both reads?

Thank you,

ADD COMMENTlink modified 2.1 years ago by Istvan Albert ♦♦ 81k • written 2.1 years ago by IrK30
1
gravatar for Istvan Albert
2.1 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

The purpose of the chromosome file is not to subselect intervals but to indicate how far out to report coverages even if you don't have alignments covering those regions. Hence it is reporting all coverages.

Slice your BAM file to the region of interest (samtools view -b alignment.bam chr19 > small.bam) and run genome coverage on that.

A fragment typically means the region between the leftmost aligned base of a read pair to the rightmost aligned base of its mate.

If you want to count the coverage of both pairs as reads and not fragments then don't use the -pc option

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Istvan Albert ♦♦ 81k

thank you so much, Istvan. It's very clear!

ADD REPLYlink written 2.1 years ago by IrK30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1638 users visited in the last hour