Question

How to find out read counts from bedtools coverage?

0

Entering edit mode

5.9 years ago

ganeshram • 0

Hi! I need help its great appreciated you I did bedtools coverage to find out read counts with my bamfile and bed file (-abam to -b) and I got output also but its very much confusing for me to select read counts from the number of columns. I got output as Chr_name, Chr_Start, Chr_End, one score column, Strand, after this four coluns with valuse then gene_id and gene_name, gene_discriptions here I attached my file...

chr17   42148455    42200874    255 +   3   138  8768   52419   0.1672676   uc002ifd.1  HDAC5   Homo sapiens histone deacetylase 5 (HDAC5), transcript variant 1, mRNA.

chr10   88420620    88463684    255 +   2   135  8460   43064   0.1964518   uc001kdr.3  LDB3    Homo sapiens LIM domain binding 3 (LDB3), transcript variant 4, mRNA.

chr1    28824488    28870383    255 +   2     117   7581    45895   0.1651814   uc001bqb.2  RCC1    Homo sapiens regulator of chromosome condensation 1 (RCC1), transcript variant 6, non-coding RNA.

chr17   77018764    77064804    255 -   2   111 7285    46040   0.1582320   uc031rep.1  C1QTNF1 Homo sapiens C1q and tumor necrosis factor related protein 1 (C1QTNF1), transcript variant 3, mRNA.

chr12   110146748   110192188   255 +   2   108 7026    45440   0.1546215   uc001tpd.2  FAM222A Homo sapiens family with sequence similarity 222, member A (FAM222A), mRNA.

I did overlapping studies So which one I can considered as a read counts? till how much read count is significant?

HITs-CLIP • 1.4k views

ADD COMMENT • link updated 5.9 years ago by Benn 8.4k • written 5.9 years ago by ganeshram • 0

0

Entering edit mode

It would be helpful to see the actual command that was run so we could see which order the files were used (which was -a and which was -b) and what flags were raised. It looks like you ran the command backwards, coverageBed counts regions from the -b file that overlaps with regions from the -a, so you would want your annotation file to be -a and your bam file to be -b.

Regarding which column has the read counts, from the coverageBed --help output:

Default Output:  
     After each entry in A, reports: 
       1) The number of features in B that overlapped the A interval.
       2) The number of bases in A that had non-zero coverage.
       3) The length of the entry in A.
       4) The fraction of bases in A that had non-zero coverage.

Significance and peak finding would require more than simply running coverageBed, you should look into HITS-CLIP pipelines and research what tools are available and which are prevalent in the literature a bit before moving forward.

ADD REPLY • link 5.9 years ago by shawn.w.foley ★ 1.3k

0

Entering edit mode

Thanks for replying

This is a unique bed file for this I want a read counts for each line.

chr1 16182 16257 NS500223:271:H5M25BGX3:1:13210:25140:3872#15 18 + 16182 16257 0 1 75 0 chr1 79506 79581 NS500223:273:H7CLFBGX3:1:13308:18261:15667#2 3 + 79506 79581 0 1 75 0 chr1 83735 83809 NS500223:271:H5M25BGX3:1:12108:12041:6078#8 9 + 83735 83809 0 1 74 0 chr1 106742 106817 NS500223:271:H5M25BGX3:1:21306:22791:2729#1 1 + 106742 106817 0 1 75 0 chr1 134992 135066 NS500223:271:H5M25BGX3:2:13307:16432:15372#3 4 + 134992 135066 0 1 74 0

For the above bed file, I need read count for each region using BAM file. For example, what would be the read count for chr1 139244-139319 (the first 3 columns of above bed file). I used the following command.

bedtools coverage -a test.bed -b reads.sorted.bam and I got out put like:

chr1 139244 139319 NS500223:271:H5M25BGX3:1:12202:9314:9130#2 3 + 139244 139319 0 1 75 0 5 75 75 1.0000000 chr1 139248 139323 NS500223:271:H5M25BGX3:2:12211:14307:17642#2 3 + 139248 139323 0 1 75 0 5 75 75 1.0000000 chr1 231493 231568 NS500223:273:H7CLFBGX3:2:11102:1172:19666#2 2 + 231493 231568 0 1 75 0 3 75 75 1.0000000 chr1 231502 231577 NS500223:271:H5M25BGX3:3:23607:10465:8405#1 1 + 231502 231577 0 1 75 0 3 75 75 1.0000000 chr1 532012 532087 NS500223:271:H5M25BGX3:3:23403:8758:8466#1 2 + 532012 532087 0 1 75 0 4 75 75 1.0000000 chr1 534712 534786 NS500223:271:H5M25BGX3:1:11304:5713:17232#8 8 + 534712 534786 0 1 74 0 1 74 74 1.0000000 chr1 538864 538939 NS500223:271:H5M25BGX3:1:22204:20184:9865#5 6 + 538864 538939 0 1 75 0 2 75 75 1.0000000 chr1 540585 540660 NS500223:271:H5M25BGX3:3:21403:24009:4818#2 2 + 540585 540660 0 1 75 0 1 75 75 1.0000000

How can I interpret these results? Which column is the read count?

ADD REPLY • link 5.9 years ago by ganeshram • 0

0

Entering edit mode

Using the "code sample" formatting button (the 101010 button) will make reading your post a bit easier.

It looks like for the first line of your bed file test.bed is:

chr1 16182 16257 NS500223:271:H5M25BGX3:1:13210:25140:3872#15 18 + 16182 16257 0 1 75 0

and the output from coverageBed is:

chr1 139244 139319 NS500223:271:H5M25BGX3:1:12202:9314:9130#2 3 + 139244 139319 0 1 75 0 5 75 75 1.0000000

From the coverageBed --help we know that this tool appends 4 columns to the original .bed file and they correspond to:

   1) The number of features in B that overlapped the A interval.
   2) The number of bases in A that had non-zero coverage.
   3) The length of the entry in A.
   4) The fraction of bases in A that had non-zero coverage.

So the number of reads from the bam file that overlapped with the first interval in the bed file is 5.

ADD REPLY • link 5.9 years ago by shawn.w.foley ★ 1.3k