How to find out read counts from bedtools coverage?
0
0
Entering edit mode
4.7 years ago
ganeshram • 0

Hi! I need help its great appreciated you I did bedtools coverage to find out read counts with my bamfile and bed file (-abam to -b) and I got output also but its very much confusing for me to select read counts from the number of columns. I got output as Chr_name, Chr_Start, Chr_End, one score column, Strand, after this four coluns with valuse then gene_id and gene_name, gene_discriptions here I attached my file...

chr17   42148455    42200874    255 +   3   138  8768   52419   0.1672676   uc002ifd.1  HDAC5   Homo sapiens histone deacetylase 5 (HDAC5), transcript variant 1, mRNA.

chr10   88420620    88463684    255 +   2   135  8460   43064   0.1964518   uc001kdr.3  LDB3    Homo sapiens LIM domain binding 3 (LDB3), transcript variant 4, mRNA.

chr1    28824488    28870383    255 +   2     117   7581    45895   0.1651814   uc001bqb.2  RCC1    Homo sapiens regulator of chromosome condensation 1 (RCC1), transcript variant 6, non-coding RNA.

chr17   77018764    77064804    255 -   2   111 7285    46040   0.1582320   uc031rep.1  C1QTNF1 Homo sapiens C1q and tumor necrosis factor related protein 1 (C1QTNF1), transcript variant 3, mRNA.

chr12   110146748   110192188   255 +   2   108 7026    45440   0.1546215   uc001tpd.2  FAM222A Homo sapiens family with sequence similarity 222, member A (FAM222A), mRNA.

I did overlapping studies So which one I can considered as a read counts? till how much read count is significant?

HITs-CLIP • 967 views
ADD COMMENT
0
Entering edit mode

It would be helpful to see the actual command that was run so we could see which order the files were used (which was -a and which was -b) and what flags were raised. It looks like you ran the command backwards, coverageBed counts regions from the -b file that overlaps with regions from the -a, so you would want your annotation file to be -a and your bam file to be -b.

Regarding which column has the read counts, from the coverageBed --help output:

Default Output:  
     After each entry in A, reports: 
       1) The number of features in B that overlapped the A interval.
       2) The number of bases in A that had non-zero coverage.
       3) The length of the entry in A.
       4) The fraction of bases in A that had non-zero coverage.

Significance and peak finding would require more than simply running coverageBed, you should look into HITS-CLIP pipelines and research what tools are available and which are prevalent in the literature a bit before moving forward.

ADD REPLY
0
Entering edit mode

Thanks for replying

This is a unique bed file for this I want a read counts for each line.

chr1 16182 16257 NS500223:271:H5M25BGX3:1:13210:25140:3872#15 18 + 16182 16257 0 1 75 0 chr1 79506 79581 NS500223:273:H7CLFBGX3:1:13308:18261:15667#2 3 + 79506 79581 0 1 75 0 chr1 83735 83809 NS500223:271:H5M25BGX3:1:12108:12041:6078#8 9 + 83735 83809 0 1 74 0 chr1 106742 106817 NS500223:271:H5M25BGX3:1:21306:22791:2729#1 1 + 106742 106817 0 1 75 0 chr1 134992 135066 NS500223:271:H5M25BGX3:2:13307:16432:15372#3 4 + 134992 135066 0 1 74 0

For the above bed file, I need read count for each region using BAM file. For example, what would be the read count for chr1 139244-139319 (the first 3 columns of above bed file). I used the following command.

bedtools coverage -a test.bed -b reads.sorted.bam and I got out put like:

chr1 139244 139319 NS500223:271:H5M25BGX3:1:12202:9314:9130#2 3 + 139244 139319 0 1 75 0 5 75 75 1.0000000 chr1 139248 139323 NS500223:271:H5M25BGX3:2:12211:14307:17642#2 3 + 139248 139323 0 1 75 0 5 75 75 1.0000000 chr1 231493 231568 NS500223:273:H7CLFBGX3:2:11102:1172:19666#2 2 + 231493 231568 0 1 75 0 3 75 75 1.0000000 chr1 231502 231577 NS500223:271:H5M25BGX3:3:23607:10465:8405#1 1 + 231502 231577 0 1 75 0 3 75 75 1.0000000 chr1 532012 532087 NS500223:271:H5M25BGX3:3:23403:8758:8466#1 2 + 532012 532087 0 1 75 0 4 75 75 1.0000000 chr1 534712 534786 NS500223:271:H5M25BGX3:1:11304:5713:17232#8 8 + 534712 534786 0 1 74 0 1 74 74 1.0000000 chr1 538864 538939 NS500223:271:H5M25BGX3:1:22204:20184:9865#5 6 + 538864 538939 0 1 75 0 2 75 75 1.0000000 chr1 540585 540660 NS500223:271:H5M25BGX3:3:21403:24009:4818#2 2 + 540585 540660 0 1 75 0 1 75 75 1.0000000

How can I interpret these results? Which column is the read count?

ADD REPLY
0
Entering edit mode

Using the "code sample" formatting button (the 101010 button) will make reading your post a bit easier.

It looks like for the first line of your bed file test.bed is:

chr1 16182 16257 NS500223:271:H5M25BGX3:1:13210:25140:3872#15 18 + 16182 16257 0 1 75 0

and the output from coverageBed is:

chr1 139244 139319 NS500223:271:H5M25BGX3:1:12202:9314:9130#2 3 + 139244 139319 0 1 75 0 5 75 75 1.0000000

From the coverageBed --help we know that this tool appends 4 columns to the original .bed file and they correspond to:

   1) The number of features in B that overlapped the A interval.
   2) The number of bases in A that had non-zero coverage.
   3) The length of the entry in A.
   4) The fraction of bases in A that had non-zero coverage.

So the number of reads from the bam file that overlapped with the first interval in the bed file is 5.

ADD REPLY

Login before adding your answer.

Traffic: 2782 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6