bedtools genomecov fractions not summing to 1
2.9 years ago
maxrwjones ▴ 60

Hi all,

I'm confused by some output from bedtools genomecov. I was trying to work out what percentage of my target genome is covered by transposable elements (TEs), so I ran this command:

bedtools genomecov -i CS_unsplit_annotations/iwgsc_refseqv1.0_TransposableElements_2017Mar13.gff3 \
-g CS_chromosomes_no_mtcp.genome

-i = GFF file of transposable elements
-g = genome file listing chromosome names and their lengths

However, the output doesn't make much sense to me. The excerpt below shows what fraction of the genome is covered at each depth by the TEs (column 2 is depth, column 5 is the fraction). These fractions only sum to about 0.70. What happened to the other 30% of the genome?

genome  0       2006041373      14547261565     0.137898
genome  1       4130680743      14547261565     0.283949
genome  2       821174927       14547261565     0.0564488
genome  3       2445960418      14547261565     0.168139
genome  4       690610368       14547261565     0.0474736
genome  5       132938036       14547261565     0.00913835
genome  6       20984529        14547261565     0.00144251
genome  7       2672582         14547261565     0.000183717
genome  8       1041252 14547261565     7.15772e-05
genome  9       19609   14547261565     1.34795e-06
genome  10      128569  14547261565     8.83802e-06
genome  12      41863   14547261565     2.87772e-06
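
To make the discrepancy concrete, here is a minimal Python sketch that sums column 3 (bases at each depth) and column 5 (fraction) of the genome lines above and compares the base count against the genome size in column 4. The variable names are my own, and the data is just the excerpt pasted in as a string, so it only checks the arithmetic of what is shown here.

```python
# Genome-wide histogram lines copied from the excerpt above.
# Columns: label, depth, bases at that depth, genome size, fraction.
GENOME_LINES = """\
genome 0 2006041373 14547261565 0.137898
genome 1 4130680743 14547261565 0.283949
genome 2 821174927 14547261565 0.0564488
genome 3 2445960418 14547261565 0.168139
genome 4 690610368 14547261565 0.0474736
genome 5 132938036 14547261565 0.00913835
genome 6 20984529 14547261565 0.00144251
genome 7 2672582 14547261565 0.000183717
genome 8 1041252 14547261565 7.15772e-05
genome 9 19609 14547261565 1.34795e-06
genome 10 128569 14547261565 8.83802e-06
genome 12 41863 14547261565 2.87772e-06
"""

count_sum = 0        # total bases accounted for across all depths
fraction_sum = 0.0   # sum of the reported fractions
genome_size = None   # column 4, identical on every line

for line in GENOME_LINES.splitlines():
    _label, _depth, bases, size, fraction = line.split()
    count_sum += int(bases)
    fraction_sum += float(fraction)
    genome_size = int(size)

# Bases that appear in no depth bin at all.
missing = genome_size - count_sum

print(f"sum of fractions : {fraction_sum:.6f}")
print(f"sum of base counts: {count_sum}")
print(f"genome size       : {genome_size}")
print(f"unaccounted bases : {missing}")
print(f"equals 2**32?     : {missing == 2**32}")
```

On these numbers the unaccounted-for bases come to 4,294,967,296, which is exactly 2^32. That would be consistent with one of the per-depth counts having wrapped around a 32-bit limit, but this is only an inference from the arithmetic and not something I have confirmed about bedtools.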

Hoping someone can shed some light on this odd behaviour. Thank you!



EDIT

I'm getting even more confusing results with another feature file (coding sequences this time). The output states that about 90% of the length of each individual chromosome is not covered by any gene, which is roughly what I expect for my organism. For the genome as a whole, however, bedtools reports that only about 3% has no coverage. There are 21 chromosomes, so I won't show all of the zero-coverage lines, but here are some:

chr2D   0       592882880       651852609       0.909535
chr3D   0       562541593       615552423       0.913881
chr6A   0       568492811       618079260       0.919773

And then the whole genome zero coverage line:

genome  0       401132100       14547261565     0.0275744

Note that field 3, which the manual describes as the "number of bases on chromosome (or genome) with depth equal to column 2", is smaller for the whole genome than it is for any of the individual chromosomes. What is going on?

Some other whole genome lines for context:

genome  1       580647461       14547261565     0.0399146
genome  2       279006216       14547261565     0.0191793
genome  3       148353592       14547261565     0.010198
genome  4       87959970        14547261565     0.0060465
genome  5       53422266        14547261565     0.00367232
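
The inconsistency in the zero-coverage lines can be checked with a short Python sketch using only the numbers quoted above. The last part is speculative: it assumes, purely for illustration, that the genome-wide count might have wrapped around at 2^32, and lists what the true count and fraction would be for each possible number of wraps. Nothing here confirms that this is what bedtools actually did.

```python
GENOME_SIZE = 14547261565

# Depth-0 base counts for the three chromosomes shown above.
chrom_zero_bases = {
    "chr2D": 592882880,
    "chr3D": 562541593,
    "chr6A": 568492811,
}

# Depth-0 base count reported on the whole-genome line.
genome_zero_bases = 401132100

# The genome-wide count should be at least the sum over any subset
# of chromosomes, so this comparison ought to be True.
partial_sum = sum(chrom_zero_bases.values())
consistent = partial_sum <= genome_zero_bases
print(f"sum over 3 chromosomes: {partial_sum}")
print(f"genome-wide count     : {genome_zero_bases}")
print(f"consistent?           : {consistent}")

# ASSUMPTION (not confirmed): the genome-wide counter wrapped at 2**32.
# List every candidate true count that still fits in the genome.
WRAP = 2**32
candidates = []
k = 0
while genome_zero_bases + k * WRAP <= GENOME_SIZE:
    true_count = genome_zero_bases + k * WRAP
    candidates.append((k, true_count, true_count / GENOME_SIZE))
    k += 1

for wraps, true_count, fraction in candidates:
    print(f"wraps={wraps}  count={true_count}  fraction={fraction:.4f}")
```

With three wraps the candidate count is 13,286,033,988 bases, a fraction of about 0.913, which is in line with the per-chromosome values of roughly 0.91. That fits the wraparound idea but does not prove it.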