Question: coverageBed problem: bam contains chromosome while query file does not
2
gravatar for Chico Fernandez
5.0 years ago by
Chico Fernandez70 wrote:

UPDATE: Partly solved. It works without sorted option and now I do not need this option. However, the original question remains the same - how to calculate coverage from .bam file using -sorted option. I added dummy region from MT to the .bed file - it did not work.

Have a problem with coverageBed. Run command:

covBed -d -sorted -b Sample.realigned.dm.recalibrated.bam -a srt.bed

get output: 

ERROR: Database file Sample.realigned.dm.recalibrated.bam contains chromosome , but the query file does not.

       Please re-reun with the -g option for a genome file.

       See documentation for details.

How to solve it? (coverageBed with -g option ? - was not really helpful). It works without -sorted option.

My bed file chromosomes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y

Bam file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
MT
coverage bedtools • 2.7k views
ADD COMMENTlink modified 5.0 years ago by dariober11k • written 5.0 years ago by Chico Fernandez70
2

Upvote for providing data with your question this time :)

Also, please be aware that when a tool doesn't give an error, that doesn't mean it worked. It is your job as the analyst to make sure all your coordinates (BAM and BED) are exactly the same, otherwise they'll be a serious and non-obvious error in your data.
Going back to the map analogy, if I asked you to go to position "51.5072°N, 0.1275°W", and you used the Greenwhich Meridian, you'd end up in London. If however you used any of the other zero meridians (http://www.worldheritagesite.org/tags/tag433.html) you'd end up somewhere else - probably in the middle of an ocean. Make sure your coordinates match up, ideally by using the same reference file throughout.

ADD REPLYlink written 5.0 years ago by John12k

I agree. But providing coordinates and using the wrong meridian should give you some answer. The tool should not crash when meridian is not Greenwhich. Tool should not care about it. It should make several steps starting from 0 and give result (right or wrong). In my case it does not do it.

ADD REPLYlink written 5.0 years ago by Chico Fernandez70
1

John would probably argue that it's better to have a tool that doesn't drop you in the middle of the ocean when your intended destination is London...

Seriously, we're trying to share our experience and help you avoid an all-too-common error. Complaining that the tool 'should give you some answer' is counterproductive, and foolhardy if the answer is wrong. The tool is obviously designed to check if the maps agree, and they don't. It even provides a useful error message and recommended solution, which is better than a lot of software. If the tool doesn't perform the way that you think it should, then you're welcome to develop your own.

ADD REPLYlink written 5.0 years ago by harold.smith.tarheel4.6k

Thank you for the comments, I really appreciate your help. I am a tools' developer, the problem is that everybody was complaining on me that I re-invent wheels all the time, that's why I decided to use third-party tools where it is possible. As a developer I can not imagine a mechanism of checking if 2 references are the same for bam file and bed file, can you? Moreover, the coordinate system was the same, the reference was the same (look at the answer by dariober) and the tool worked well without sorted option (all probes, like 5% of the genome, was well-covered, and it could not happen with the wrong map). The problem was not in wrong coordinate system, the problem was "to explain tool that it should not complain and just do his job".

ADD REPLYlink written 5.0 years ago by Chico Fernandez70
3
gravatar for dariober
5.0 years ago by
dariober11k
WCIP | Glasgow | UK
dariober11k wrote:

Have you actually tried to pass a genome file to coverageBed? As I remember, coverageBed will complain even if chrom names are the same and in the same order but one chrom is missing from one of the two inputs, as for example chrom MT in your case. You can obtain the genome file from UCSC with e.g:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
    "select chrom, size from hg19.chromInfo" > hg19.genome

Or from the bam file header with:

samtools view -H aln.bam \
| grep -P "@SQ\tSN:" \
| sed 's/@SQ\tSN://' \
| sed 's/\tLN:/\t/' > genome.txt

Then run coverageBed -g genome.txt ...

ADD COMMENTlink modified 12 months ago by _r_am32k • written 5.0 years ago by dariober11k
1

It worked! I had to remove first three letters (chr) from the genome file and it worked well! Thank you very much.

(Second method worked as-is, without any modifications)

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by Chico Fernandez70
1
gravatar for harold.smith.tarheel
5.0 years ago by
United States
harold.smith.tarheel4.6k wrote:

You should use the same reference genome for the BED file that you used for the BAM alignment. Otherwise, you're virtually guaranteed that the coordinates won't match.

ADD COMMENTlink written 5.0 years ago by harold.smith.tarheel4.6k

I have the same genome file for sure. It works without -sorted option. The problem is that I need this option...

ADD REPLYlink written 5.0 years ago by Chico Fernandez70

They are not the same. Your BED file contains three chromosomes that are absent in your BAM, and the BAM contains one chromosome that's missing from your BED.

[Added comment; statement no longer accurate b/c the originally posted chromosome files have been edited.]

ADD REPLYlink modified 12 months ago by _r_am32k • written 5.0 years ago by harold.smith.tarheel4.6k

I guess bed file was created using the file with probes' coordinates, provided by manufacturer. There is no probes located on MT and I guess I can ignore chromosomes with random and Un_gl000218.

I just do not have any targets on MT. What if I would have a target panel only from 1 chromosome? There will be non-specific products from other chromosomes, will coverageBed fail in this case too?

ADD REPLYlink modified 12 months ago by _r_am32k • written 5.0 years ago by Chico Fernandez70

It's not failing b/c you have non-specific products in your sequence data. It's failing b/c the chromosome/position information of the two files is different. It' s impossible to compare them and obtain meaningful results - like trying to navigate Moscow using a map of Paris.

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by harold.smith.tarheel4.6k

I am pretty sure both of them (manufacturer and BWA) used the same hg37 reference. And coordinates are the same (coverageBed works without -sorted option, and, as I am working with WES, all the probes are well-covered, and it can not be a coincidence). bedtools tells me:

ERROR: Database file Sample.realigned.dm.recalibrated.bam contains chromosome, but the query file does not.

It does not complain about coordinates. It complains that there are reads in .bam file that were aligned to MT chromosome, but I do not have a MT chromosome in bed file. The comparison with maps is wrong, because coverageBed just need to tell "what is located on sector (75,35) in this city, if we say that the center of the city is (0,0)". It can say "all of your bed-file targets have 0 coverage" because of different references, but not "You used different references, I hate you and will not do a job for you".

ADD REPLYlink modified 12 months ago by _r_am32k • written 5.0 years ago by Chico Fernandez70

You used different references, I hate you and will not do a job for you

Actually, that's exactly what the error message is saying :-).

And just b/c two maps contain the same street names (chromosome) and numbers (position) doesn't mean they're for the same city. But you're right, different cities is too extreme. A better analogy would be using an out-of-date map in a city that's expanded and changed some of the street numbers.

FYI, there is no hg37. UCSC jumped from hg19 to hg38. Do you mean GRCh37? If so, there was the initial release (which lacked the mitochondrial sequences) and 13 patched versions before the release of GRCh38. And those are just the official GRC versions. UCSC and Broad each have there own flavors of the "same" genome.

ADD REPLYlink modified 12 months ago by _r_am32k • written 5.0 years ago by harold.smith.tarheel4.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1319 users visited in the last hour
_