Samtools idxstats fails to load index
2
1
Entering edit mode
7.3 years ago
lvande ▴ 50

I recently used samtools in order to downsample some bam files I have from a sequencing run as such:

Samtools view -h -s 0.5 file.bam > ds_file.bam

I was then going to go through a normalization protocol I have to average the reads to reads per million, which first requires me to remove mitochondrial reads, which I normally do with idxstats:

Samtools index ds_file.bam
Samtools idxstats ds_file.bam | cut -f 1 | grep -v MT | xargs samtools view -b ds_file.bam > dsfile_rmMT.bam

However, when running idxstats with these downsampled files, I am getting an error message saying that the index has failed to load. I checked the downsampled bam file and it is definitely still sorted by coordinates. Does anyone know why I am failing to generate a useable index?

ChIP-Seq samtools error index • 4.4k views
ADD COMMENT
0
Entering edit mode

so when you run Samtools index ds_file.bam you get the error? or when you run idxstats right afterwards?

Typically these sorts of errors come from either left-over indexes lying around. Furthermore, chaining commands together like this

Samtools idxstats ds_file.bam | cut -f 1 | grep -v MT | xargs samtools view -b ds_file.bam > dsfile_rmMT.bam

is generally considered a bad idea as you combine the process of command generation and command execution into a single operation, making debugging it much harder. Best to just use your cut/grep/xargs to echo out the samtools command rather than actually run it. Then that removes multiple sources of potential error from the question.

ADD REPLY
0
Entering edit mode

I get the error when I run idxstats. Previously I was getting an error when running samtools index, but that has mysteriously stopped occurring.

I'm fairly certain there is something wrong with the bam file or the indexing, and not chaining commands, as I have previously run that exact command with other bed files and gotten results just fine.

ADD REPLY
0
Entering edit mode

right - but this time you don't get results just fine.

I'm not trying to say the chaining of commands was the problem (although it definitely could have been) i'm just saying that in general it is, like, fundamentally leaving you open to more errors to do it that way. Also it's a security risk.

ADD REPLY
0
Entering edit mode
7.3 years ago

samtools view ds_file.bam | awk 'BEGIN{OFS="\t"}{if($3 != "MT") print}' | samtools view -Sbo dsfile_rmMT.bam

This doesn't even require an index.

ADD COMMENT
0
Entering edit mode

I don't see how this would solve the problem? I understand the 'bad' practice, but the above method works and is a supported way of removing mito reads (https://www.biostars.org/p/128967/). Seems to be an underlying problem in the sam file having to do with the headers, that might otherwise be overlooked and could create problems later in the pipeline. I would not recommend using this as this seems to be the quick work-around fix and fix what's wrong with file

ADD REPLY
0
Entering edit mode

My version is faster and doesn't write the SAM intermediate to disk (that intermediate was incorrectly being labeled as a BAM file in OP's post). Yeah, one could just make that a BAM file, but my answer will still be quicker.

ADD REPLY
0
Entering edit mode

Gotcha that is good to know! I was just saying as pertaining to her question. I would recommend posting in that thread (Remove mitochondrial reads from BAM files) a quick answer just so people know for future!

ADD REPLY
0
Entering edit mode
7.3 years ago
lvande ▴ 50

I was working on doing other manipulations with these files and noticed that bioconductor genomation errored out saying my bam file has improper headers. When looking at the headers of the downsampled vs full files with samtools view, I could only find a difference of one column, which only contained "=" for each read. Could this be the source of the problem? If so, how do I add this back in?

ADD COMMENT
3
Entering edit mode

Solution found, I was of course running samtools with a sam output. I reran downsampling as samtools -b -s to output in a true bam format and it solved both of my issues.

ADD REPLY
0
Entering edit mode

Be honest - and if the answer is no the answer is no... - but do you think you would have detected the typo sooner had the command been simpler and didn't have all the cut xargs idxstats stuff in it and was just "samtools view -b -s chr1 chr2 chrr3.... " ?

ADD REPLY
0
Entering edit mode

No, it wouldn't have. The problem wasn't the chained command at all, it was the command two prior to generate the downsampled file. The only reason I figured it out was that I was discussing the problem with a colleague and they mentioned it being a sam output.

ADD REPLY
0
Entering edit mode

Ah ok, fair enough :)

ADD REPLY

Login before adding your answer.

Traffic: 2498 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6