Question: Samtools idxstats fails to load index
1
gravatar for lvande
2.6 years ago by
lvande50
lvande50 wrote:

I recently used samtools in order to downsample some bam files I have from a sequencing run as such:

Samtools view -h -s 0.5 file.bam > ds_file.bam

I was then going to go through a normalization protocol I have to average the reads to reads per million, which first requires me to remove mitochondrial reads, which I normally do with idxstats:

Samtools index ds_file.bam
Samtools idxstats ds_file.bam | cut -f 1 | grep -v MT | xargs samtools view -b ds_file.bam > dsfile_rmMT.bam

However, when running idxstats with these downsampled files, I am getting an error message saying that the index has failed to load. I checked the downsampled bam file and it is definitely still sorted by coordinates. Does anyone know why I am failing to generate a useable index?

index error samtools chip-seq • 1.5k views
ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by lvande50

so when you run Samtools index ds_file.bam you get the error? or when you run idxstats right afterwards?

Typically these sorts of errors come from either left-over indexes lying around. Furthermore, chaining commands together like this

Samtools idxstats ds_file.bam | cut -f 1 | grep -v MT | xargs samtools view -b ds_file.bam > dsfile_rmMT.bam

is generally considered a bad idea as you combine the process of command generation and command execution into a single operation, making debugging it much harder. Best to just use your cut/grep/xargs to echo out the samtools command rather than actually run it. Then that removes multiple sources of potential error from the question.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by John12k

I get the error when I run idxstats. Previously I was getting an error when running samtools index, but that has mysteriously stopped occurring.

I'm fairly certain there is something wrong with the bam file or the indexing, and not chaining commands, as I have previously run that exact command with other bed files and gotten results just fine.

ADD REPLYlink written 2.6 years ago by lvande50

right - but this time you don't get results just fine.

I'm not trying to say the chaining of commands was the problem (although it definitely could have been) i'm just saying that in general it is, like, fundamentally leaving you open to more errors to do it that way. Also it's a security risk.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by John12k
0
gravatar for Devon Ryan
2.6 years ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

samtools view ds_file.bam | awk 'BEGIN{OFS="\t"}{if($3 != "MT") print}' | samtools view -Sbo dsfile_rmMT.bam

This doesn't even require an index.

ADD COMMENTlink written 2.6 years ago by Devon Ryan91k

I don't see how this would solve the problem? I understand the 'bad' practice, but the above method works and is a supported way of removing mito reads (https://www.biostars.org/p/128967/). Seems to be an underlying problem in the sam file having to do with the headers, that might otherwise be overlooked and could create problems later in the pipeline. I would not recommend using this as this seems to be the quick work-around fix and fix what's wrong with file

ADD REPLYlink written 2.6 years ago by datascientist28390

My version is faster and doesn't write the SAM intermediate to disk (that intermediate was incorrectly being labeled as a BAM file in OP's post). Yeah, one could just make that a BAM file, but my answer will still be quicker.

ADD REPLYlink written 2.6 years ago by Devon Ryan91k

Gotcha that is good to know! I was just saying as pertaining to her question. I would recommend posting in that thread (Remove mitochondrial reads from BAM files) a quick answer just so people know for future!

ADD REPLYlink written 2.6 years ago by datascientist28390
0
gravatar for lvande
2.6 years ago by
lvande50
lvande50 wrote:

I was working on doing other manipulations with these files and noticed that bioconductor genomation errored out saying my bam file has improper headers. When looking at the headers of the downsampled vs full files with samtools view, I could only find a difference of one column, which only contained "=" for each read. Could this be the source of the problem? If so, how do I add this back in?

ADD COMMENTlink written 2.6 years ago by lvande50
3

Solution found, I was of course running samtools with a sam output. I reran downsampling as samtools -b -s to output in a true bam format and it solved both of my issues.

ADD REPLYlink written 2.6 years ago by lvande50

Be honest - and if the answer is no the answer is no... - but do you think you would have detected the typo sooner had the command been simpler and didn't have all the cut xargs idxstats stuff in it and was just "samtools view -b -s chr1 chr2 chrr3.... " ?

ADD REPLYlink written 2.6 years ago by John12k

No, it wouldn't have. The problem wasn't the chained command at all, it was the command two prior to generate the downsampled file. The only reason I figured it out was that I was discussing the problem with a colleague and they mentioned it being a sam output.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by lvande50

Ah ok, fair enough :)

ADD REPLYlink written 2.6 years ago by John12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1194 users visited in the last hour