How to Solve an Out-of-Memory Error When Using USEARCH Chimera Detection
9.9 years ago

Hi

I was trying to run the chimera detection command (-uchime_ref) from the USEARCH program and encountered a memory-related error. Does anybody have an idea what causes it?

usearch6.0.307_i86linux32 -uchime_ref transcripts.fa -db ref.fa -uchimeout results.uchime -strand plus


---Fatal error--- Out of memory, mymalloc(696115460), curr 3.16e+09 bytes

Thanks, Upendra


The error message seems pretty self-explanatory: you just ran out of memory. Is there a reason you think it's not a memory limitation? Is the file really small, or do you have tons of RAM?


I thought I had enough memory (132 GB). I think the input file might be too big. Thanks anyway for the pointer.

9.9 years ago
Cliff Beall

You did run out of memory, because the 32-bit version can only use 3-4 GB. Your choices are to pay for the 64-bit version (~$900) or, as Josh mentioned, to divide the queries into smaller pieces.
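The numbers in the error message fit this explanation. A rough back-of-the-envelope check, assuming that "curr" in the message is the memory USEARCH had already allocated (my reading of the message, not something documented in this thread):

$$
3.16\times10^{9} + 696{,}115{,}460 \approx 3.86\times10^{9}\ \text{bytes} \approx 3.59\ \text{GiB}
\quad\text{vs.}\quad
2^{32} = 4{,}294{,}967{,}296\ \text{bytes} = 4\ \text{GiB}
$$

On paper that leaves only about 0.44 x 10^9 bytes of headroom in the whole 32-bit address space, and in practice less, since the program code, shared libraries and stack also live in that space and a single ~0.7 GB allocation needs one contiguous free block. The rest of the server's 132 GB is simply out of reach for a 32-bit process.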

9.9 years ago
Josh Herr

Just to repeat what Damian stated above: you have run out of memory.

You're using the Linux 32-bit version of USEARCH, so I am assuming you're running this on a desktop computer that may not have much RAM. Chimera checking is not trivial in terms of memory. First check the USEARCH documentation; it's a pretty good place to start.

You didn't give us any information about what you are checking. Your FASTA file is transcripts.fa: what are these transcripts from? Amplicons from metagenomes? What laboratory process produced these reads? With -uchime_ref, USEARCH checks your queries against other sequences (your ref.fa) as the reference. First, I would make sure your input sequences are actually comparable with your reference database.

Second, once you've solved the database/query issue, try reducing the size of your input file, transcripts.fa. Split it in two and see whether you still run out of memory. You can also reduce the size of your database file, but try the input file first. Also, QC your sequences to remove as many errors and poor-quality reads as possible, if you haven't done that already.
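Before and after splitting, it helps to know how big the query file actually is. A minimal sketch, assuming a standard FASTA file (wrapped or unwrapped); fasta_stats is a hypothetical helper name, not a USEARCH command:

```bash
# fasta_stats FILE
# Hypothetical helper: counts records (header lines starting with ">")
# and total residues (characters on all non-header lines).
fasta_stats() {
  awk '
    /^>/ { n++; next }            # header line: count a new record
    { len += length($0) }         # sequence line: add its length
    END { printf "%d sequences, %d residues\n", n, len }
  ' "$1"
}
```

Running fasta_stats transcripts.fa, and then the same on each half after a split, shows how much the query set actually shrank.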

Lastly, you're using the -strand plus flag; try searching on just one strand, as you are doubling the amount of memory used for the search.


Hi Josh, thanks for the info. I am using a server with 132 GB of RAM, so it is probably my FASTA file that is big enough to run out of memory. Regarding your second question: I am trying to check for chimeras in my de novo assembly, produced with the Velvet/Oases pipeline. The assembly I got is fairly clean of contaminants and duplicates. I used Illumina TruSeq to make these libraries. I have already BLASTed a few transcripts from my assembly and found that a couple of them are actually chimeras, hitting two different chromosomes. So the whole point of using the USEARCH UCHIME command is to detect chimeras across my entire assembly and remove them before annotating my transcriptome.
I thought -strand plus already searches a single strand, and the only other option is -strand both. Correct me if I am wrong.


I'm still not clear where your transcripts came from. The type of data you have affects what type of chimera detection database you should provide.

I usually leave the strand flag blank, so I am not sure what it should be, but you'll want to check the manual for ways to limit your memory usage.


The transcripts came from a Velvet/Oases assembly, and I want to make sure they don't contain any chimeras. I think the -strand option is compulsory in the new version of USEARCH UCHIME.


In all honesty, I'm completely flummoxed as to why you are searching for chimeras after you have mapped transcripts. Can you tell me why you are searching for them now? A chimera is a PCR amplification error which occurs during strand breakage.

I would think that (1) you would want to remove errors before doing any data analysis, which also speeds up downstream analyses, and (2) because chimeras are hybrids of two different amplified sequences, they might not map to your reference (this depends on your mapping parameters, of course).

This all depends on your database (ref.fa): how did you design it? Are you searching for chimeras across all your transcripts? If so, I suspect your database is not compatible with your transcripts.


As I mentioned in my answer, the free version of USEARCH (which includes UCHIME) is 32-bit, so the fact that your server has 132 GB is not relevant to your problem. Anyway, UCHIME is really designed to detect chimeras in 16S rRNA amplicons or similar, not for what you're doing.

6.0 years ago

Just split your file into separate chunks and run -uchime_ref on each one in a for loop.
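A minimal sketch of that approach in bash. The helper names split_fasta and run_uchime_chunks are hypothetical, and the usearch invocation simply reuses the binary name and flags from the original question. Records are dealt out round-robin so each chunk gets roughly the same number of sequences. Note that ref.fa is still loaded for every chunk, so this only helps if the query set, rather than the database, is what pushes memory past the 32-bit limit.

```bash
# split_fasta INPUT N PREFIX
# Hypothetical helper: writes PREFIX.1.fa ... PREFIX.N.fa, assigning
# whole FASTA records to chunks round-robin.
split_fasta() {
  local input=$1 n=$2 prefix=$3
  awk -v n="$n" -v prefix="$prefix" '
    /^>/ { out = prefix "." ((rec % n) + 1) ".fa"; rec++ }  # new record: pick chunk
    out != "" { print > out }   # skip any lines before the first header
  ' "$input"
}

# run_uchime_chunks PREFIX DB
# Hypothetical helper: runs the uchime_ref command from the question on
# every chunk, then concatenates the per-chunk reports into results.uchime.
run_uchime_chunks() {
  local prefix=$1 db=$2 chunk
  for chunk in "$prefix".*.fa; do
    [ -e "$chunk" ] || continue   # glob matched nothing
    usearch6.0.307_i86linux32 -uchime_ref "$chunk" -db "$db" \
      -uchimeout "${chunk%.fa}.uchime" -strand plus || return 1
  done
  cat "$prefix".*.uchime > results.uchime
}
```

For example, split_fasta transcripts.fa 8 chunk followed by run_uchime_chunks chunk ref.fa. If a run still fails, increase the number of chunks.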