Question: sortmerna terminates without generating the file with rRNA being removed
15 days ago by
bioinformatics.queries50 wrote:

Hello everyone

I need help with sortmerna. I ran the command below to remove rRNA from my fastq files. However, when the walltime of 1-12:00:00 ran out, the job stopped without generating the aligned file or the cleaned files. The reference file for human rDNA was downloaded from the following link. The command I used is below:

sortmerna --ref rRNA.fasta --reads $sample --aligned "${file}_rna" --other "${file}_clean" --threads 20 -v --fastx

human rDNA

I would highly appreciate any suggestions.

Thank you so much

alignment • 157 views
ADD COMMENTlink modified 15 days ago • written 15 days ago by bioinformatics.queries50

Was 1-12:00:00 the time limit for your job? sortmerna can easily take longer than that in some cases. What were the last messages it printed to stderr and stdout?

ADD REPLYlink written 15 days ago by Dunois490

Here is stderr

[screenshot of stderr]

ADD REPLYlink modified 14 days ago • written 14 days ago by bioinformatics.queries50

stdout

Here is stdout

ADD REPLYlink modified 14 days ago • written 14 days ago by bioinformatics.queries50

Don't post screenshots of errors; they are impossible to decipher. Please use pastebin.com to post the logs if they are long.

ADD REPLYlink written 14 days ago by GenoMax95k

I am sorry about the screenshot. Please find the uploaded log file at this link: log-file-sortmerna

ADD REPLYlink written 14 days ago by bioinformatics.queries50

Thank you for the logs and the screenshots. This is the log for the key-value database, which isn't very informative here.

Could you please run sortmerna like so and share smrlog.txt with us?

sortmerna --ref rRNA.fasta --reads $sample --aligned "${file}_rna" --other "${file}_clean" --threads 20 -v --fastx --workdir ${PWD} | tee -a ${PWD}/smrlog.txt

(Please note, this will put all the working files in the current directory you are in; so switch to an appropriate--perhaps empty--directory accordingly.)
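A small caveat on the command above: a plain `| tee` only captures stdout, so anything the tool writes to stderr would not end up in smrlog.txt. Below is a minimal sketch of a wrapper that merges both streams before the pipe. `run_logged` is a hypothetical helper name used for illustration only, not part of sortmerna.

```shell
# run_logged LOGFILE CMD [ARGS...]
# Runs CMD, shows its output on the terminal, and appends both
# stdout and stderr to LOGFILE.
run_logged() {
    rl_log=$1
    shift
    # 2>&1 merges stderr into stdout before the pipe, so tee sees both streams.
    "$@" 2>&1 | tee -a "$rl_log"
}

# Example (same options as in the post above):
#   run_logged "${PWD}/smrlog.txt" sortmerna --ref rRNA.fasta --reads "$sample" \
#       --aligned "${file}_rna" --other "${file}_clean" --threads 20 -v --fastx \
#       --workdir "${PWD}"
```

Note that the exit status of this pipeline is that of `tee`, so a failed run has to be detected from the log itself.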

To speed up debugging, I suggest you subset about 100 reads from your fastq file (e.g., just take the first 100 reads; you might find SeqKit helpful here), and run sortmerna on those reads only.
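Taking the first 100 reads can be sketched with standard tools as follows. This assumes the usual 4-lines-per-record FASTQ layout (no wrapped sequence lines); `first_reads` and the file names are placeholders, not anything from the original poster's setup.

```shell
# first_reads FILE N
# Prints the first N reads of a FASTQ file (plain or gzip-compressed) to stdout.
# Assumes each record occupies exactly 4 lines.
first_reads() {
    fr_input=$1
    fr_n=$2
    case "$fr_input" in
        *.gz) gzip -cd "$fr_input" | head -n "$((fr_n * 4))" ;;
        *)    head -n "$((fr_n * 4))" "$fr_input" ;;
    esac
}

# Example (placeholder file names):
#   first_reads sample.fastq.gz 100 > subset_100.fastq
# With SeqKit installed, an equivalent would be:
#   seqkit head -n 100 sample.fastq.gz -o subset_100.fastq
```

The subset is written uncompressed, which also sidesteps any question of how the downstream tool handles gzip input.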

ADD REPLYlink modified 14 days ago • written 14 days ago by Dunois490

I ran the command as suggested and the log file was generated. The job ran for the full 2 days of specified walltime and then stopped without generating any results. I used 56 processors to run the command. Please find the log file at the link below; I look forward to further suggestions. log-file

ADD REPLYlink written 12 days ago by bioinformatics.queries50
12 days ago by
Dunois490 wrote:

Some questions:

  • So you ran this on a compute cluster? Are you executing via srun or sbatch?
  • How is sortmerna installed?
  • Are your output paths actually accessible and writable?
  • Also try pointing --workdir to /data/shilpia2/NOR.sequecnes/temp/. (Could be that sortmerna has no access to ${PWD}.)
  • Please try the runs with this test read set. With your rRNA reference, it should indicate 3 or 4 matches in aligned.log once sortmerna is done. The run itself shouldn't take any longer than a few seconds.

It doesn't make any sense that it does not execute at all. Since you're running this on slurm it would be nice if you could share both stderr and stdout.
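For the writability and stdout/stderr points above, here is a rough sketch of what the top of an sbatch script could look like. The `#SBATCH` values mirror the walltime and thread count mentioned in this thread; the log file names and the `check_writable` helper are assumptions for illustration, so adapt them to your cluster.

```shell
#!/bin/bash
#SBATCH --time=1-12:00:00          # walltime from the original post
#SBATCH --cpus-per-task=20         # matches --threads 20
#SBATCH --output=sortmerna_%j.out  # job stdout (%j expands to the job ID)
#SBATCH --error=sortmerna_%j.err   # job stderr

# check_writable DIR
# Succeeds only if DIR exists (or can be created) and is writable.
check_writable() {
    mkdir -p "$1" 2>/dev/null && [ -w "$1" ]
}

# Example use before launching the real run (placeholder path):
#   check_writable "$PWD/smr_work" || { echo "cannot write to workdir" >&2; exit 1; }
```

Failing early like this costs nothing and avoids burning a full walltime allocation on a job that cannot write its output.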

ADD COMMENTlink written 12 days ago by Dunois490

In answer to your questions:

  1. I did run this on an HPC cluster, and we use sbatch to run the command.
  2. I do not know how it was installed.
  3. Yes, the output paths are accessible and writable.
  4. I am pointing my --workdir to /data/shilpia2/NOR.sequecnes/temp/
  5. I will work on the test read set.
  6. I have a question which I forgot to mention: can we use sortmerna for DNA-sequencing data as well? Besides RNA-seq, I also have DNA-sequencing data with rDNA contamination.
  7. I am using the reference DNA from the following link: rDNA reference. Do we also need to create an index file? I could not create one because the indexdb_rna command was not working with sortmerna.

Thank you so much.

ADD REPLYlink modified 11 days ago • written 11 days ago by bioinformatics.queries50

I ran the test file as suggested, and the results are attached at this link: test_rna_results. I have also included both stderr and stdout in the uploaded file. Could you tell me whether the command executed correctly?

My next question is: can we use sortmerna to remove rDNA from DNA-sequencing data?

ADD REPLYlink written 11 days ago by bioinformatics.queries50

Hmm looks like the test run executed properly without a hitch (take a look at the log file). I can only speculate, then, that there are issues with your input fasta/fastq file. I noticed that it is compressed. Perhaps try decompressing the file first before feeding it to sortmerna? It is plausible that the tool is not handling the compressed file properly (even though it should).

As for your point 6: I think it'll work on any data as long as the alphabet in the reference and the input match. (You should confirm this with the developers, but I don't see any reason why this wouldn't be the case.)

Regarding point 7: I don't think you need to create any index files. The execution syntax I had indicated in the test run should suffice for all cases.

And I think your other question (in the comment I am replying to) is addressed by my response to point 6.

ADD REPLYlink modified 10 days ago • written 11 days ago by Dunois490

Thank you so much for your response.

ADD REPLYlink written 10 days ago by bioinformatics.queries50

You're welcome. Let me know how it goes!!

ADD REPLYlink written 10 days ago by Dunois490

Hi

I was able to run sortmerna for filtering rDNA from DNA-sequencing data. The fastq file has to be unzipped before running sortmerna. I took 100 reads and was able to filter out the rDNA from the sequencing data.
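The unzip step described above might be sketched like this. It only uses gzip itself; `decompress_fastq` and the file names are hypothetical, and whether a particular sortmerna build reads gzip input directly is not something this thread settles.

```shell
# decompress_fastq FILE.gz
# Writes the uncompressed file next to the original (keeping the .gz),
# prints its path, and returns non-zero if the archive is corrupt.
decompress_fastq() {
    df_in=$1
    df_out=${df_in%.gz}
    gzip -t "$df_in" || return 1              # integrity check first
    gzip -cd "$df_in" > "$df_out" || return 1
    printf '%s\n' "$df_out"
}

# Example (placeholder file name; sortmerna options as used in this thread):
#   reads=$(decompress_fastq sample.fastq.gz) || exit 1
#   sortmerna --ref rRNA.fasta --reads "$reads" --aligned "${file}_rna" \
#       --other "${file}_clean" --threads 20 -v --fastx
```

The `gzip -t` check also catches a truncated download or transfer, which is another common way for a long run to end without output.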

Thanks

ADD REPLYlink written 8 days ago by bioinformatics.queries50

So it was just the compressed file then? I hope everything goes smoothly from here on.

ADD REPLYlink written 8 days ago by Dunois490

Yes the problem was just with the compressed file. Thank you so much.

ADD REPLYlink written 7 days ago by bioinformatics.queries50

I have moved a comment (that was able to keep the flow of the thought process) to an answer. You can accept it providing closure to this thread.

ADD REPLYlink written 7 days ago by GenoMax95k