Hi all,
I am VERY new to SortMeRNA (I'm a PhD student taking a bioinformatics class that has been very poorly taught). I have 27 paired samples, 54 files in total, named like this: SRR13711719_1_val_1.fq and SRR13711719_2_val_2.fq. So the format is *_1_val_1.fq and *_2_val_2.fq.
I've read a lot of other posts addressing this question, but I had a hard time following them. I barely know the basics of writing SLURM scripts. I know that SortMeRNA usually processes samples one at a time, so I thought I could submit a single job with the same command repeated for each file (although that might take just as long as running them individually?), like so:
#!/bin/bash
#SBATCH --job-name SM1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 4
#SBATCH --time 22:00:00
#SBATCH --partition guru
#SBATCH --output SortMe-%j.out
#SBATCH --error SortMe-%j.err

sortmerna --workdir sortmerna_db/ \
  --ref sortmerna_db/rRNA_databases/rfam-5.8s-database-id98.fasta \
  --ref sortmerna_db/rRNA_databases/rfam-5s-database-id98.fasta \
  --ref sortmerna_db/rRNA_databases/silva-arc-16s-id95.fasta \
  --ref sortmerna_db/rRNA_databases/silva-arc-23s-id98.fasta \
  --ref sortmerna_db/rRNA_databases/silva-bac-16s-id90.fasta \
  --ref sortmerna_db/rRNA_databases/silva-bac-23s-id98.fasta \
  --ref sortmerna_db/rRNA_databases/silva-euk-18s-id95.fasta \
  --ref sortmerna_db/rRNA_databases/silva-euk-28s-id98.fasta \
  --reads results/2_trimmed_output/SRR13711719_1_val_1.fq \
  --aligned results/3_rRNA/aligned/SRR13711719_1_val_1.subset_aligned \
  --other results/3_rRNA/filtered/SRR13711719_1_val_1.subset_filtered \
  --fastx --threads 4 -v
rm -r sortmerna_db/idx sortmerna_db/kvdb

sortmerna --workdir sortmerna_db/ \
  --ref sortmerna_db/rRNA_databases/rfam-5.8s-database-id98.fasta \
  --ref sortmerna_db/rRNA_databases/rfam-5s-database-id98.fasta \
  --ref sortmerna_db/rRNA_databases/silva-arc-16s-id95.fasta \
  --ref sortmerna_db/rRNA_databases/silva-arc-23s-id98.fasta \
  --ref sortmerna_db/rRNA_databases/silva-bac-16s-id90.fasta \
  --ref sortmerna_db/rRNA_databases/silva-bac-23s-id98.fasta \
  --ref sortmerna_db/rRNA_databases/silva-euk-18s-id95.fasta \
  --ref sortmerna_db/rRNA_databases/silva-euk-28s-id98.fasta \
  --reads results/2_trimmed_output/SRR13711719_2_val_2.fq \
  --aligned results/3_rRNA/aligned/SRR13711719_2_val_2.subset_aligned \
  --other results/3_rRNA/filtered/SRR13711719_2_val_2.subset_filtered \
  --fastx --threads 4 -v
rm -r sortmerna_db/idx sortmerna_db/kvdb

(Note: I removed a duplicated --ref for rfam-5s-database-id98.fasta and bumped --cpus-per-task to 4 so it matches --threads 4.)
for each of my 54 files. But when I run it, it starts building the reference indices but never reaches the alignment step, and it doesn't give any errors either. I just want to filter my 54 files without it taking 2.5 hours per file (which it does if I run them individually). If you have any suggestions, I would greatly appreciate them! The semester ends in 9 days and my professor has not been helpful at all with this project.
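Instead of copy-pasting the block 54 times, I was thinking the repeated command pairs could be generated with a small bash loop. This is just a rough sketch: the two sample names are placeholders for my full list, and only one --ref is shown where the real script would list all the databases above.

```shell
#!/bin/bash
# Rough sketch: write the repeated sortmerna/rm command pairs to a script
# instead of copy-pasting them. Only one --ref is shown here; the real
# script would repeat the full database list above. Sample names are
# placeholders for the full set of 54 files.
REFS="--ref sortmerna_db/rRNA_databases/silva-bac-16s-id90.fasta"

: > run_all.sh                     # start with an empty command file
for fq in SRR13711719_1_val_1.fq SRR13711719_2_val_2.fq; do
    base="${fq%.fq}"               # e.g. SRR13711719_1_val_1
    cat >> run_all.sh <<EOF
sortmerna --workdir sortmerna_db/ $REFS \\
  --reads results/2_trimmed_output/$fq \\
  --aligned results/3_rRNA/aligned/${base}.subset_aligned \\
  --other results/3_rRNA/filtered/${base}.subset_filtered \\
  --fastx --threads 4 -v
rm -r sortmerna_db/idx sortmerna_db/kvdb
EOF
done
```

The generated run_all.sh would then be submitted as the body of the SLURM job, so only the file list has to be maintained by hand.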
Thanks so much in advance and I'm sorry this is so chaotic!
Becca
Did you check the SLURM error log files? With your --error directive they would be named something like SortMe-12345.err. My speculation is that your job is running out of memory, since you are not requesting any in your SLURM header. When you don't, SLURM falls back to the default allocation set by the admins on your cluster, which can be fairly low (often around 4 GB). So first add a memory request to your #SBATCH header (I am arbitrarily suggesting 20G; you may need to experiment) and see if that helps the job complete.
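Something like this, added alongside the other #SBATCH lines (the 20G value is a guess, as noted, not a tuned number):

```shell
# Request memory explicitly; without this SLURM uses the cluster default.
# 20G is an arbitrary starting point -- adjust after checking actual usage.
#SBATCH --mem 20G
```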
I did check the error log, but it always stays empty. I added the memory specification in my script though, so hopefully that will help. Thank you so much for responding to me!
Let me know what happens. We should be able to get this working.