Error when running cellranger mkref single nuclei RNA-Seq
16 months ago
robert.yzc • 0

Hi everyone -

I'm trying to make a custom reference for a 10x Genomics v3 single-nuclei RNA-Seq run. According to the instructions on 10x's website (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references) I can use the following commands:

# 1. Download the Ensembl98 release of mm10's genome (primary assembly):
wget ftp://ftp.ensembl.org/pub/release-98/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

# 2. Unzip the genome
gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

# 3. Download the matching Ensembl98 .gtf annotation
wget ftp://ftp.ensembl.org/pub/release-98/gtf/mus_musculus/Mus_musculus.GRCm38.98.gtf.gz

# 4. Unzip the .gtf file
gunzip Mus_musculus.GRCm38.98.gtf.gz

# 5. Keep only the records whose feature type (column 3) is "transcript" and relabel them as "exon". This has the functional effect of creating a pre-mRNA gtf file.
awk 'BEGIN{FS="\t"; OFS="\t"} $3 == "transcript"{$3="exon"; print}' Mus_musculus.GRCm38.98.gtf > Mus_musculus.GRCm38.98.premrna.gtf

# 6. Use cellranger mkref to create a reference for downstream analysis
cellranger mkref --genome=mm10 \
--fasta=Mus_musculus.GRCm38.dna.primary_assembly.fa \
--genes=Mus_musculus.GRCm38.98.premrna.gtf \
--ref-version=3.1.0

Step 6 is where my error appears. When I run this command, the following results appear:

+++++++

Creating new reference folder at /scratch/jglab/RYC/191218_snRNASeq_spikein_SPFvGF/CellRanger_References/refdata-cellranger-mm10-3.0.0_premrna/mm10_3.0.0_premrna
...done

Writing genome FASTA file into reference folder...
...done

Computing hash of genome FASTA file...
...done

Indexing genome FASTA file...
...done

Writing genes GTF file into reference folder...
...done

Computing hash of genes GTF file...
...done

Writing genes index file into reference folder (may take over 10 minutes for a 3Gb genome)...
/opt/htcf/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/cellranger-3.1.0-f4xtbwsorfbhh23ig7ccyjrfgipn5zwj/cellranger-cs/3.1.0/bin/../tenkit/bin/common/_master: line 76: 12996 Killed                  $SUBCMD "$@"


+++++++

I am currently running cellranger v3.1.0 through our university's cluster, which stores cellranger at the directory /opt/htcf/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/cellranger-3.1.0-f4xtbwsorfbhh23ig7ccyjrfgipn5zwj/cellranger-cs/3.1.0/bin/. Looking at the GitHub repository for cellranger, I can see the line that reads $SUBCMD "$@" but do not know what it does functionally. I checked the fasta file for the genome as well as the pre-filtered and post-filtered .gtf files, and they are all in the correct format (chromosome names match, the .gtf file has the appropriate columns, both were downloaded from Ensembl). I have attempted this pipeline with different versions of cellranger as well as different versions of the Ensembl mouse genome/gtf annotations, and I get the same error each time.

Would love your thoughts on how I could move past this, or if anyone has had any similar experiences with these errors. If I can help provide additional information that would be helpful, please let me know. I really appreciate your help!

10x RNA-seq single-cell RNA-Seq • 1.3k views
0

I suggest you use the pre-made indexes that 10x provides and save yourself the trouble. Is there a reason you are trying to make these yourself? Pre-made indexes for human and mouse genomes are available here. You will need to do a click-through registration.

0

If you scroll down, you'll see that for single nuclei applications, you need to make the tweaks the OP described to the gtf before running mkref, so the OP does have to build their own index locally. (It looks like you relabel 'transcript' records as 'exon' so that each transcript is treated as one unspliced exon and intronic reads are not discarded.)
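To make the relabelling concrete, here is a minimal sketch that runs the OP's awk filter on a toy three-line GTF. The gene G1, transcript T1, and all coordinates are made up for illustration; only the awk command comes from the thread.

```shell
# Toy GTF: one transcript (100-900) with two exons (100-200 and 700-900).
# All identifiers and coordinates are hypothetical.
tmp=$(mktemp -d)
{
  printf '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf '1\ttoy\texon\t700\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
} > "$tmp/toy.gtf"

# Same filter as in the question: keep transcript records, relabel them as exon.
awk 'BEGIN{FS="\t"; OFS="\t"} $3 == "transcript"{$3="exon"; print}' \
  "$tmp/toy.gtf" > "$tmp/toy.premrna.gtf"

cat "$tmp/toy.premrna.gtf"
```

The filtered file keeps a single record: one "exon" covering the whole 100-900 span, intron included. The two original exon records are dropped because the awk pattern only matches transcript lines.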

0

My apologies for not looking at the exact link included by OP and thanks for pointing out the single nuclei application.

The instructions provided use files from their pre-made index bundle. I just tried them out, and they seem to be working for creating the modified index on a server with 32G of RAM. The operation is taking some time to complete. I will update when it is done.

0

Thanks for testing this out genomax! As swbarnes2 pointed out, I need to make a custom reference to only include "transcripts" converted to "exons" so that introns are retained in the .gtf file. It's reassuring to see that on 32G, creating the reference works. I'll report back as soon as I get access to more system memory.

0

Hey genomax!

I added additional memory and haven't errored out after ~25 minutes... Looking good so far! I can't believe the fix was (potentially) this simple. Thanks for riding this journey with me..!

0

To speed this up further, you could also add more threads/memory at the end of your mkref command, e.g. --nthreads=8 --memgb=40. By default it uses 1 thread and 16G RAM.

This run took about an hour to complete.
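For reference, the OP's mkref command with those two options appended might look like the sketch below. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers, not part of cellranger; they only print the command by default so the snippet can be pasted without starting a long job. The values 8 and 40 are the ones suggested above and should stay within what the node actually provides.

```shell
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same command as in the question, plus explicit thread and memory settings.
run cellranger mkref --genome=mm10 \
  --fasta=Mus_musculus.GRCm38.dna.primary_assembly.fa \
  --genes=Mus_musculus.GRCm38.98.premrna.gtf \
  --ref-version=3.1.0 \
  --nthreads=8 \
  --memgb=40
```

Setting DRY_RUN=0 before running the snippet executes the command for real.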

2
16 months ago

Looks to me like you are running out of memory.

My guess is that the line in question launches a subcommand of cellranger, and that this subcommand is using too much memory and is therefore being killed by the system.

On compute clusters you often get access to only a relatively small part of a compute node's memory by default, and have to request a larger amount for more demanding jobs.
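A quick way to recognise this situation: a process that is forcibly killed (SIGKILL, which is what the kernel's out-of-memory killer sends) ends with exit status 128 + 9 = 137. The sketch below reproduces that with a harmless child shell. The scheduler line in the comments assumes a SLURM cluster, which the thread does not confirm, and `mkref.sh` is a hypothetical job script name.

```shell
# Simulate a child process being SIGKILLed, as happens when a job exceeds
# its memory allowance; the child shell sends signal 9 to itself.
status=0
sh -c 'kill -9 $$' || status=$?
echo "exit status: $status"

# Assumption: on a SLURM cluster, more memory is requested at submission time,
# for example (mkref.sh is a hypothetical script wrapping the mkref command):
#   sbatch --mem=64G --cpus-per-task=8 mkref.sh
# Other schedulers use different options; check your cluster's documentation.
```

If a failed job reports status 137 or the log only says "Killed", memory is the first thing to check.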

0

I hope it is such a simple solution! Thanks for taking the time to respond so quickly. I will request the cluster to provide me additional memory (100 GB) and see if the error persists. I'll report back on the results, regardless of what they are.

0

An update: So far, I haven't errored out after adding the additional memory buffer!! Fingers crossed, but it looks like that fixed it. Thanks for your help, you won't believe how long I spent trying to troubleshoot this...

Any way I can "accept" a formal answer like stackexchange? I'm new to biostars so just want to make sure I can give credit.

0

I moved @Ian's comment to an answer. You can accept it as follows.

0

Ah I see, thanks! Just accepted it.