Question: Error when running cellranger mkref single nuclei RNA-Seq
robert.yzc0 wrote:

Hi everyone -

I'm trying to make a custom reference for a 10x Genomics v3 single-nuclei RNA-Seq run. According to the instructions on 10x's website (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references) I can use the following commands:

# 1. Download the Ensembl98 release of mm10's genome (primary assembly):
wget ftp://ftp.ensembl.org/pub/release-98/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

# 2. Unzip the genome
gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

# 3. Download the Ensembl98 release of mm10's annotation file (GTF):
wget ftp://ftp.ensembl.org/pub/release-98/gtf/mus_musculus/Mus_musculus.GRCm38.98.gtf.gz

# 4. Unzip the .gtf file
gunzip Mus_musculus.GRCm38.98.gtf.gz

# 5. Keep only the records whose feature type (column 3) is "transcript" and relabel them as "exon". This has the functional effect of creating a pre-mRNA GTF file.
awk 'BEGIN{FS="\t"; OFS="\t"} $3 == "transcript"{ $3="exon"; print}' Mus_musculus.GRCm38.98.gtf > Mus_musculus.GRCm38.98.premrna.gtf

# 6. Use cellranger mkref to create a reference for downstream analysis
cellranger mkref --genome=mm10 \
                 --fasta=Mus_musculus.GRCm38.dna.primary_assembly.fa \
                 --genes=Mus_musculus.GRCm38.98.premrna.gtf \
                 --ref-version=3.1.0

Step 6 is where my error appears. When I run this command, the following results appear:

+++++++

Creating new reference folder at /scratch/jglab/RYC/191218_snRNASeq_spikein_SPFvGF/CellRanger_References/refdata-cellranger-mm10-3.0.0_premrna/mm10_3.0.0_premrna
...done

Writing genome FASTA file into reference folder...
...done

Computing hash of genome FASTA file...
...done

Indexing genome FASTA file...
...done

Writing genes GTF file into reference folder...
...done

Computing hash of genes GTF file...
...done

Writing genes index file into reference folder (may take over 10 minutes for a 3Gb genome)...
/opt/htcf/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/cellranger-3.1.0-f4xtbwsorfbhh23ig7ccyjrfgipn5zwj/cellranger-cs/3.1.0/bin/../tenkit/bin/common/_master: line 76: 12996 Killed                  $SUBCMD "$@"

+++++++

I am currently running cellranger v3.1.0 through our university's cluster, which stores cellranger at the directory /opt/htcf/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/cellranger-3.1.0-f4xtbwsorfbhh23ig7ccyjrfgipn5zwj/cellranger-cs/3.1.0/bin/. Looking at the GitHub repository for cellranger, I can see the line that reads $SUBCMD "$@" but do not know what it does functionally. I checked the FASTA file for the genome as well as the pre-filtered and post-filtered .gtf files, and they are all in the correct format (chromosome names match, the .gtf file has the appropriate columns, both were downloaded from Ensembl). I have attempted this pipeline with different versions of cellranger as well as different versions of the Ensembl mouse genome/GTF annotations, and I get the same error each time.

Would love your thoughts on how I could move past this, or if anyone has had any similar experiences with these errors. If I can help provide additional information that would be helpful, please let me know. I really appreciate your help!

modified 6 months ago by genomax85k • written 6 months ago by robert.yzc0

I suggest you use the pre-made indexes that 10x provides and save yourself the trouble. Is there a reason you are trying to make these yourself? Pre-made indexes for human and mouse genomes are available here. You will need to do a click-through registration.

modified 6 months ago • written 6 months ago by genomax85k

If you scroll down, you'll see that for single nuclei applications, you need to do the tweaks the OP described to the gtf before mkref, so the OP does have to make their own index locally. (Looks like you relabel 'transcripts' as 'exons' so it won't expect introns to be omitted)
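A toy illustration of that relabeling may help. The three-line GTF below is made up for this sketch (the gene and transcript IDs are hypothetical); the awk filter is the one from step 5 of the question. Only the transcript record survives, relabeled as an exon that spans the whole transcript, introns included:

```shell
# Build a tiny, made-up GTF (gene, transcript, exon) for illustration only.
tmpdir=$(mktemp -d)
printf '1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n' > "$tmpdir/toy.gtf"
printf '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> "$tmpdir/toy.gtf"
printf '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> "$tmpdir/toy.gtf"

# Same filter as step 5: keep rows whose third column is "transcript",
# rewrite that column to "exon", and drop every other row.
awk 'BEGIN{FS="\t"; OFS="\t"} $3 == "transcript"{ $3="exon"; print}' \
    "$tmpdir/toy.gtf" > "$tmpdir/toy.premrna.gtf"

# Show the single remaining record (an "exon" covering 100-900).
cat "$tmpdir/toy.premrna.gtf"
```

Note that the original gene and exon rows are discarded, so the resulting file is only meant for building the pre-mRNA reference.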

written 6 months ago by swbarnes27.9k

My apologies for not looking at the exact link included by OP and thanks for pointing out the single nuclei application.

The instructions provided use files from their premade index bundle. I just tried the instructions out, and they seem to work for creating the modified index on a 32G server. The operation is taking some time to complete; I will update when it is done.

written 6 months ago by genomax85k

Thanks for testing this out genomax! As swbarnes2 pointed out, I need to make a custom reference to only include "transcripts" converted to "exons" so that introns are retained in the .gtf file. It's reassuring to see that on 32G, creating the reference works. I'll report back as soon as I get access to more system memory.

written 6 months ago by robert.yzc0

Hey genomax!

I added additional memory and haven't errored out after ~25 minutes... Looking good so far! I can't believe the fix was (potentially) this simple. Thanks for riding this journey with me..!

written 6 months ago by robert.yzc0

To speed this up further, you could also add more threads/memory at the end of your mkref command, e.g. --nthreads=8 --memgb=40. By default it uses 1 thread and 16G of RAM.

This run took about an hour to complete.
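As a sketch, the command from step 6 of the question with those two options added might look like the following. The variable names and the guard that skips the run when cellranger or the input files are missing are additions for this example, not part of the original instructions; the thread and memory values are the ones suggested above and should be adjusted to what your job is actually granted.

```shell
# Thread and memory values from the suggestion above; tune to your allocation.
NTHREADS=8
MEMGB=40

# Input files produced by steps 1-5 of the question.
FASTA=Mus_musculus.GRCm38.dna.primary_assembly.fa
GTF=Mus_musculus.GRCm38.98.premrna.gtf

# Only run when cellranger and both inputs are available (guard added for this sketch).
if command -v cellranger >/dev/null 2>&1 && [ -f "$FASTA" ] && [ -f "$GTF" ]; then
    cellranger mkref --genome=mm10 \
                     --fasta="$FASTA" \
                     --genes="$GTF" \
                     --ref-version=3.1.0 \
                     --nthreads="$NTHREADS" \
                     --memgb="$MEMGB"
else
    echo "cellranger or input files not found; nothing was run" >&2
fi
```

Keeping --memgb somewhat below the memory requested from the scheduler leaves headroom for the rest of the process.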

modified 6 months ago • written 6 months ago by genomax85k
i.sudbery8.1k wrote:

Looks to me like you are running out of memory.

My guess is that the line in question is a subcommand launched by cellranger, and that this command is using too much memory and is therefore being killed by the system.
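The "Killed" in the log is what a shell such as bash prints when a child process is terminated by SIGKILL, which is the signal the Linux out-of-memory killer sends. A small sketch of how that looks from the calling shell, using a child that kills itself as a stand-in for the real subcommand (this demonstrates the signal only, not cellranger itself):

```shell
# Start a child shell that sends SIGKILL (signal 9) to itself, standing in for
# a process terminated by the kernel's out-of-memory killer.
status=0
sh -c 'kill -9 $$' || status=$?

# A process ended by a signal reports 128 + the signal number: 128 + 9 = 137.
echo "exit status: $status"
```

An exit status of 137 from a job is therefore a hint, though not proof, of an out-of-memory kill. Where you have permission to read the kernel log, something like `dmesg | grep -i 'killed process'` may confirm it.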

Often with compute clusters you only get access to a relatively small part of a compute node's memory by default, and have to request a larger amount for more complicated jobs.
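How you request more memory depends on the scheduler, which the thread does not name. As a minimal sketch, assuming the cluster runs SLURM, a batch script could look like this; the job name, core count, memory, and time limit are illustrative values, not taken from the thread:

```shell
#!/bin/bash
# Assumed scheduler: SLURM. The values below are illustrative only.
#SBATCH --job-name=mkref_premrna
#SBATCH --cpus-per-task=8
#SBATCH --mem=48G
#SBATCH --time=04:00:00

# Inside a SLURM job this variable holds the granted core count;
# default to 1 so the script still runs outside the scheduler.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running on $(hostname) with ${THREADS} thread(s)"

# The cellranger mkref command from step 6 of the question would go here.
```

Saved under a hypothetical name such as mkref_job.sh, this would be submitted with `sbatch mkref_job.sh`. After the job ends, `sacct -j <jobid> --format=JobID,State,MaxRSS,ReqMem` can show how much memory it used relative to the request, if job accounting is enabled on your cluster.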

written 6 months ago by i.sudbery8.1k

I hope it is such a simple solution! Thanks for taking the time to respond so quickly. I will ask the cluster for additional memory (100 GB) and see if the error persists. I'll report back on the results, regardless of what they are.

written 6 months ago by robert.yzc0

An update: So far, I haven't errored out after adding the additional memory buffer!! Fingers crossed, but it looks like that fixed it. Thanks for your help, you won't believe how long I spent trying to troubleshoot this...

Any way I can "accept" a formal answer like stackexchange? I'm new to biostars so just want to make sure I can give credit.

written 6 months ago by robert.yzc0

I moved @Ian's comment to an answer. You can accept it as follows.

Upvote|Bookmark|Accept

modified 6 months ago • written 6 months ago by genomax85k

Ah I see, thanks! Just accepted it.

written 6 months ago by robert.yzc0