Question: Stuck creating reference genome with STAR
nash.claire (Canada) wrote 23 months ago:

Hi again,

I want to use STAR for my RNA-seq analysis, but I'm stuck at the first hurdle: generating a reference genome.

I want to use the newest rat build, rn6, but I keep getting errors with genomeGenerate. Here is my command:

STAR --runMode genomeGenerate \
--genomeDir /path/to/directory \
--genomeFastaFiles ~/path/to/directory/rn6_chr1.fa rn6_chr2.fa rn6_chr3.fa rn6_chr4.fa rn6_chr5.fa rn6_chr6.fa rn6_chr7.fa rn6_chr8.fa rn6_chr9.fa rn6_chr10.fa rn6_chr11.fa rn6_chr12.fa rn6_chr13.fa rn6_chr14.fa rn6_chr15.fa rn6_chr16.fa rn6_chr17.fa rn6_chr18.fa rn6_chr19.fa rn6_chr20.fa rn6_chrMT.fa rn6_chrX.fa rn6_chrY.fa \
--sjdbGTFfile ~/path/to/directory/rn6.gtf \
--sjdbOverhang 49 --runThreadN 12 \
--outFileNamePrefix /path/to/directory/rn6

And here is the error:

EXITING because of INPUT ERROR: could not open genomeFastaFile: path/to/directory/rn6_chr1.fa

Here are the points I've already checked after reading posts and forums:

- I'm using separate chromosome FASTA files because I read that using the toplevel.dna file is not recommended, and there isn't a primary.dna file for rn6 yet. I tried the toplevel FASTA file with no success.

- I've checked (and set with chmod) that every directory holding my input files, as well as my output directories, is readable, writable and executable.

- My genomeDir is completely empty and sits on a RAID with plenty of free space.

- My FASTA files and GTF file were downloaded from Ensembl and both look fine.

- I'm running this on a Mac Pro with a 12-core processor and 64 GB of RAM, and I've tried different thread settings, which had no effect.

- My reads are 50 bp paired-end, hence the sjdbOverhang of 49.
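For reference, the STAR manual describes the ideal sjdbOverhang as ReadLength - 1 (max(ReadLength) - 1 when read lengths vary), where for paired-end data ReadLength is the length of one mate; its own example is 99 for 2x100 bp reads. A minimal sketch for deriving the value from the reads themselves, assuming an uncompressed FASTQ with four lines per record; suggest_sjdb_overhang is a hypothetical helper, not part of STAR:

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a value for STAR's --sjdbOverhang as
# (longest read in ONE mate's FASTQ) - 1.
# Assumes an uncompressed FASTQ with 4 lines per record (sequence on line 2).
# Reads the file given as $1, or standard input if no file is given.
suggest_sjdb_overhang() {
  awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
       END { print max - 1 }' "${1:--}"
}
```

For example, 'suggest_sjdb_overhang reads_R1.fastq' or 'gzip -dc reads_R1.fastq.gz | suggest_sjdb_overhang' (placeholder file names). Pass a single mate only, since the value depends on the length of one mate, not the combined pair; for 2x50 bp reads this prints 49.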

I'm completely stuck and lost, guys. The manual isn't helping, and I've exhausted all the STAR Google Group and Biostars posts related to this. Can anyone help?

harold.smith.tarheel (United States) wrote 23 months ago:

That error is returned when the path is incorrect. Are the genomeFastaFiles really in your home directory (~) as written, or should the path start from the top level (/) like your --genomeDir? You can check a directory's full path by running 'pwd' inside it.
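One way to act on this before re-running genomeGenerate is to test every path you pass to --genomeFastaFiles from the directory you launch STAR in. A minimal sketch; check_fasta_files is a hypothetical helper, not part of STAR:

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm each FASTA path is readable from the current
# directory and print its absolute location (resolved with 'pwd').
# Returns non-zero if any file is missing or unreadable.
check_fasta_files() {
  local f missing=0
  for f in "$@"; do
    if [ -r "$f" ]; then
      printf 'ok      %s/%s\n' "$(cd "$(dirname "$f")" && pwd)" "$(basename "$f")"
    else
      printf 'MISSING %s\n' "$f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, 'check_fasta_files ~/path/to/directory/rn6_chr1.fa rn6_chr2.fa' run with the arguments from the question would report rn6_chr2.fa as MISSING unless a file of that name happens to exist in the current directory.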

Constantine (Germany) wrote 23 months ago:

Check your home directory path as harold.smith.tarheel said. If you still have problems, your FASTA files might be corrupted.
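A rough way to spot a damaged or wrong-format file is to check that it starts with a '>' header and that its sequence lines contain only nucleotide codes. A minimal sketch, assuming uncompressed FASTA (a file that is still gzip-compressed fails the first check); fasta_check is a hypothetical helper and not an exhaustive validator:

```bash
#!/usr/bin/env bash
# Hypothetical helper: quick sanity check of a FASTA file before genomeGenerate.
# Returns non-zero if the file does not start with '>' or has suspicious lines.
fasta_check() {
  local f="$1" first n_seq n_bad
  first=$(head -c 1 "$f")
  if [ "$first" != ">" ]; then
    echo "FAIL $f: first character is not '>' (compressed or not FASTA?)" >&2
    return 1
  fi
  n_seq=$(grep -c '^>' "$f")
  # Count sequence lines containing anything other than IUPAC nucleotide codes;
  # this also catches Windows '\r' line endings.
  n_bad=$(grep -v '^>' "$f" | grep -c '[^ACGTURYKMSWBDHVNacgturykmswbdhvn]')
  echo "$f: $n_seq sequence(s), $n_bad suspicious line(s)"
  [ "$n_bad" -eq 0 ]
}
```

For example, 'fasta_check ~/path/to/directory/rn6_chr1.fa' should report one sequence and zero suspicious lines for an intact single-chromosome file.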

Download the Illumina iGenome for rn6 here:

ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Rattus_norvegicus/UCSC/rn6/Rattus_norvegicus_UCSC_rn6.tar.gz
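Once the tarball has been downloaded (for example with 'curl -O' and the URL above), unpacking it and locating the whole-genome FASTA can be sketched as below. unpack_igenome is a hypothetical helper, and it assumes the Sequence/WholeGenomeFasta/genome.fa layout used in the STAR command in this answer:

```bash
#!/usr/bin/env bash
# Hypothetical helper: unpack an iGenomes tarball into a target directory and
# print the path of its whole-genome FASTA (Sequence/WholeGenomeFasta/genome.fa).
unpack_igenome() {
  local tarball="$1" dest="$2" fasta
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  # Look for the FASTA anywhere below the destination directory.
  fasta=$(find "$dest" -path '*/Sequence/WholeGenomeFasta/genome.fa' | head -n 1)
  if [ -z "$fasta" ]; then
    echo "genome.fa not found under $dest" >&2
    return 1
  fi
  echo "$fasta"
}
```

For example, 'unpack_igenome Rattus_norvegicus_UCSC_rn6.tar.gz /path/to/directory' should print the genome.fa path to pass to --genomeFastaFiles.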

Then run on your cluster:

STAR --runMode genomeGenerate \
--genomeDir /path/to/directory  \
--genomeFastaFiles /path/to/directory/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa \
--runThreadN 12 --outFileNamePrefix /path/to/directory/rn6

Michael Dondrup (Bergen, Norway) wrote 23 months ago:

In addition:

  •  my reads are 50 bp in length and paired end hence me using the 49 sjdbOverhang setting

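sjdbOverhang should be mate length - 1. For paired-end reads, mate length means the length of one mate, not 2*read length, so 49 is right for 2x50 bp reads (the STAR manual's own example is 99 for 2x100 bp reads). It's worth checking the documentation.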

  •  Why split the full FASTA file at all? It just makes things more complicated. There are other ways to save memory, and I am not sure splitting reduces memory requirements at all.
  • If you still want per-chromosome files, each one of them needs its full path set, not just the first one, as in ~/path/to/directory/rn6_chr1.fa ~/path/to/directory/rn6_chr2.fa ... ~/path/to/directory/rn6_chrY.fa

not ~/path/to/directory/rn6_chr1.fa rn6_chr2.fa ... rn6_chrY.fa

Otherwise STAR looks for rn6_chr2.fa ... rn6_chrY.fa in whatever directory you run it from.
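Building the list in a bash array gives every chromosome file its full path without typing it 23 times. A minimal sketch that keeps the placeholder paths and other options from the question; FASTA_DIR and FASTAS are illustrative names, and $HOME is used because '~' is not expanded inside double quotes:

```bash
#!/usr/bin/env bash
# Placeholder directory from the question; adjust to the real location.
FASTA_DIR="$HOME/path/to/directory"

# One full path per chromosome, in the same order as the original command:
# rn6_chr1.fa ... rn6_chr20.fa, then rn6_chrMT.fa, rn6_chrX.fa, rn6_chrY.fa
FASTAS=()
for c in {1..20} MT X Y; do
  FASTAS+=("$FASTA_DIR/rn6_chr$c.fa")
done

# Dry run: print the command instead of executing it; remove 'echo' to run STAR.
echo STAR --runMode genomeGenerate \
  --genomeDir /path/to/directory \
  --genomeFastaFiles "${FASTAS[@]}" \
  --sjdbGTFfile "$FASTA_DIR/rn6.gtf" \
  --sjdbOverhang 49 \
  --runThreadN 12 \
  --outFileNamePrefix /path/to/directory/rn6
```

Printing the command first makes it easy to confirm that every path starts with the same directory before STAR is actually run.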

nash.claire (Canada) wrote 23 months ago:

Hi guys,

Thanks so much for the help. I'll try fixing the file paths later to see if that works, and I'll double-check the sjdbOverhang setting against the documentation. The reason I have separate chromosome files is that I started with the toplevel.dna.fa file from Ensembl and genomeGenerate wasn't working. I read that toplevel FASTA files shouldn't be used because they contain all the haplotype data etc., which can cause issues. Since there is no primary.dna FASTA file for rn6 on Ensembl, I went with the separate chromosome files instead. I'd appreciate your opinion on the matter, though.
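Another option is to start from the single toplevel file and keep only the chromosome-level sequences, instead of juggling per-chromosome files. A minimal sketch using awk, assuming Ensembl-style headers where the sequence name is the first word after '>' (e.g. '>1 dna:chromosome ...'); keep_fasta_records is a hypothetical helper, and which names to keep is your choice:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy only the FASTA records whose name (first word
# after '>') appears in a space-separated list, e.g. the primary chromosomes.
# Usage: keep_fasta_records in.fa "1 2 X Y MT" > out.fa
keep_fasta_records() {
  awk -v keep="$2" '
    BEGIN {
      n = split(keep, ids, " ")
      for (i = 1; i <= n; i++) want[ids[i]] = 1
    }
    # On each header line, decide whether the following record is printed.
    /^>/ { id = substr($1, 2); printing = (id in want) }
    printing
  ' "$1"
}
```

For example, 'keep_fasta_records toplevel.fa "$(echo {1..20} X Y MT)" > rn6_primary.fa', where toplevel.fa stands for the uncompressed Ensembl toplevel file; the resulting single file can then be passed to --genomeFastaFiles on its own.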
