Question: STAR index fails with GTF, works with GFF3
Bastien Hervé (Limoges, CBRS, France) wrote, 16 months ago:

I am trying to create an index with STAR version 2.5.2b. With a GTF file I get an error at the "processing annotations GTF" step, so I tried the associated GFF3 file and it works. The question is: why? I know I could just use the GFF3 file, but I don't want to introduce another file into my RNA-seq workflow.

Here is what you need to reproduce this:

Reference genome : ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16/GRCm38.p5.genome.fa.gz

GTF : ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf

GFF3 : ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16/gencode.vM16.chr_patch_hapl_scaff.annotation.gff3

STAR : STAR-2.5.2b

I subsampled the reference genome to keep only the chromosomes annotated in the GTF file; this way I avoid reads that could map outside the annotation (see the post "Looking for a thorough annotation for non-primary assembly units in GRCm38"). I named it GRCm38.p5.genome_subsampled.fa

I run the job on a cluster. For both strategies (GTF and GFF3), I set h_vmem to 64G and mem to 16G (both specify the maximum amount of memory required), which is enough. I also use 8 threads.
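For readers who want to reproduce the subsampling step, here is a minimal sketch of one way to do it with awk. It is not necessarily how the file above was produced: the function names annotation_seqnames and subset_fasta are made up for illustration, and the sketch assumes an uncompressed FASTA whose header lines start with the sequence name (e.g. ">chr1 1").

```shell
# Print the unique sequence names (column 1) used in a GTF/GFF3 file.
# $1 = annotation file
annotation_seqnames() {
    awk -F '\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | sort -u
}

# Keep only the FASTA records whose name is listed in a names file.
# $1 = file with one sequence name per line, $2 = FASTA file
subset_fasta() {
    awk 'NR == FNR { keep[$1] = 1; next }          # first file: remember the names
         /^>/ { name = substr($1, 2)               # header: first word without ">"
                printing = (name in keep) }
         printing' "$1" "$2"                       # print header and sequence lines of kept records
}
```

Usage would look like `annotation_seqnames annotation.gtf > names.txt` followed by `subset_fasta names.txt genome.fa > genome_subsampled.fa` (file names here are placeholders). Note that an empty names file yields empty output, so it is worth checking names.txt before running the second step.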

Here are my commands :

GTF strategy

$star --runThreadN 8 --runMode genomeGenerate --genomeDir /home/hbastien/work/MGRS/star_index --genomeFastaFiles /home/hbastien/save/MGRS/GRCm38.p5.genome_subsampled.fa --sjdbGTFfile /home/hbastien/save/MGRS/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 75;

GFF3 strategy

$star --runThreadN 8 --runMode genomeGenerate --genomeDir /home/hbastien/work/MGRS/star_index --genomeFastaFiles /home/hbastien/save/MGRS/GRCm38.p5.genome_subsampled.fa --sjdbGTFfile /home/hbastien/save/MGRS/gencode.vM16.chr_patch_hapl_scaff.annotation.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 75;

With the GTF file, after around 30 minutes, I got the following in my error output file:

terminate called after throwing an instance of 'std::out_of_range'

what(): vector::_M_range_check

/var/spool/sge/node002/job_scripts/7117238: line 17: 57352 Abandon

$star --runThreadN 8 --runMode genomeGenerate --genomeDir /home/hbastien/work/MGRS/star_index --genomeFastaFiles /home/hbastien/save/MGRS/GRCm38.p5.genome_subsampled.fa --sjdbGTFfile /home/hbastien/save/MGRS/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 75

Your job has been killed.

This may happen if one of the followings hold :

  • you exceeded one of the queue/job limits (run time, memory, etc)

  • you (or admin) killed the job using qdel

  • something bad happened.

Now, just in case something bad happened, here are the debug information about your job : total 0

And in my standard output file :

Feb 21 13:45:20 ..... started STAR run

Feb 21 13:45:20 ... starting to generate Genome files

Feb 21 13:46:34 ... starting to sort Suffix Array. This may take a long time...

Feb 21 13:46:51 ... sorting Suffix Array chunks and saving them to disk...

Feb 21 14:09:59 ... loading chunks from disk, packing SA...

Feb 21 14:11:18 ... finished generating suffix array

Feb 21 14:11:18 ... generating Suffix Array index

Feb 21 14:14:43 ... completed Suffix Array index

Feb 21 14:14:43 ..... processing annotations GTF

With the GFF3 file, on the other hand, the job finished in around 45 minutes and my error output file is empty.

And in my standard output file :

Feb 21 15:34:20 ..... started STAR run

Feb 21 15:34:20 ... starting to generate Genome files

Feb 21 15:35:39 ... starting to sort Suffix Array. This may take a long time...

Feb 21 15:36:04 ... sorting Suffix Array chunks and saving them to disk...

Feb 21 16:05:14 ... loading chunks from disk, packing SA...

Feb 21 16:06:33 ... finished generating suffix array

Feb 21 16:06:33 ... generating Suffix Array index

Feb 21 16:11:03 ... completed Suffix Array index

Feb 21 16:11:03 ..... processing annotations GTF

Feb 21 16:11:27 ..... inserting junctions into the genome indices

Feb 21 16:14:54 ... writing Genome to disk ...

Feb 21 16:14:56 ... writing Suffix Array to disk ...

Feb 21 16:15:13 ... writing SAindex to disk

Feb 21 16:15:14 ..... finished successfully

Epilog : job finished at mer. févr. 21 16:15:14 CET 2018
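Comparing the two logs, the GTF run never gets past "processing annotations GTF", while the GFF3 run goes on to "inserting junctions into the genome indices". A small helper along these lines prints the last stage reported in a STAR standard output file, which makes that comparison quick on a cluster. This is only a sketch: last_star_stage is a hypothetical name, and the timestamp pattern is assumed from the log lines shown here.

```shell
# Print the last progress message found in a STAR standard output file,
# with the leading timestamp and dots removed.
# $1 = file holding STAR's standard output
last_star_stage() {
    # Lines look like: "Feb 21 14:14:43 ..... processing annotations GTF"
    sed -n -E 's/^[A-Z][a-z]{2} [ 0-9][0-9] [0-9:]{8} \.+ //p' "$1" | tail -n 1
}
```

Lines that do not match the timestamp pattern, such as the cluster's "Epilog" line, are ignored.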

I tried increasing the memory after seeing the 'std::out_of_range' error, but that didn't do the trick...

The two Log.out files are too large to display here, but I can share them if you need them.

Any hints are welcome!

Thanks a lot

star gtf gff3 • 1.5k views
h.mon (Brazil) wrote, 16 months ago:

I believe you don't need --sjdbGTFtagExonParentTranscript Parent to index GTF files. Maybe this is the source of the error?
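To make this concrete: according to the STAR manual, --sjdbGTFtagExonParentTranscript defaults to transcript_id, which suits GTF files, while Parent is the value recommended for GFF3 annotations. Below is a minimal sketch of a wrapper that adds the flag only for GFF3 input. The name star_index_cmd is hypothetical, deciding by file extension is an assumption, and the function only prints the command rather than running STAR.

```shell
# Print a STAR genomeGenerate command, adding the Parent tag only for GFF3 input.
# $1 = genome dir, $2 = genome FASTA, $3 = annotation file,
# $4 = sjdbOverhang, $5 = number of threads
star_index_cmd() {
    genome_dir=$1
    fasta=$2
    annotation=$3
    overhang=$4
    threads=$5
    extra=""
    case "$annotation" in
        # GFF3 links exons to transcripts through the "Parent" attribute
        *.gff3|*.gff) extra=" --sjdbGTFtagExonParentTranscript Parent" ;;
        # anything else is treated as GTF: keep STAR's default (transcript_id)
    esac
    printf 'STAR --runThreadN %s --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s%s --sjdbOverhang %s\n' \
        "$threads" "$genome_dir" "$fasta" "$annotation" "$extra" "$overhang"
}
```

Printing the command first makes it easy to inspect what would be submitted to the cluster before spending 30 minutes or more on suffix array generation.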


Well played, it works now. I didn't think this option could interfere... If you want to add this as an answer, I'll mark it as accepted. Thank you.

Reply written 16 months ago by Bastien Hervé
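One way to check whether a given tag is usable with an annotation file before indexing is to count how many exon lines actually carry it. The sketch below does that with awk; count_exons_with_tag is a hypothetical helper name. That STAR fails because the GTF exon lines have no Parent attribute is a plausible reading of this thread, not something confirmed in it.

```shell
# Report how many exon lines of a GTF/GFF3 file carry a given attribute tag.
# $1 = annotation file, $2 = attribute tag (e.g. Parent or transcript_id)
# Output format: <exons with tag>/<total exons>
count_exons_with_tag() {
    awk -F '\t' -v tag="$2" '
        !/^#/ && $3 == "exon" {
            total++
            # tag at the start of column 9 or after a ";", followed by " " (GTF) or "=" (GFF3)
            if ($9 ~ ("(^|;)[ ]*" tag "[ =]")) hits++
        }
        END { printf "%d/%d\n", hits, total }' "$1"
}
```

For a GTF file one would expect `count_exons_with_tag annotation.gtf Parent` to report zero matching exons and `count_exons_with_tag annotation.gtf transcript_id` to report all of them (annotation.gtf is a placeholder file name).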