Question: Build repeat genome index using STAR
yancychy10 wrote:

Hi, I downloaded the repeat genome and GTF (RepeatMasker) files from the UCSC Table Browser. I want to build a repeat genome index so that I can remove reads that may be spurious artifacts from rRNA and other repetitive sequences. However, the job always fails by exceeding the memory limit, even after I raised the allocation from 30 GB to 120 GB.
The repeat genome file is 2.1 GB and the GTF file is 552 MB.

Output:

Nov 18 17:58:19 ..... started STAR run
Nov 18 17:58:19 ... starting to generate Genome files
slurmstepd: Job 11091167 exceeded memory limit (123675052 > 122880000), being killed
slurmstepd: Exceeded job memory limit
slurmstepd: * JOB 11091167 CANCELLED AT 2019-11-18T13:20:19 * on node311

Script:
/home/ychen10/STAR-2.7.3a/bin/Linux_x86_64/STAR \
       --runThreadN 4 \
       --runMode genomeGenerate \
       --genomeDir index \
       --genomeFastaFiles repeatSeq.fa \
       --sjdbGTFfile repeatSeq.gtf \
       --sjdbOverhang 99 \
       --genomeChrBinNbits 16 \
       --genomeSAindexNbases 10 \
       --genomeSAsparseD 4

I am not sure whether the problem is caused by the repeat genome files or by the memory setting. Thanks.

index star repeat • 94 views
modified 25 days ago • written 25 days ago by yancychy10

Thanks. I tried --limitGenomeGenerateRAM. It produced the same error.

written 25 days ago by yancychy10

Comments are for answers; please use the reply button (yeah, it's a bit strange, but it makes finding things much easier!).

Is it the same error from SLURM? If so, something is going wrong, because STAR shouldn't be using more than the limit specified. Can you try requesting, say, 50 GB of memory for the job but limiting STAR to 40 GB?

modified 25 days ago • written 25 days ago by Amar620
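
The suggestion above could be sketched roughly as follows, assuming the job is submitted with something like `sbatch --mem=50G`. `--limitGenomeGenerateRAM` takes its value in bytes. The helper name `gb_to_bytes` and the variable names are made up for illustration, and the script only prints the STAR command (a dry run) instead of running it.

```shell
#!/bin/sh
# Sketch: ask SLURM for more memory than STAR is allowed to use.
# Assumption: submitted with e.g. `sbatch --mem=50G this_script.sh`.

# Hypothetical helper: convert gigabytes (10^9 bytes) to bytes.
gb_to_bytes() {
    echo $(( $1 * 1000000000 ))
}

# Cap STAR at 40 GB, leaving headroom below the 50 GB job limit.
STAR_RAM_BYTES=$(gb_to_bytes 40)

# Same options as the original script, plus the RAM limit.
STAR_CMD="STAR --runThreadN 4 --runMode genomeGenerate \
--genomeDir index --genomeFastaFiles repeatSeq.fa \
--sjdbGTFfile repeatSeq.gtf --sjdbOverhang 99 \
--genomeChrBinNbits 16 --genomeSAindexNbases 10 --genomeSAsparseD 4 \
--limitGenomeGenerateRAM $STAR_RAM_BYTES"

# Dry run: show the command that would be executed.
echo "$STAR_CMD"
```

If SLURM still kills the job with these settings, that would suggest the memory is being used somewhere the STAR limit does not cover, which is worth including in a bug report.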

Thanks. I tried limiting STAR to 40 GB. The error is the same. I think the problem may be caused by the input files.

repeatSeq.fa

>hg38_rmsk_L1P5 range=chr1:67108754-67109046 5'pad=0 3'pad=0 strand=+ repeatMasking=none
AACAAATAATCCCATCAAAAAGTAGGCAAAGGATATGAATAGATAATTTT
CAAAATAAGATATACAAATGAAAAAATGCTCAACATCACTAATTATCAGG
GAAATGCAAATTAAAACCACAATGAGATACTGCCTTATTCCTGAAAGAAT
GGCCATAATTTAAAAATTTTTTAAAAAATAGACCTTGGCATGGATGTGGT
AAAAAGGGAACACTTTTACACTGTTGGTGGGAATGTAAACTAGTATAAAC
ACTATGGAAAACAGTATGAAAATACCTTAAAGAATTAAAAGTA

>hg38_rmsk_AluY range=chr1:8388316-8388618 5'pad=0 3'pad=0 strand=- repeatMasking=none
GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCAA
GGCGGGCGGATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACAAGG
TGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCCGGGCGCGGTGGC
GGGCGCCTGTAGTCCCAGCTACTCAGGAGGCTGAGGCAGGAGAATGGCGT
GAACCCGGGAAGCGGAGCTTGCAGTGAGCCGAGATTGCGCCACTGCAGTC
CGCAGTCCAGCCTGGGCGACAGAGTGAGACTCCGTCTCAAAAAAAAAAAA
AGA

repeatSeq.gtf (head -5 repeatSeq.gtf)

chr1    hg38_rmsk       exon    67108754        67109046        1892.000000     +       .       gene_id "L1P5"; transcript_id "L1P5";
chr1    hg38_rmsk       exon    8388316 8388618 2582.000000     -       .       gene_id "AluY"; transcript_id "AluY";
chr1    hg38_rmsk       exon    25165804        25166380        4085.000000     +       .       gene_id "L1MB5"; transcript_id "L1MB5";
chr1    hg38_rmsk       exon    33554186        33554483        2285.000000     -       .       gene_id "AluSc"; transcript_id "AluSc";
chr1    hg38_rmsk       exon    41942895        41943205        2451.000000     -       .       gene_id "AluY"; transcript_id "AluY_dup1";
modified 23 days ago by h.mon28k • written 25 days ago by yancychy10
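
One way to test the suspicion about the input files: STAR expects the sequence names in the first GTF column to match the names in the FASTA headers (the text after `>` up to the first space). In the excerpts above the FASTA records are named like `hg38_rmsk_L1P5` while the GTF refers to `chr1`, so a quick comparison may be informative. The function names below are hypothetical; the file names are the ones from the thread.

```shell
#!/bin/sh
# Sketch: compare sequence names between a FASTA and a GTF.

# List unique sequence names in a FASTA (header text up to first whitespace).
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# List unique sequence names from column 1 of a GTF, skipping comments.
gtf_names() {
    awk '!/^#/ && NF { print $1 }' "$1" | sort -u
}

# Print GTF sequence names that have no matching FASTA record.
gtf_names_missing_from_fasta() {
    fa_tmp=$(mktemp)
    gtf_tmp=$(mktemp)
    fasta_names "$1" > "$fa_tmp"
    gtf_names "$2" > "$gtf_tmp"
    # comm -13 keeps lines that appear only in the second (GTF) list.
    comm -13 "$fa_tmp" "$gtf_tmp"
    rm -f "$fa_tmp" "$gtf_tmp"
}

# Usage with the files from the thread, if they are present.
if [ -f repeatSeq.fa ] && [ -f repeatSeq.gtf ]; then
    echo "GTF names missing from FASTA (first 10):"
    gtf_names_missing_from_fasta repeatSeq.fa repeatSeq.gtf | head -n 10
    echo "Number of FASTA records:"
    grep -c '^>' repeatSeq.fa
fi
```

The record count is printed as well because a FASTA made of many short repeat copies is a very different input from a handful of chromosomes, which is useful context when reporting the problem.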

This is beyond me, I'm sorry. I suggest opening an issue on the STAR GitHub page. The maintainer is excellent at troubleshooting unusual cases.

written 24 days ago by Amar620

Yes. Thanks very much

written 24 days ago by yancychy10

Why not align first and then remove the reads that map to RepeatMasker regions?

written 25 days ago by Shicheng Guo7.9k
Amar620 wrote:

Add --limitGenomeGenerateRAM and see how you go. For whatever reason, the indexing job is using a large amount of RAM.

written 25 days ago by Amar620
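
One setting that may be worth checking alongside the RAM limit is `--genomeChrBinNbits`, which the script above sets to 16. For genomes with a large number of references, the STAR manual recommends scaling it as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). Whether this explains the memory use here is not established in this thread. The sketch below only does that arithmetic; the function name is hypothetical, rounding down to an integer is an assumption, and the example numbers are illustrative rather than measured from the poster's files.

```shell
#!/bin/sh
# Sketch: integer version of the manual's --genomeChrBinNbits guideline.
# Arguments: genome length (bases), number of sequences, read length.
suggest_chr_bin_nbits() {
    per_seq=$(( $1 / $2 ))          # average bases per reference
    if [ "$per_seq" -lt "$3" ]; then
        per_seq=$3                  # max(average length, read length)
    fi
    bits=0
    # floor(log2(per_seq)) by repeated halving, capped at 18
    while [ $(( per_seq >> 1 )) -gt 0 ] && [ "$bits" -lt 18 ]; do
        per_seq=$(( per_seq >> 1 ))
        bits=$(( bits + 1 ))
    done
    echo "$bits"
}

# Illustrative call: roughly 2.1e9 bases, an assumed 5,000,000 records,
# and 100-base reads (inferred from --sjdbOverhang 99).
suggest_chr_bin_nbits 2100000000 5000000 100
```

The real sequence count can be obtained with `grep -c '^>' repeatSeq.fa` and substituted for the assumed value.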