STAR mapping taking way too long (18hrs to complete ~60M PE reads)
3.4 years ago by Jen ▴ 70

I have a lot of experience running STAR and it usually runs pretty quickly on my machine with 65 GB of RAM, but I've made some changes to how I create the index file by adding --sjdbOverhang 99, --genomeChrBinNbits 11, and --runThreadN 1. I did this because there wasn't enough RAM to build the genome index, which is weird, but I got that to work. I also used a different annotation file this time (gencode.vM25.annotation.gtf). I'm using STAR version 2.5.4b.

STAR --runThreadN 1 --runMode genomeGenerate --genomeDir star --genomeFastaFiles genome/GRCm38.primary_assembly.genome.fa --sjdbOverhang 99 --genomeChrBinNbits 11 --limitGenomeGenerateRAM 16000000000 --sjdbGTFfile genes/gencode.vM25.annotation.gtf
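
For reference, the STAR manual only suggests lowering --genomeChrBinNbits for assemblies with very many contigs, scaling it as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). A rough check for the mouse primary assembly (the genome length and sequence count below are ballpark figures, not measured from my files):

    # Manual's suggested scaling for --genomeChrBinNbits:
    #   min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength)))
    # genome_len and n_refs are approximations for GRCm38 primary assembly
    awk 'BEGIN {
        genome_len = 2.7e9; n_refs = 66; read_len = 100
        x = genome_len / n_refs
        if (read_len > x) x = read_len
        bits = log(x) / log(2)
        if (bits > 18) bits = 18
        printf "--genomeChrBinNbits %d\n", int(bits)
    }'

which prints 18, i.e. the default, so 11 is well below what that formula gives.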

Then, when running the mapping step, it takes a REALLY long time (~7.5M reads/h). One file has >70M PE reads. I've also changed how I run the mapping step: I've included --outSAMstrandField intronMotif so I can run StringTie afterwards.

STAR --genomeDir star --readFilesCommand zcat --readFilesIn samples/2_Forward.fq.gz samples/2_Reverse.fq.gz --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 16000000000 --outSAMunmapped Within --twopassMode Basic --outFilterMultimapNmax 1 --quantMode TranscriptomeSAM --outSAMstrandField intronMotif --runThreadN 16 --outFileNamePrefix "2_star/"
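
To show why I need that flag: StringTie requires the XS strand attribute on spliced alignments, which --outSAMstrandField intronMotif tells STAR to write. The downstream step would be something like this (a sketch; the output GTF name is just a placeholder):

    # Assemble transcripts from STAR's sorted BAM; -G supplies the reference
    # annotation, -p the thread count. The BAM path follows from
    # --outFileNamePrefix "2_star/" and --outSAMtype BAM SortedByCoordinate.
    stringtie 2_star/Aligned.sortedByCoord.out.bam \
        -G genes/gencode.vM25.annotation.gtf \
        -o 2_star/stringtie_assembled.gtf \
        -p 8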

Here is the Log.final.out:

                                 Started job on |   Dec 24 14:55:02
                             Started mapping on |   Dec 24 23:56:13
                                    Finished on |   Dec 25 09:08:01
       Mapping speed, Million of reads per hour |   7.46

                      Number of input reads |   68563888
                  Average input read length |   200
                                UNIQUE READS:
               Uniquely mapped reads number |   62225717
                    Uniquely mapped reads % |   90.76%
                      Average mapped length |   199.37
                   Number of splices: Total |   40579909
        Number of splices: Annotated (sjdb) |   40566838
                   Number of splices: GT/AG |   40149129
                   Number of splices: GC/AG |   380060
                   Number of splices: AT/AC |   35737
           Number of splices: Non-canonical |   14983
                  Mismatch rate per base, % |   0.18%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.82
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.21
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   0
         % of reads mapped to multiple loci |   0.00%
    Number of reads mapped to too many loci |   4956534
         % of reads mapped to too many loci |   7.23%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   1.91%
                 % of reads unmapped: other |   0.11%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%
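
Note that per those timestamps, roughly nine hours pass between "Started job" and "Started mapping", i.e. before the final alignment pass even begins (genome loading plus, presumably, the two-pass junction insertion), and the mapping itself takes another nine. Quick arithmetic on the log times (GNU coreutils date assumed):

    # Hours between "Started job" and "Started mapping" in Log.final.out
    start_job="Dec 24 14:55:02"
    start_map="Dec 24 23:56:13"
    echo "$(( ( $(date -d "$start_map" +%s) - $(date -d "$start_job" +%s) ) / 3600 )) hours"
    # prints: 9 hours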

I'm guessing it's something to do with available RAM, but I can't for the life of me figure it out.

cat /proc/meminfo

MemTotal:       65978504 kB
MemFree:        23424400 kB
MemAvailable:   63803704 kB
Buffers:        27117588 kB
Cached:         12989668 kB
SwapCached:            0 kB
Active:          5692484 kB
Inactive:       35445240 kB
Active(anon):     469060 kB
Inactive(anon):   630856 kB
Active(file):    5223424 kB
Inactive(file): 34814384 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Dirty:                72 kB
Writeback:             0 kB
AnonPages:       1030652 kB
Mapped:           385564 kB
Shmem:             69424 kB
Slab:            1138764 kB
SReclaimable:    1067000 kB
SUnreclaim:        71764 kB
KernelStack:       10864 kB
PageTables:        32612 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    35086400 kB
Committed_AS:    3910452 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     3466428 kB
DirectMap2M:    59392000 kB
DirectMap1G:     5242880 kB

Does anyone have an idea as to what is making STAR go so slow? I SHOULD have plenty of RAM to do the job... Am I crazy??

You are probably trying to use too many threads for 64 GB of RAM. Can you try a lower number of threads (say 8) and see if that helps?
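
For example, your mapping command with only --runThreadN changed:

    STAR --genomeDir star --readFilesCommand zcat --readFilesIn samples/2_Forward.fq.gz samples/2_Reverse.fq.gz --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 16000000000 --outSAMunmapped Within --twopassMode Basic --outFilterMultimapNmax 1 --quantMode TranscriptomeSAM --outSAMstrandField intronMotif --runThreadN 8 --outFileNamePrefix "2_star/"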


I would monitor the job using something like top to see whether it is running or just sleeping due to some I/O problems.
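
For example (iostat is part of the sysstat package):

    # Watch just the STAR process
    top -p "$(pgrep -x STAR)"
    # Check whether the disks are the bottleneck (high %util / await)
    iostat -x 5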


It does ultimately do the job... so it is doing it, just excruciatingly slow. It's weird because it works fine using --genomeFastaFiles genome/GRCm38.primary_assembly.genome.fa with --sjdbGTFfile genes/Mus_musculus.GRCm38.100.gtf, but not with the same FASTA and --sjdbGTFfile genes/gencode.vM25.annotation.gtf.
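
One thing worth checking, since GENCODE and Ensembl name chromosomes differently ("chr1" vs "1"), is whether each GTF matches the FASTA headers:

    # Chromosome names in the FASTA vs in each GTF (paths as above)
    grep '^>' genome/GRCm38.primary_assembly.genome.fa | head -3
    grep -v '^#' genes/gencode.vM25.annotation.gtf | cut -f1 | sort -u | head -3
    grep -v '^#' genes/Mus_musculus.GRCm38.100.gtf | cut -f1 | sort -u | head -3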

Here is the top output:

top - 08:34:22 up 1 day, 23:57,  1 user,  load average: 8.64, 5.57, 2.50
Tasks: 331 total,   2 running, 219 sleeping,   2 stopped,   0 zombie
%Cpu(s): 47.1 us,  0.4 sy,  0.0 ni, 52.4 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65978504 total,   557332 free, 29231604 used, 36189568 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 35895876 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10753 robert    20   0 27.939g 0.025t   8844 R 752.9 41.0  33:31.40 STAR
 1946 robert    20   0 4403620 253816  56416 S   5.9  0.4  53:57.72 cinnamon
11452 robert    20   0   44208   4100   3376 R   5.9  0.0   0:00.01 top
    1 root      20   0  225612   6984   4360 S   0.0  0.0   0:05.68 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.06 kthreadd
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:+
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu_+
