Question: STAR does not move on to the next file when mapping multiple files in a loop
9 days ago by
lingziqi8278 (0) wrote:

Hello everyone. I have recently been using STAR to map reads from multiple files. Here is the script:

for NAME in individual1 individual2 individual3
do
    STAR --runMode alignReads \
         --runThreadN 10 \
         --genomeDir "$REF" \
         --readFilesIn "${INPUT}/${NAME}_input/${NAME}_input_R1.fq.gz" \
         --readFilesCommand zcat \
         --outSAMstrandField intronMotif \
         --outFileNamePrefix "${OUT}/${NAME}_input_wasp" \
         --outSAMtype BAM Unsorted \
         --varVCFfile "${OUTPUT}/${NAME}_input.vcf" \
         --waspOutputMode SAMtag \
         --outSAMattributes vA vG
done

The paths are definitely correct. The key problem is that once one file is done, the loop stops, with no warning at all. When I type "ps", it shows this:

  PID TTY          TIME CMD
19335 pts/0    00:00:00 bash
19384 pts/0    00:00:00 bash
19665 pts/0    00:36:35 STAR
19668 pts/0    00:00:00 sh <defunct>
19708 pts/0    00:00:00 ps

Only when I type "kill 19665" does the next file get processed. I have no idea what causes this, and it confuses me a lot. Could anyone tell me how to fix it? Thank you!
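Since the loop gives no warning, one option is to wrap each STAR call so that its start time, end time, and exit status end up in a per-sample log. The following is only a sketch: the function name `run_sample` and the log file name are hypothetical and not part of the original script.

```shell
# Hypothetical wrapper (not from the original script): run one command,
# append its output to a log, and record the exit status.
run_sample() {
    local name="$1" log="$2" status
    shift 2
    echo "[$(date '+%F %T')] start $name" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    echo "[$(date '+%F %T')] end $name (exit status $status)" >> "$log"
    return "$status"
}

# Usage inside the loop (assumed log name), with the STAR options from the script above:
# run_sample "$NAME" "${OUT}/${NAME}.loop.log" STAR --runMode alignReads [other options] \
#     || echo "$NAME failed" >&2
```

If the "end" line never appears for a sample, the command itself has not returned, which points at STAR rather than at the loop.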

modified 9 days ago by caggtaagtat (600) • written 9 days ago by lingziqi8278 (0)

See my suggestion for a simple parallelization script (it is for bowtie2, but I think you'll get the idea): "A: perl script for BWA-mem on multiple different files"

written 9 days ago by ATpoint (16k)

Thanks! It seems useful; I will try it in my code.

written 9 days ago by lingziqi8278 (0)

Do ${OUT}/ and ${OUTPUT}/ exist before you run STAR?

How do you define $OUT and $OUTPUT?

written 9 days ago by Friederike (4.1k)

They are defined like this:

OUTPUT=/safedisk/CHIP_Seq/PhaseI/5_platypus_vcf
OUT=/safedisk/CHIP_Seq/PhaseI/6_wasp_bam

These two directories hold the results of two different steps; ${OUT} is where I store my STAR results. By the way, I tested STAR with a single file, and the "defunct" process still appears.
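A quick way to confirm that both directories exist and are writable before the loop starts could look like the sketch below. The helper name `require_dir` is hypothetical; only `$OUT` and `$OUTPUT` come from the script in the question.

```shell
# Hypothetical helper: fail early if a directory is missing or not writable.
require_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "directory not writable: $dir" >&2
        return 1
    fi
}

# Usage at the top of the script:
# require_dir "$OUT" && require_dir "$OUTPUT" || exit 1
```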

modified 8 days ago • written 8 days ago by lingziqi8278 (0)
9 days ago by
caggtaagtat (600) wrote:

Hi,

I also run STAR in a loop and use two different ways to get the file names. One is to pass the file names (with their paths) to STAR from a text file that holds one file name per line:

# For every file name listed in the input file
while IFS= read -r SAMPLE; do

    # Get the sample name without directory and extension
    FILEBASE=$(basename "${SAMPLE%.fq.rm_bl}")

    # Make a new directory for every sample (-p: no error if it already exists)
    mkdir -p "/path_to_later/gap_table/$FILEBASE.STAR"

    # Enter the new directory; skip this sample if that fails
    cd "/path_to_later/gap_table/$FILEBASE.STAR" || continue

    # Align with STAR (note the space between --sjdbGTFfile and its path)
    /path_to_STAR/STAR --outFilterType BySJout --outFilterMismatchNmax 10 \
        --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 \
        --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 \
        --alignIntronMax 300000 --alignSJoverhangMin 8 --alignIntronMin 20 \
        --genomeDir /path_to/star_index_hg38_hiv_r100/ --sjdbOverhang 100 \
        --quantMode GeneCounts --sjdbGTFfile /path_to/hg38_pnL43_fusion_annotation.gtf \
        --outFileNamePrefix "/path_to/gap_table/$FILEBASE.STAR/" \
        --readFilesIn "$SAMPLE" > STARaligning.log

done </path_to_filename_file/filename

Another way is to search a directory for matching file names and use them as STAR input.

Here the first line of the loop and the FILEBASE line are replaced as follows, and the closing line becomes a plain "done", because the file names now come from the pipe rather than from a redirected file:

# For every file in the given directory (/path_to_files/) whose name ends in ".fq"
find /path_to_files/ -name "*.fq" | while IFS= read -r SAMPLE; do

# Get single file name
FILEBASE=$(basename "${SAMPLE%.fq}")
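To check which sample names the find-based loop would produce before running any alignment, the pipeline can be tried on its own. This is a minimal sketch; the function name `list_filebases` is hypothetical, and the `sort` step is an addition that only makes the order predictable.

```shell
# Hypothetical dry run: print the FILEBASE that each *.fq file would get.
list_filebases() {
    find "$1" -name '*.fq' | sort | while IFS= read -r SAMPLE; do
        basename "${SAMPLE%.fq}"
    done
}

# Usage:
# list_filebases /path_to_files/
```

Because of the pipe, the loop body typically runs in a subshell, so variables set inside it are not visible after "done".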

I assume the extra space between individual2 and individual3 is not in the real code? Otherwise, I don't know what causes the error in your particular loop.

written 9 days ago by caggtaagtat (600)

Thanks a lot for answering! There is no extra space between the sample names in the real code. I tested STAR with a single file; when I type "ps", it looks like this:

PID TTY          TIME CMD
29037 pts/1    00:00:00 bash
29088 pts/1    00:00:00 ps

That looks normal. However, when I type "ps -ef | grep usr_name", it shows:

28999 28935  0 18:33 pts/0    00:00:00 bash 1_STAR_test.sh
29007 28999 99 18:33 pts/0    00:23:18 STAR --runMode alignReads --runThreadN 10 --genomeDir /home/zhuyl/Genome/susScr11_STAR_update --readFilesIn /safedisk2/lingziqi/phaseI/2019-5-13-36individual/BMX4_Liver_input/BMX4_Liver_input_R1.fq.gz --readFilesCommand zcat --outSAMstrandField intronMotif --outFileNamePrefix /safedisk/09_Encode/CHIP_Seq/PhaseI/BWA_bam/2019-5-13-36individual_lingziqi/6_wasp_bam/BMX4_Liver_input_wasp --outSAMtype BAM Unsorted --varVCFfile /safedisk/09_Encode/CHIP_Seq/PhaseI/BWA_bam/2019-5-13-36individual_lingziqi/platypus_vcf/BMX4_Liver_input.vcf --waspOutputMode SAMtag --outSAMattributes vA vG
29010 29007  0 18:33 pts/0    00:00:00 [sh] <defunct>

So I guess it is not about the loop; maybe STAR just cannot exit normally when it finishes the job? Have you ever run into this issue before?
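One rough way to tell a STAR process that is still working from one that has finished is to look at the log files STAR writes next to the output prefix (Log.out, Log.progress.out, Log.final.out). The sketch below assumes that a non-empty Log.final.out only exists once mapping is complete, which matches the STAR manual's description but has not been verified for this setup; the function name `star_status` is hypothetical.

```shell
# Hypothetical check based on STAR's log files for a given --outFileNamePrefix.
# Assumption: a non-empty Log.final.out means mapping has completed.
star_status() {
    local prefix="$1"
    if [ -s "${prefix}Log.final.out" ]; then
        echo "finished"
    elif [ -e "${prefix}Log.progress.out" ]; then
        echo "running"
    else
        echo "not started"
    fi
}

# Usage with the prefix from the question:
# star_status "${OUT}/${NAME}_input_wasp"
# tail -n 3 "${OUT}/${NAME}_input_waspLog.progress.out"
```

If the status stays "running" while Log.progress.out keeps getting new lines, STAR is probably still processing reads rather than hanging.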

modified 9 days ago • written 9 days ago by lingziqi8278 (0)

No, sorry, never. Are you sure you provided the 30 GB of RAM you need for aligning with STAR?
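On Linux, the memory actually available (as opposed to the total installed) can be read from /proc/meminfo. This is a sketch only: the function name `mem_available_gb` is hypothetical, and it assumes a kernel that reports a MemAvailable line.

```shell
# Hypothetical helper: print available memory in whole GiB.
# Reads /proc/meminfo by default; another file can be given for testing.
mem_available_gb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Usage:
# echo "available: $(mem_available_gb) GiB"
```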

written 8 days ago by caggtaagtat (600)

Yes, total RAM is 60 GB. Anyway, thanks for helping me. ^o^

written 6 days ago by lingziqi8278 (0)