Question: STAR align multiple files
18 months ago by
ta_awwad200
Frankfurt am Main
ta_awwad200 wrote:

Hi everybody, I am aligning 36 paired-end (PE) samples with STAR. To make the task a little easier, I wrote a bash loop that aligns them all with the same command. Here is my loop:

for i in $(ls raw_data); do STAR --genomeDir index.150 \
--readFilesIn raw_data/$i\_1.fq.gz,raw_data/$i\_2.fq.gz \
--runThreadN 20 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--sjdbGTFfile GRCm38.90.gtf \
--readFilesCommand zcat ; done

But something seems to be wrong: the alignment ran overnight and still had not finished.

Any recommendations?

Thanks very much.

rna-seq star chip-seq alignment • 3.3k views
ADD COMMENTlink modified 11 months ago by Bog20 • written 18 months ago by ta_awwad200

For 36 samples, you could speed up by loading the index into memory, and unloading when finished mapping:

STAR --genomeLoad LoadAndExit --genomeDir index.150

for i in $(ls raw_data | sed s/_[12].fq.gz// | sort -u)
do
    STAR [...]
done

STAR --genomeLoad Remove --genomeDir index.150
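
As a hedged sketch of what the elided `STAR [...]` call might look like with a preloaded index: each mapping run has to ask for the shared copy with `--genomeLoad LoadAndKeep`, because the default (`NoSharedMemory`) loads a private copy per run. To my knowledge, a shared-memory genome cannot be combined with passing `--sjdbGTFfile` at the mapping step, so this assumes the GTF was already supplied when the index was generated, and sorted BAM output then needs an explicit `--limitBAMsortRAM`. The `DRY_RUN` switch, the `run` and `map_all` helpers, and the 30 GB sort limit are illustrative assumptions, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch only. DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# map_all <fastq_dir> <index_dir>  (hypothetical helper)
map_all() {
    local dir=$1 index=$2 sample
    run mkdir -p aligned
    # Load the index into shared memory once, then exit
    run STAR --genomeLoad LoadAndExit --genomeDir "$index"
    for sample in $(ls "$dir" | sed 's/_[12]\.fq\.gz$//' | sort -u); do
        # LoadAndKeep reuses the copy that is already in shared memory
        run STAR --genomeLoad LoadAndKeep --genomeDir "$index" \
            --readFilesIn "$dir/${sample}_1.fq.gz" "$dir/${sample}_2.fq.gz" \
            --readFilesCommand zcat --runThreadN 20 \
            --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 30000000000 \
            --quantMode GeneCounts --outFileNamePrefix "aligned/${sample}."
    done
    # Free the shared memory when all samples are mapped
    run STAR --genomeLoad Remove --genomeDir "$index"
}
```

Calling `map_all raw_data index.150` with the default `DRY_RUN=1` only prints the commands, so they can be inspected first; `DRY_RUN=0 map_all raw_data index.150` would actually run them.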
ADD REPLYlink written 18 months ago by h.mon24k

Thank you all for this priceless information.

ADD REPLYlink written 18 months ago by ta_awwad200

Hi h.mon,

Could you tell me what the purpose of index.150 is here? Can we just type the location of the genome after --genomeDir?

ADD REPLYlink written 8 weeks ago by chahat_u110

Yes. In the example given, index.150 is the name of the index directory from the original question. Replace it with the path to your own STAR index, that is, the directory created with --runMode genomeGenerate.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by genomax65k

When looping, test if your code is valid by adding an echo statement to see what the command is going to be:

for i in $(ls raw_data); do echo STAR --genomeDir index.150 \
--readFilesIn raw_data/$i\_1.fq.gz,raw_data/$i\_2.fq.gz \
--runThreadN 20 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--sjdbGTFfile GRCm38.90.gtf \
--readFilesCommand zcat ; done

My guess is that the files raw_data/$i_1.fq.gz don't exist, because $i is taken straight from the contents of raw_data and so is already a complete filename.
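
One way to test that guess directly, as a rough sketch: build the same paths the loop would build and check whether each one exists. The function name `check_inputs` is made up for illustration; `${i}_1` is equivalent to the `$i\_1` used in the loop.

```shell
# check_inputs <fastq_dir>  (hypothetical helper)
# Reports every path the original loop would pass to STAR that does not exist.
check_inputs() {
    local dir=$1 i f missing=0
    for i in $(ls "$dir"); do
        for f in "$dir/${i}_1.fq.gz" "$dir/${i}_2.fq.gz"; do
            if [ ! -e "$f" ]; then
                echo "missing: $f"
                missing=$((missing + 1))
            fi
        done
    done
    echo "$missing constructed paths do not exist"
}
```

Running `check_inputs raw_data` before starting a long job shows right away whether `$i` produces usable filenames.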

ADD REPLYlink written 18 months ago by WouterDeCoster38k

Thanks very much for your reply, WouterDeCoster. I ran your code and got this:

STAR --genomeDir /index.150 --readFilesIn raw_data/KO_day3_1_1.fq.gz_1.fq.gz raw_data/KO_day3_1_1.fq.gz_2.fq.gz --runThreadN 20 --outFileNamePrefix aligned/KO_day3_1_1.fq.gz. --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --sjdbGTFfile GRCm38.90.gtf --readFilesCommand zcat

You are right: the file names come out wrong.

Any suggestions for correcting this?

Thanks very much.

ADD REPLYlink modified 18 months ago • written 18 months ago by ta_awwad200

Can you show a few examples of filenames of the fq.gz files?

ADD REPLYlink written 18 months ago by WouterDeCoster38k
KO_day3_1_1.fq.gz KO_day4_1_2.fq.gz mESC_KO_3_1.fq.gz mESC_KO_3_2.fq.gz mESC_Wt3_1.fq.gz mESC_Wt3_2.fq.gz PG_4WT10_07_17_1.fq.gz PG_4WT10_07_17_2.fq.gz PG_7Swht16_07_17_1.fq.gz PG_7Swht16_07_17_2.fq.gz
ADD REPLYlink modified 18 months ago • written 18 months ago by ta_awwad200

You could try something like:

# Paired-end mates are separated by a space in --readFilesIn; a comma would list several files for the same mate.
for i in $(ls raw_data | sed 's/_[12]\.fq\.gz$//' | sort -u); do echo STAR --genomeDir index.150 \
--readFilesIn raw_data/${i}_1.fq.gz raw_data/${i}_2.fq.gz \
--runThreadN 20 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--sjdbGTFfile GRCm38.90.gtf \
--readFilesCommand zcat ; done

I shortened $i to the sample name and kept only unique values, since every sample appears in the listing twice (once per mate).
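
An alternative sketch that avoids parsing `ls`: glob the `_1.fq.gz` files, strip the suffix with `basename`, and skip any sample whose mate file is missing. The function name `list_samples` is hypothetical; the `_1.fq.gz` / `_2.fq.gz` naming is taken from the filenames posted above.

```shell
# list_samples <fastq_dir>  (hypothetical helper)
# Prints one sample name per line for every complete _1/_2 pair.
list_samples() {
    local dir=$1 r1 sample
    for r1 in "$dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        sample=$(basename "$r1" _1.fq.gz)     # KO_day3_1_1.fq.gz -> KO_day3_1
        if [ -e "$dir/${sample}_2.fq.gz" ]; then
            echo "$sample"
        else
            echo "skipping $sample: no _2.fq.gz mate" >&2
        fi
    done
}
```

The loop header then becomes `for i in $(list_samples raw_data); do ... done`, with the rest of the command unchanged.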

ADD REPLYlink written 18 months ago by WouterDeCoster38k

Thanks very much. It is running now, but I am not sure how long it will take. I will let you know if everything runs fine.

ADD REPLYlink modified 18 months ago • written 18 months ago by ta_awwad200

It looks like it is stuck: no progress for 30 minutes. Is that normal?

ADD REPLYlink modified 18 months ago • written 18 months ago by ta_awwad200

You can have a look with (h)top to see if it's still working. Also, check if it's producing output files.
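
For the output-file check, a small sketch that may help: STAR writes `Log.out`, `Log.progress.out` and, once a run completes, `Log.final.out` under the `--outFileNamePrefix`, so with the prefix `aligned/$i.` the state of each sample can be read from the `aligned` directory. The function name `star_status` is made up for illustration. As far as I know the progress log is refreshed roughly once a minute and can stay empty while the genome is still loading.

```shell
# star_status <output_dir>  (hypothetical helper)
# Prints, per sample, whether a Log.final.out exists yet.
star_status() {
    local dir=$1 log sample
    for log in "$dir"/*.Log.out; do
        [ -e "$log" ] || continue
        sample=$(basename "$log" .Log.out)
        if [ -e "$dir/$sample.Log.final.out" ]; then
            echo "$sample finished"
        else
            echo "$sample unfinished"
            # Show the latest progress line if STAR has started reporting
            if [ -s "$dir/$sample.Log.progress.out" ]; then
                tail -n 1 "$dir/$sample.Log.progress.out"
            fi
        fi
    done
}
```

Running `star_status aligned` a few minutes apart shows whether the progress line for the current sample is still changing.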

ADD REPLYlink written 18 months ago by WouterDeCoster38k

I think the problem was that STAR doesn't accept compressed files.

ADD REPLYlink written 18 months ago by ta_awwad200

It does accept them, but you need to specify: --readFilesCommand zcat

ADD REPLYlink written 18 months ago by Nicolas Rosewick7.5k

I did, and it did not work.

ADD REPLYlink written 18 months ago by ta_awwad200

Works just fine for me; I use it all the time.

ADD REPLYlink written 18 months ago by WouterDeCoster38k

"It did not work" doesn't tell us what went wrong. What is the error message? STAR does accept gzip-compressed files.

ADD REPLYlink written 18 months ago by h.mon24k

It is just stuck: no error message, no progress.

ADD REPLYlink written 17 months ago by ta_awwad200

Try gunzip instead. It works with that.
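
If the suggestion is read as swapping the decompression command, that would be `--readFilesCommand gunzip -c` (or `gzip -cd`) in place of `zcat`; `zcat` does not behave the same on every system (the stock macOS version, for example, has expected `.Z` files). Either way, it may be worth checking one input file by hand before launching all 36 samples. The sketch below is an assumption about how to do that, and `check_gz` is a made-up name.

```shell
# check_gz <file.fq.gz>  (hypothetical helper)
# Verifies that the file is intact gzip data and that it starts like a FASTQ record.
check_gz() {
    local f=$1 first
    # gzip -t tests the integrity of the whole compressed file
    gzip -t "$f" 2>/dev/null || { echo "corrupt or not gzip: $f"; return 1; }
    # gzip -cd decompresses to stdout, the same as gunzip -c
    first=$(gzip -cd "$f" | head -n 1)
    case $first in
        @*) echo "ok: $f" ;;
        *)  echo "not FASTQ: $f"; return 1 ;;
    esac
}
```

For example, `check_gz raw_data/KO_day3_1_1.fq.gz` should print an `ok:` line if the file can be streamed to STAR.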

ADD REPLYlink written 11 months ago by Bog20