Question: STAR multiple FASTQ alignment, can you use with a variable $?
gravatar for science_lizard
7 months ago by
science_lizard0 wrote:

Super basic question, probably more related to programming than RNA-seq itself...

I have multiple FASTQs that I want to align (from 3 separate lanes to provide more sequencing depth). According to the STAR manual, I should be able to separate these with commas in between all read 1s, then space, then commas between all read 2s. Like this: L1_R1,L2_R1,L3_R1 L1_R2,L2_R2,L3_R2

I keep getting the error: gzip: /Sample_1/fastq/*L003_001.R2.fastq.gz: No such file or directory

When I bring it out to the terminal to check, this is what I see:

echo $FASTQ1 $FASTQ2
/Sample_1/fastq/Sample_1_TTAGGC_BCCRUJANXX_L003_001.R1.fastq.gz /Sample_1/fastq/Sample_1_TTAGGC_BCCRUJANXX_L003_001.R2.fastq.gz


echo $FASTQ1,$FASTQ2

So clearly it recognizes the variable name, but can't deal with the comma in between. Is there a way around this or do I need to provide full variable names in the STAR script (which would be pretty annoying, since I have many files like this)?

Thanks! Here's the code for reference (I've modified it slightly since the whole path names are long):



STAR --runThreadN 8 --genomeDir $genomeDir --sjdbGTFfile $GTF --sjdbOverhang 149 --bamRemoveDuplicatesType UniqueIdentical --readFilesIn $FASTQ1,$FASTQ3,$FASTQ5 $FASTQ2,$FASTQ4,$FASTQ6 --twopassMode Basic --outSAMtype BAM SortedByCoordinate Unsorted --quantMode TranscriptomeSAM GeneCounts --readFilesCommand zcat
ADD COMMENTlink written 7 months ago by science_lizard0

Defining variable this way is making the OS look for that folder/file at the base of the root directory when that variable is expanded (which is reflected in error you get gzip: /Sample_1/fastq/*L003_001.R2.fastq.gz: No such file or directory. Do you actually have a folder Sample_1 just under /?

Other than that what is going on with * in *L003_001.R1.fastq.gz? What is the * supposed to indicate?

ADD REPLYlink modified 7 months ago • written 7 months ago by GenoMax96k

Thanks for your reply! I just cut out the rest of the path name for my question online because it's super long, but I don't think that part of the variable is the issue. I think you are right that the wildcard is causing the problem, but I don't understand why. Every sample is named something like this in each respective fastq folder:


so I wanted to make a universal script that could get rid of the sample name and adapter sequence and take only the part:


Doing this to all my scripts seems to have fixed the problem, after I cd into the proper folder:


But this requires me to fill out that variable for every sample, which seems like a silly way to do it. You're right that the wildcard is the issue, but why?

ADD REPLYlink written 7 months ago by science_lizard0

Try using ${FASTQ1},${FASTQ2} instead of $FASTQ1,$FASTQ2 - I'm not sure if it will work, but it's worth a shot. What would be better is to use something like

--readFilesIn \
    $(find Sample1/fastq/ -name "*L00*R1*.fastq.gz" | sort | tr "\n" ",") \
    $(find Sample1/fastq/ -name "*L00*R2*.fastq.gz" | sort | tr "\n" ",")

or something that uses ls instead of find if all FASTQ are in the same folder:

--readFilesIn \
    $(/bin/ls -1 Sample1/fastq/*L00*R1*.fastq.gz | sort | tr "\n" ",") \
    $(/bin/ls -1 Sample1/fastq/*L00*R2*.fastq.gz | sort | tr "\n" ",")
ADD REPLYlink modified 7 months ago • written 7 months ago by Ram32k

Thank you! Yeah the ${FASTQ1},${FASTQ2} doesn't work either, it's something about having the comma right between the two variables makes it break down when using a wildcard. But I'll give the other options a shot, thanks!

ADD REPLYlink written 7 months ago by science_lizard0

If your file names are Sample_1_TTAGGC_BCCRUJANXX_L003_001.R1.fastq.gz then name=$(basename ${i} _001.R2.fastq.gz) would be the best way to extract the part that is variant. Then you reconstitute the file names for actual run using ${name}_001.R1.fastq.gz and ${name}_001.R2.fastq.gz.

ADD REPLYlink modified 7 months ago • written 7 months ago by GenoMax96k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1445 users visited in the last hour