STAR multiple FASTQ alignment, can you use with a variable ? 0 0 Entering edit mode 18 months ago Super basic question, probably more related to programming than RNA-seq itself... I have multiple FASTQs that I want to align (from 3 separate lanes to provide more sequencing depth). According to the STAR manual, I should be able to separate these with commas in between all read 1s, then space, then commas between all read 2s. Like this: L1_R1,L2_R1,L3_R1 L1_R2,L2_R2,L3_R2 I keep getting the error: gzip: /Sample_1/fastq/*L003_001.R2.fastq.gz: No such file or directory When I bring it out to the terminal to check, this is what I see: echoFASTQ1 $FASTQ2 /Sample_1/fastq/Sample_1_TTAGGC_BCCRUJANXX_L003_001.R1.fastq.gz /Sample_1/fastq/Sample_1_TTAGGC_BCCRUJANXX_L003_001.R2.fastq.gz  but echo$FASTQ1,$FASTQ2 /Sample_1/fastq/*L003_001.R1.fastq.gz,/Sample_1/fastq/*L003_001.R2.fastq.gz  So clearly it recognizes the variable name, but can't deal with the comma in between. Is there a way around this or do I need to provide full variable names in the STAR script (which would be pretty annoying, since I have many files like this)? Thanks! Here's the code for reference (I've modified it slightly since the whole path names are long): IN=/Sample_1 genomeDir=/STAR GTF=/gencode.v19.chr_patch_hapl_scaff.annotation.gtf FASTQ1=$IN/fastq/*L003_001.R1.fastq.gz
FASTQ2=$IN/fastq/*L003_001.R2.fastq.gz FASTQ3=$IN/fastq/*L004_001.R1.fastq.gz
FASTQ4=$IN/fastq/*L004_001.R2.fastq.gz FASTQ5=$IN/fastq/*L005_001.R1.fastq.gz
FASTQ6=$IN/fastq/*L005_001.R2.fastq.gz STAR --runThreadN 8 --genomeDir$genomeDir --sjdbGTFfile $GTF --sjdbOverhang 149 --bamRemoveDuplicatesType UniqueIdentical --readFilesIn$FASTQ1,$FASTQ3,$FASTQ5 $FASTQ2,$FASTQ4,$FASTQ6 --twopassMode Basic --outSAMtype BAM SortedByCoordinate Unsorted --quantMode TranscriptomeSAM GeneCounts --readFilesCommand zcat  RNA-seq STAR Alignment bash variable • 950 views ADD COMMENT 1 Entering edit mode IN=/Sample_1  Defining variable this way is making the OS look for that folder/file at the base of the root directory when that variable is expanded (which is reflected in error you get gzip: /Sample_1/fastq/*L003_001.R2.fastq.gz: No such file or directory. Do you actually have a folder Sample_1 just under /? Other than that what is going on with * in *L003_001.R1.fastq.gz? What is the * supposed to indicate? ADD REPLY 0 Entering edit mode Thanks for your reply! I just cut out the rest of the path name for my question online because it's super long, but I don't think that part of the variable is the issue. I think you are right that the wildcard is causing the problem, but I don't understand why. Every sample is named something like this in each respective fastq folder: Sample_1_TTAGGC_BCCRUJANXX_L003_001.R2.fastq.gz so I wanted to make a universal script that could get rid of the sample name and adapter sequence and take only the part: *L003_001.R2.fastq.gz Doing this to all my scripts seems to have fixed the problem, after I cd into the proper folder: test=Sample_1_TTAGGC_BCCRUJANXX_ FASTQ1=fastq/${test}L003_001.R1.fastq.gz


But this requires me to fill out that variable for every sample, which seems like a silly way to do it. You're right that the wildcard is the issue, but why?

2
Entering edit mode

Try using ${FASTQ1},${FASTQ2} instead of $FASTQ1,$FASTQ2 - I'm not sure if it will work, but it's worth a shot. What would be better is to use something like

--readFilesIn \
$(find Sample1/fastq/ -name "*L00*R1*.fastq.gz" | sort | tr "\n" ",") \$(find Sample1/fastq/ -name "*L00*R2*.fastq.gz" | sort | tr "\n" ",")


or something that uses ls instead of find if all FASTQ are in the same folder:

--readFilesIn \
$(/bin/ls -1 Sample1/fastq/*L00*R1*.fastq.gz | sort | tr "\n" ",") \$(/bin/ls -1 Sample1/fastq/*L00*R2*.fastq.gz | sort | tr "\n" ",")

0
Entering edit mode

Thank you! Yeah the ${FASTQ1},${FASTQ2} doesn't work either, it's something about having the comma right between the two variables makes it break down when using a wildcard. But I'll give the other options a shot, thanks!

1
Entering edit mode

If your file names are Sample_1_TTAGGC_BCCRUJANXX_L003_001.R1.fastq.gz then name=$(basename${i} _001.R2.fastq.gz) would be the best way to extract the part that is variant. Then you reconstitute the file names for actual run using ${name}_001.R1.fastq.gz and ${name}_001.R2.fastq.gz.