Hey all,
Could you advise what I am doing wrong?
The files look like this:
OV63.FCD1U51ACXX_L7_IACTTGA.fastq_R1.fastq
OV63.FCD1U51ACXX_L7_IACTTGA.fastq_R2.fastq
I tried this code to align them to the reference genome:
#Aligning to reference genome
for f in *.fastq | rev | cut -c 10- | rev | uniq
do
STAR --genomeDir ~/file/path/Indexes/ncbi-genomes-2022-09-19/
--runThreadN 20
--readFilesIn ${f}_R1.fastq {f}_R2.fastq --outSAMtype BAM Unsorted --outReadsUnmapped Fastx
--outFileNamePrefix AlignedHG38/${f}
done
I get these errors:
-bash: syntax error near unexpected token `|'
EXITING because of fatal input ERROR: could not open readFilesIn=Read1
Nov 29 13:25:06 ...... FATAL ERROR, exiting
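For comparison, here is a sketch of the same loop with the problems fixed. The filename pipeline is wrapped in `$( )`, so `for` receives a list of words instead of being piped into, which is what caused the `|` syntax error. The R2 name uses `${f}`, and every line of the STAR call except the last ends in a backslash. The sketch keeps the question's `rev | cut -c 10- | rev` trick and assumes every file ends in `_R1.fastq` or `_R2.fastq` with no spaces in the names. Wrapping it in a function called `align_pairs` that takes the index directory is only for illustration.

```shell
# Sketch only: assumes paired files named <prefix>_R1.fastq / <prefix>_R2.fastq
# with no spaces in the names. align_pairs is a hypothetical helper name.
align_pairs() {
    local genome_dir=$1 f
    mkdir -p AlignedHG38

    # rev | cut -c 10- | rev drops the last 9 characters ("_R1.fastq" or
    # "_R2.fastq"), and uniq collapses each R1/R2 pair to one prefix.
    # $( ... ) turns the pipeline's output into the word list for `for`.
    for f in $(ls *.fastq | rev | cut -c 10- | rev | uniq)
    do
        # Every line but the last ends in a backslash: ONE STAR command.
        STAR --genomeDir "$genome_dir" \
             --runThreadN 20 \
             --readFilesIn "${f}_R1.fastq" "${f}_R2.fastq" \
             --outSAMtype BAM Unsorted \
             --outReadsUnmapped Fastx \
             --outFileNamePrefix "AlignedHG38/${f}"
    done
}
```

Running `align_pairs ~/file/path/Indexes/ncbi-genomes-2022-09-19/` from the directory that holds the FASTQ files would then start one STAR run per pair.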
Thanks for your response! I read the older thread you linked and I understand why sed is better. I am still confused about how to add in R1 and R2. I am trying to align to a reference, but this would only take R1 into account, right? I am using the STAR aligner, and for STAR's `--readFilesIn` I must list both R1 and R2. How would I also add in R2?
I think there's either another gap in understanding or a typo: you're using the R1 file variable correctly (`${f}_R1.fastq`) but are missing the `$` sign in the R2 name (it should be `${f}_R2.fastq`, not `{f}_R2.fastq`). Do you understand (a) what the `$` does, and (b) what the `{}` after the `$` are for?
Yes, I see I was missing that. I didn't even catch it. What hours at the computer will do to the mind...
The `$` sign tells us to call that variable, correct? Similar to a 3x+5 type of thing, where the function of x in this case would be `$(ls *R1*.fastq | sed -r 's/_R1[.]fastq//' | uniq)`? That's how I understood it...
Close. `$` tells the interpreter to look for a variable whose name is the sequence of characters following the `$`. So if your variable were named `my_var` (for example, `my_var=ABCDE_L001`), the shell would replace `$my_var` with `ABCDE_L001` wherever it encounters `$my_var`.
However, if you were to use something like `$my_var_R1`, the shell would replace it with an empty string, because it doesn't know a variable named `my_var_R1` and has no way to know that you intend to append `_R1` to `$my_var`. This is where the `{}` come into play: they "isolate" the variable name from the rest of the string, and using them is good practice in general. For example, the shell would understand `${my_var}_R1`, and you'd see `ABCDE_L001_R1`.
For even more info, see my comment from that thread: bash loop for alignment RNA-seq data
Ahhh, I see. Okay, that makes sense: it's the prefix. Okay, cool. I did refer to the thread, but I'm still not sure what I am missing. I also tried putting an echo in front of the STAR alignment command, because the issue is no longer about getting the correct file names; I was able to do that by practicing with a few files. For example:
It gave me this:
That is what I would want to see... But it is still giving me the error that it cannot read `readFilesIn`, and I am confused. I've followed your way of doing it and checked it with echo.
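One way to do the echo check so that the printed and executed commands cannot drift apart is to build the argument list once, in a bash array, and then either print it or run it. This sketch reuses the `sed` pipeline from earlier in the thread (with the pattern anchored to the end of the name by `$`). `star_pairs` and the `DRY_RUN` switch are hypothetical names, and the sketch assumes GNU `sed` for `-r` and the same `_R1.fastq`/`_R2.fastq` naming.

```shell
# Sketch: print (default) or run one STAR command per R1/R2 pair.
# star_pairs and DRY_RUN are hypothetical; DRY_RUN=1 (the default) only prints.
star_pairs() {
    local genome_dir=$1 f
    local -a cmd
    [ "${DRY_RUN:-1}" = 1 ] || mkdir -p AlignedHG38

    for f in $(ls *R1*.fastq | sed -r 's/_R1[.]fastq$//' | uniq)
    do
        # Build the arguments once so the printed and executed commands match.
        cmd=(STAR --genomeDir "$genome_dir" --runThreadN 20
             --readFilesIn "${f}_R1.fastq" "${f}_R2.fastq"
             --outSAMtype BAM Unsorted --outReadsUnmapped Fastx
             --outFileNamePrefix "AlignedHG38/${f}")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "${cmd[@]}"
        else
            "${cmd[@]}"
        fi
    done
}
```

`star_pairs ~/file/path/Indexes/ncbi-genomes-2022-09-19/` only prints the commands, and `DRY_RUN=0 star_pairs ~/file/path/Indexes/ncbi-genomes-2022-09-19/` runs them. Because an array assignment may span several lines inside its parentheses, no backslashes are needed here.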
Please use the formatting bar (especially the `code` option) to present your post better. You can use backticks for inline code (wrapping a word in backticks renders it as code), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines, ending every line but the last with a backslash (`\`), so they're easier to read and still run when copy-pasted. I've done it for you this time.
That is odd. Can you copy-paste the exact error message again, please? The smallest of differences in the error message can help us crack the problem.
EDIT:
Try running the command without breaking it into multiple lines. You're not ending the lines with a backslash, so the shell ends the STAR command at the first newline, and STAR is probably reading only the first parameter.
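With the same kind of stand-in (`show_args` is hypothetical), removing the backslashes reproduces the failure described above. The newline ends the command, so the function receives only the first two arguments, and the shell then tries to run `--runThreadN` as a separate command. For STAR, that would mean it started without any `--readFilesIn` option; STAR's documented default for that option is `Read1 Read2`, which fits the `could not open readFilesIn=Read1` message.

```shell
# Hypothetical stand-in for STAR: report how many arguments arrived.
show_args() { echo "$# args: $*"; }

# No trailing backslashes: the first line is a complete command on its own.
show_args --genomeDir /path/to/index   # prints: 2 args: --genomeDir /path/to/index
--runThreadN 20                        # fails: --runThreadN: command not found
--outSAMtype BAM                       # fails: --outSAMtype: command not found
```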
Thank you for your support; it is finally working!
What did you have to do to get it working?