For loop for aligning multiple paired end fastq files
1
0
Entering edit mode
16 months ago
kcarey • 0

Hey all,

Could you advise what I am doing wrong?

Files looks like this:

OV63.FCD1U51ACXX_L7_IACTTGA.fastq_R1.fastq
OV63.FCD1U51ACXX_L7_IACTTGA.fastq_R2.fastq

I tried this code to align to reference genome:

#Aligning to reference genome
for f in *.fastq | rev | cut -c 10- | rev | uniq
do
STAR --genomeDir ~/file/path/Indexes/ncbi-genomes-2022-09-19/
--runThreadN 20
--readFilesIn ${f}_R1.fastq {f}_R2.fastq --outSAMtype BAM Unsorted --outReadsUnmapped Fastx
--outFileNamePrefix AlignedHG38/${f}
done

I get these errors:

-bash: syntax error near unexpected token `|'
EXITING because of fatal input ERROR: could not open readFilesIn=Read1
Nov 29 13:25:06 ...... FATAL ERROR, exiting
star • 1.9k views
ADD COMMENT
0
Entering edit mode
16 months ago
Ram 43k

The shell stops processing your pipe at *.fastq. If you need to use something like the old version of my bash loop command, you need to use a subshell to do the piped command: $(ls *.fastq | rev | cut -c 10- | rev | uniq) instead of simply *.fastq | rev | cut -c 10- | rev | uniq.

Instead of using the hardcoded cut -c 10- (which will definitely fail in your use case), adapt the updated $(ls *R1*.fastq | sed -r 's/_R1_001[.]fastq//' | uniq) - your FASTQ files don't have the _001 part. Again, for details, see my older post.

ADD COMMENT
0
Entering edit mode

Thanks for your response! I read your old version link and I understand why sed is better. I still am confused how do you add in the R1 and R2, I am trying to align it to a reference but this would only take in to account R1 right? I am using STAR aligner. for the STAR --readfilesin I must list both R1 and R2

How would I also add in R2?

#Aligning to reference genome

for f in $(ls *R1*.fastq | sed -r 's/_R1[.]fastq//' | uniq) 
do
  STAR --genomeDir ~/file/path/Indexes/ncbi-genomes-2022-09-19/
--runThreadN 20
--readFilesIn ${f}_R1.fastq {f}_R2.fastq --outSAMtype BAM Unsorted --outReadsUnmapped Fastx
done
ADD REPLY
0
Entering edit mode

I think there's either another gap in understanding or a typo: you're using the R1 file variable right (${f}_R1.fastq) but are missing the $ sign in the R2 (it should be ${f}_R2.fastq, not {f}_R2.fastq).

Do you understand (a) what the $ does, and (b) what the {} after the $ are for?

ADD REPLY
0
Entering edit mode

Yes I see I was missing that. I didn't even catch it. What hours at the computer will do to the mind...

The $ sign tells us to call that variable correct? Similar to 3x+5 type of thing? Why the function of x is....in this case would be this $(ls *R1*.fastq | sed -r 's/_R1[.]fastq//' | uniq) ....that's how I understood it...

ADD REPLY
0
Entering edit mode

$ sign tells us to call that variable

Close. $ tells the interpreter to look for a variable whose name is the sequence of characters following the $. So if your variable were to be named my_var (example: my_var=ABCDE_L001, then the shell would replace $my_var with ABCDE_L001 wherever it encounters $my_var.

However, if you were to use something like $my_var_R1, the shell would replace that with an empty string, as it doesn't know a variable named my_var_R1 and it does not know that you intend to append _R1 to $my_var. This is where the {} come into play. They "isolate" the variable name from the rest of the string, and using them is a good practice in general. For example, the shell would understand ${my_var}_R1 and you'd see ABCDR_L001_R1.

For even more info, see my comment from that thread: bash loop for alignment RNA-seq data

ADD REPLY
0
Entering edit mode

Ahhh I see. Okay, that makes sense. Its the prefix.... okay cool. So I did refer to the thread but still not sure what I am missing. I attempted to also add in a echo command of the STAR alignment because the issue is no longer about being able to obtain the correct file. I was able to do that by practicing with a few files. For example:

#check how it will run on files with the full code
for f in $(ls *R1*.fastq | sed -r 's/_R1[.]fastq//' | uniq)
do
  echo STAR --genomeDir ~/file/path/Indexes/ncbi-genomes-2022-09-19/
  echo --runMode alignReads
  echo --readFilesIn ${f}_R1.fastq ${f}_R2.fastq
  echo --runThreadN 30
  echo --quantMode TranscriptomeSAM GeneCounts
  echo --twopassMode Basic
  echo --chimOutType Junctions
  echo --outSAMtype BAM Unsorted
  echo --outFileNamePrefix ../starAlignedHG38/${f}
  echo --outSAMtype BAM Unsorted
  echo --outReadsUnmapped Fastx
done

It gave me this:

STAR --genomeDir /file/path/Indexes/ncbi-genomes-2022-09-19/ 
--runMode alignReads 
--readFilesIn OV88.FCC1PU8ACXX_L3_IGGCTAC.fastq_R1.fastq OV88.FCC1PU8ACXX_L3_IGGCTAC.fastq_R2.fastq 
--runThreadN 30 
--quantMode TranscriptomeSAM GeneCounts 
--twopassMode Basic 
--chimOutType Junctions
 --outSAMtype BAM Unsorted 
--outFileNamePrefix ../starAlignedHG38/OV88.FCC1PU8ACXX_L3_IGGCTAC.fastq_
 --outSAMtype BAM Unsorted
 --outReadsUnmapped Fastx 

STAR --genomeDir /file/path/Indexes/ncbi-genomes-2022-09-19/ 
--runMode alignReads 
--readFilesIn OV09.fastq_R1.fastq OV89.fastq_R2.fastq
 --runThreadN 30
 --quantMode TranscriptomeSAM GeneCounts
 --twopassMode Basic 
--chimOutType Junctions 
--outSAMtype BAM Unsorted 
--outFileNamePrefix ../starAlignedHG38/OV89.fastq_
 --outSAMtype BAM Unsorted 
--outReadsUnmapped Fastx

Which is what I would want to see.... It is still giving me the error of it can read the ReadFilesIn tab, saying it can not read it and I am confused.... I've followed you all way of doing it and checked it with echo

ADD REPLY
0
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.
code_formatting

ADD REPLY
0
Entering edit mode

That is odd. Can you copy-paste the exact error message again, please? The smallest of differences in the error message can help us crack the problem.

EDIT:

Try running the command without breaking it into multiple lines. You're not using escape sequences and STAR is probably reading only the first parameter, stopping at the newline after it.

ADD REPLY
0
Entering edit mode

Thank you for your support, it is finally working

ADD REPLY
0
Entering edit mode

What did you have to do to get it working?

ADD REPLY

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6