Aligning fastq files using STAR
9 weeks ago
kcarery • 0

Hello everyone,

I am still lost trying to align my fastq files that look like this:

OV63.fastq_R1.fastq
OV63.fastq_R2.fastq

This error message keeps coming up:

-bash: syntax error near unexpected token `|'
EXITING because of fatal input ERROR: could not open readFilesIn=Read1
Nov 29 13:25:06 ...... FATAL ERROR, exiting

This is the current code I am using:

Aligning to reference genome

for f in $(ls *R1*.fastq | sed -r 's/_R1[.]fastq//' | uniq)
     do 
STAR --genomeDir ~/FinalMayodownloads/Matched/Fastq/Indexes/ncbi-genomes-2022-09-19/
--runThreadN 20
--readFilesIn /file/path/Fastq/{f}_R1.fastq /file/pathFastq/{f}_R2.fastq  --outSAMtype BAM Unsorted --outReadsUnmapped Fastx

done

Also, will the bam files that are put out actually have the same file names?


Maybe you should do this in something like perl? So you can get the list of files, parse out individual file names you want, pull out the name you want the output to have, all in separate steps that you can troubleshoot?


Ohhhh I have never used this. I will look into it. I thought Perl was a language?

I am running these commands in my terminal on a server hosted at my university.


Yes... perl is a language.

You can make a perl script to get the file names in the directory, parse out what you need from those names, build the command line you want, then run it with system.

But you can do each step one at a time, instead of cramming it all into a single line. And you can have perl stop at every point and print out your variables, so you know if you've built your regex wrong, or if you've put together your command line wrong.


You could use Perl, as @swbarnes2 suggests, but that would involve learning Perl. Or you could stick with bash. Either way, you'll still have to work it out one step at a time. Start with the simplest thing, and then add complexity. Are you able to simply capture the variables you want and echo them back? Your code works for me:

# create some test files
touch foo_R1.fastq
touch foo_R2.fastq
touch foo1_R1.fastq
touch foo1_R2.fastq
touch foo2_R1.fastq
touch foo2_R2.fastq

Execute the script (with Pierre's correction of ${f}):

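#!/bin/bash

# strip the _R1.fastq suffix to get one sample name per pair
for f in $(ls *R1*.fastq | sed -r 's/_R1[.]fastq//' | uniq)
do
    echo "$f"                                          # the sample name
    echo "Fastq/${f}_R1.fastq Fastq/${f}_R2.fastq"     # the two read files
done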

Result:

foo1
Fastq/foo1_R1.fastq Fastq/foo1_R2.fastq
foo2
Fastq/foo2_R1.fastq Fastq/foo2_R2.fastq
foo
Fastq/foo_R1.fastq Fastq/foo_R2.fastq

If that much is working for you, you can try adding your complex command (still using echo to see that it is all correct, if you like). (1) If you break your command over several lines, make sure to add \ at the end of each line to carry it over to the next line. (2) Make sure your editor, whatever you're using, isn't changing characters (such as backticks versus single quotes) or adding hidden line returns.
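To make both points concrete, here is a small sketch. `count_args` is a made-up helper (not part of STAR) that only reports how many arguments it received, and the "pasted" file is a throwaway created just for the check.

```bash
#!/bin/bash
# count_args is a hypothetical stand-in for STAR: it only reports
# how many arguments the shell handed to it.
count_args() { echo "$#"; }

# (1) A trailing backslash joins the three lines into ONE command.
n_args=$(count_args --genomeDir Indexes/ \
    --runThreadN 20 \
    --readFilesIn foo_R1.fastq foo_R2.fastq)
echo "arguments received: $n_args"

# (2) Count lines that carry a hidden carriage return (Windows line ending).
# The pasted file here is a throwaway example created just for this check.
pasted=$(mktemp)
printf 'echo hello\r\n' > "$pasted"
cr_lines=$(grep -c $'\r' "$pasted")
echo "lines with a carriage return: $cr_lines"
rm -f "$pasted"
```

This should report 7 arguments and 1 line with a carriage return. Without the backslashes, bash would run the first line on its own and then try to run `--runThreadN` as a separate command. The backslash must be the very last character on the line, with no spaces after it.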


Hey! Thank you so much for your help. You must be a teacher, if not, you would be an excellent one!

I actually tried to copy what you did and I got the same result! So now I am going to cp six of my files to another folder and try to complete the command line by line before I run them all. I will report what I get!

Also, I don't use an editor... is there a recommended one? I have been writing my code on an Evernote page I have open.

Thanks


Thank you for the compliment! There could be a couple of things going on. Your echo suggests that the complete command is being properly assembled into a single line, but then fails while trying to execute. The error "EXITING because of fatal input ERROR: could not open readFilesIn=Read1" might change its appearance depending on how things are failing. Again, simplification can help. All STAR needs to run is an index and a single input file. So to test your situation, the command:

STAR --genomeDir ~/FinalMayodownloads/Matched/Fastq/Indexes/ncbi-genomes-2022-09-19/ --readFilesIn OV88.FCC1PU8ACXX_L3_IGGCTAC.fastq_R1.fastq

should work (all on one line), especially if that file is in the directory in which you are running your script. You can even insert that line into the script, and it should execute (as long as the file is in your current working directory). It's OK if it's just a single read file; this is just a test. You can make the test easier by creating a toy file with just a few reads (the number of lines must be a multiple of 4), as in:

head -n 100 OV88.FCC1PU8ACXX_L3_IGGCTAC.fastq_R1.fastq > test_R1.fastq

This would create a file with 100 lines from your fastq file (equivalent to 25 reads).
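Since your real data are paired, you would probably want the same number of reads from both mates once you move past the single-file test. A minimal sketch, assuming uncompressed FASTQ with four lines per read; the function name `make_toy_pair` and the `toy_` output prefix are made up for illustration.

```bash
#!/bin/bash
# make_toy_pair SAMPLE [N_READS]: copy the first N_READS reads of each mate
# into toy_SAMPLE_R1.fastq and toy_SAMPLE_R2.fastq (hypothetical names).
make_toy_pair() {
    local sample=$1
    local n_reads=${2:-25}
    local n_lines=$(( n_reads * 4 ))   # 4 lines per FASTQ record
    local mate
    for mate in R1 R2; do
        head -n "$n_lines" "${sample}_${mate}.fastq" \
            > "toy_${sample}_${mate}.fastq" || return 1
    done
}
```

For files named like yours, `make_toy_pair OV63.fastq 25` would read OV63.fastq_R1.fastq and OV63.fastq_R2.fastq and write 100 lines to each toy file.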

Anyway, the "readFilesIn=" error will change depending on how the script is failing. If STAR can't find the file, it will tell you the name of the file it cannot find, e.g. "ERROR: could not open readFilesIn=OV88.FCC1PU8ACXX_L3_IGGCTAC.fastq_R1.fastq". On the other hand, in the example near the top of this thread, your command appears to have been broken into multiple lines, so STAR was not getting any file to try to open, and thus complained with "could not open readFilesIn=Read1". So the things I would try are:

(1) Create a small toy fastq file and use that to prove you can run a STAR command.
(2) Make sure the file names you are building match the names of the actual files you hand to STAR.
(3) Make sure all the file locations and paths make sense.
(4) Check whether the error you are getting lists a particular file: readFilesIn=Read1 means STAR is never seeing a file name, while readFilesIn=somefilename.fq means STAR is seeing a file name but can't find it, or is failing for some other reason.
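Those checks can also be folded into the loop itself. The following is only a sketch, not a tested pipeline: `align_all`, `GENOME_DIR`, `OUT_DIR`, and `DRY_RUN` are names I made up, and the STAR options are just the ones already used in this thread. It assumes the files end in _R1.fastq and _R2.fastq and that you run it from the directory that holds them.

```bash
#!/bin/bash
# Sketch only: GENOME_DIR, OUT_DIR, and DRY_RUN are assumed names.
GENOME_DIR=${GENOME_DIR:-/file/path/Indexes/ncbi-genomes-2022-09-19/}
OUT_DIR=${OUT_DIR:-starAligned}
DRY_RUN=${DRY_RUN:-1}      # 1 = only print each command, 0 = actually run STAR

align_all() {
    local r1 r2 sample
    local -a cmd
    for r1 in *_R1.fastq; do
        # If the glob matched nothing, r1 is the literal pattern: stop early.
        [ -e "$r1" ] || { echo "no *_R1.fastq files in $PWD" >&2; return 1; }
        sample=${r1%_R1.fastq}          # OV63.fastq_R1.fastq -> OV63.fastq
        r2=${sample}_R2.fastq
        if [ ! -r "$r2" ]; then
            echo "skipping $sample: cannot read $r2" >&2
            continue
        fi
        # An array keeps this as one command without backslash continuations.
        cmd=(STAR
             --genomeDir "$GENOME_DIR"
             --runThreadN 20
             --readFilesIn "$r1" "$r2"
             --outSAMtype BAM Unsorted
             --outReadsUnmapped Fastx
             --outFileNamePrefix "$OUT_DIR/${sample}_")
        if [ "$DRY_RUN" = 1 ]; then
            echo "${cmd[*]}"            # show exactly what would be run
        else
            mkdir -p "$OUT_DIR" && "${cmd[@]}"
        fi
    done
}
```

Calling `align_all` with the default DRY_RUN=1 prints one STAR command per complete pair, so you can compare the file names against what is really on disk before setting DRY_RUN=0. On the file-name question from the top of the thread: without --outFileNamePrefix, STAR writes Aligned.out.bam into the current directory on every run, so each sample would overwrite the previous one. With a per-sample prefix as above, the output for OV63.fastq would be starAligned/OV63.fastq_Aligned.out.bam.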


So I went line by line and it echoed this:

STAR --genomeDir /file/path/Indexes/ncbi-genomes-2022-09-19/
--runMode alignReads
--readFilesIn OV88.fastq_R1.fastq OV88.fastq_R2.fastq
--runThreadN 30
--quantMode TranscriptomeSAM GeneCounts
--twopassMode Basic
--chimOutType Junctions
--outSAMtype BAM Unsorted
--outFileNamePrefix ../starAlignedHG38/OV88.FCC1PU8ACXX_L3_IGGCTAC.fastq_
--outSAMtype BAM Unsorted
--outReadsUnmapped Fastx

STAR --genomeDir /file/path/Indexes/ncbi-genomes-2022-09-19/
--runMode alignReads
--readFilesIn OV89.fastq_R1.fastq OV89.fastq_R2.fastq
--runThreadN 30
--quantMode TranscriptomeSAM GeneCounts
--twopassMode Basic
--chimOutType Junctions
--outSAMtype BAM Unsorted
--outFileNamePrefix ../starAlignedHG38/OV89.fastq_
--outSAMtype BAM Unsorted
--outReadsUnmapped Fastx

Which is what I would want to see... It is still giving me the error that it can't read readFilesIn, and I am confused...


Why post another question after "For loop for aligning multiple paired end fastq files"? Why not use read like I did? What was the output with echo?

/{f}_R should be /${f}_R
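A quick way to see the difference, using a sample name in the same style as your files (the variable names below are just for illustration):

```bash
#!/bin/bash
f=OV63.fastq

literal="Fastq/{f}_R1.fastq"      # no $: the text {f} is kept exactly as typed
expanded="Fastq/${f}_R1.fastq"    # ${f} is replaced by the value of f
no_braces="Fastq/$f_R1.fastq"     # bash looks for a variable named f_R1 (unset)

echo "$literal"
echo "$expanded"
echo "$no_braces"
```

This prints Fastq/{f}_R1.fastq, then Fastq/OV63.fastq_R1.fastq, then Fastq/.fastq. Only the ${f} form gives a file name that exists, which is why both the $ and the braces are needed when the variable is followed by an underscore.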


Not sure what you mean by using read... It gave me the same error when I put it in.
