Loop tophat to align multiple RNA-seq files
3
1
Entering edit mode
7.9 years ago
mmrcksn ▴ 50

Hello,

I'm a beginner! trying to use a loop to align multiple paired-end RNA-seq samples with tophat.

Here's what I have that isn't working (basically using this bash loop for alignment RNA-seq data a little bit modified):

for i in $(ls *.fastq | rev | cut -c 13- | rev | uniq) 
do
tophat -o /path/to/output/${i} -G /path/to/genes.gtf /path/to/index/genome ${i}R1_001.fastq ${i}R2_001.fastq
done

The error is this:

#IOError: [Errno 2] No such file or directory: 'R1_001.fastq'

Why doesn't it realize that ${i} is part of the filename?

Thanks in advance.

Emma

RNA-Seq tophat loop alignment • 4.5k views
ADD COMMENT
1
Entering edit mode

Can you show us example file names for one sample?

ADD REPLY
0
Entering edit mode

Sure, here's an example:

C-CD-15_S14_R1_001.fastq
C-CD-15_S14_R2_001.fastq
ADD REPLY
1
Entering edit mode

See if this helps

for i in $(ls *.fastq | rev | cut -c 13- | rev | uniq) 
do
tophat -o /path/to/output/${i}out -G /path/to/genes.gtf /path/to/index/genome ${i}R1_001.fastq ${i}R2_001.fastq
done
ADD REPLY
1
Entering edit mode

Thanks for the suggestion! Still get the same error though.

ADD REPLY
1
Entering edit mode

Curious about what OS/shell are you using?

ADD REPLY
0
Entering edit mode

can you just try

for i in $(ls *.fastq | rev | cut -c 13- | rev | uniq) 
do
echo "${i}R1_001.fastq -- ${i}R2_001.fastq"
done

to see what it prints out and can you also try

ls -lh | head

to see whether the output from above matches the file names?

edit...I just realized that it piggy backs on Igor's answer :)

edit2...try also to use the whole path for your fastq files, i.e.:

/path/to/my.files/${i}R1_001.fastq
ADD REPLY
2
Entering edit mode
7.9 years ago
igor 13k

Add echo $i to your loop so you can see what $i is. Maybe that command is not returning what you think it should.

ADD COMMENT
1
Entering edit mode

Thanks for the suggestion. I just tried that and it is returning what it should.

ADD REPLY
1
Entering edit mode

What happens when you echo ${i}R1_001.fastq?

ADD REPLY
1
Entering edit mode

Also, try printing the whole command: echo "tophat -o /path/to/output/${i}out -G /path/to/genes.gtf /path/to/index/genome ${i}R1_001.fastq ${i}R2_001.fastq"

ADD REPLY
1
Entering edit mode
7.9 years ago
TriS ★ 4.7k

if you are aligning fastq files on a server (and I expect you do), then it is much faster to run them in parallel instead of sequentially.

ADD COMMENT
0
Entering edit mode
7.9 years ago
dr_bantz ▴ 110

The error message you get suggests that bash cannot find any fastq files in your current directory, so I guess the first step would be to make sure that the fastq files are in the same directory from which you're running your script. Also make sure they're not actually .fastq.gz

If all the above is legit, you could also try this:

for file in ./*R1_001.fastq

do

FBASE=$(basename $file .fastq)

BASE=${FBASE%R1_001}

tophat -o /path/to/output/${BASE} -G /path/to/genes.gtf /path/to/index/genome ${BASE}R1_001.fastq ${BASE}R2_001.fastq

done
ADD COMMENT

Login before adding your answer.

Traffic: 2741 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6