Question: shell script for bowtie/bwa alignment pair end reads
3
gravatar for Paul
4.3 years ago by
Paul1.3k
European Union
Paul1.3k wrote:

Dear all,

I would like to write shell script to automatic alignment my pair end data with bowtie (or bwa) aligners.

Problem is, when I have lot of pair-end fastq files in one directory I have problem to loading data in for cycle.

Here is my code:

for i in $(ls *.fastq | rev | cut -c 22- | rev | uniq)
    do ./bowtie2 --very-sensitive -p32 --rg-id ${i} --rg-sample ${i} --rg-library Illumina --rg-platform Illuimna -x $reference -1 ${i}_L001_R1_001.fatsq.gz -2 ${i}_L001_R2_001.fatsq.gz -S ${i%.fastq.gz}${i}.sam
    done;

This is very nice solution help solved by RamRS. Is there any other way how to treat Bowtie or BWA with many samples (corresponding one sample to each read) in one for cycle?

my directory contains lot of fastq.gz:

name1_R1_001.fatsq.gz
name2_R2_001.fatsq.gz
...
...
...

Each sample name (prefix before R._001.fastq.gz) has different name length so algorithm above doesn't work properly :-(

Thank you for any idea and solutions..

bowtie bwa shell fastq illumina • 9.9k views
ADD COMMENTlink modified 23 months ago by emad.albarouki0 • written 4.3 years ago by Paul1.3k
1

Hi Paul,

Not sure if you have this sorted but I managed to manipulate RamRS's bash to run for Bowtie2

#!bin/bash

for i in $(ls *.fastq | rev | cut -c 14- | rev | uniq)

do 

bowtie2 --very-sensitive -p16 --rg-id ${i} -x cprefseqs -1 ${i}_R1_001.fastq -2 ${i}_R2_001.fastq  -S $i.sam

done

Not sure if this is exactly what you want but it gives the output I'm after. I'm by no means an expert in the btw, it just worked for me. The format of my input files are like this : 65_S155_L001_R1_001.fastq

Cheers,

Todd

ADD REPLYlink written 4.2 years ago by todd.mclay10

I assume you mean "one sample per file/pair of files" rather than "one sample to each read".

What have you tried other than that solution from RamRS and do you understand what it's doing? Once you know how that solution works, you should be able to do this without any help.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Devon Ryan89k

I don't understand very well what is exactly your question. Are you trying to run BWA/Bowtie for each pair of samples (assuming PE) in a folder? So, the problem is that you are not able to get the files two by two?

ADD REPLYlink written 4.3 years ago by iraun3.5k

Presumably you meant to reply comment on the original post, not my comment :) I've moved it accordingly.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Devon Ryan89k

@airan: This might provide some context: bash loop for alignment RNA-seq data

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by RamRS21k

Hi, this script depends not on the prefix length but only on the length of the "R[12]_001.fastq" part, which is constant (13). What is the problem you face, exactly?

Also, ${MY_VARIABLE_NAME} is the syntax to encapsulate a variable name; you're looking for the  $(MY_COMMAND_OR_PIPE) syntax, to execute a command or pipe.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by RamRS21k
3
gravatar for Zaag
4.3 years ago by
Zaag720
Amsterdam
Zaag720 wrote:
for i in `ls *.fastq | cut -d "_" -f 1` ;
    do ./bowtie2 --very-sensitive -p 32 --rg-id $i --rg-sample $i --rg-library Illumina --rg-platform Illuimna -x $reference -1 $i\_L001_R1_001.fastq.gz -2 $i_L001_R2_001.fastq.gz -S $i.sam ;
done ;

edit: removed useless grep

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Zaag720
1

There's no reason to grep the output of ls. Just ls *.fastq.

ADD REPLYlink written 4.3 years ago by Devon Ryan89k
1

OP has made a copy-paste mistake. My original script bash loop for alignment RNA-seq data has $(COMMAND), not ${COMMAND}. $(COMMAND) and `COMMAND` are equivalent. $(COMMAND) IMO avoids quoting confusion.

ADD REPLYlink written 4.3 years ago by RamRS21k
0
gravatar for emad.albarouki
24 months ago by
emad.albarouki0 wrote:

Hello, I tried the above-indicated code after modification according to my sequences, I always get an error. bowtie Error!

Error: Encountered internal Bowtie 2 exception (#1)
(ERR): bowtie2-align exited with value 1

the code is like that:

for i in $ls *.fastq ; do bowtie2 --very-sensitive-local -p 16 --rg-id ${i} --rg sample ${i} -x $index/*.bt2 -1 ${i}_*_1P.fastq -2 ${i}_*_2P.fastq -U ${i}_*_U.fastq -S ${i}-H2-7.sam ; done

my sequence are already trimmed and have a uniq prefix and ending like

NG-9974_*_**1**_1P.fastq
NG-9974_*_**1**_2P.fastq
NG-9974_*_**1**_1U.fastq  # for trimmed unpaired
NG-9974_*_**1**_2U.fastq # for trimmed unpaired

it is also like that (5 or 6, 8 instead of 1 as this)

NG-9974_*_**5**_1P.fastq
NG-9974_*_**5**_2P.fastq
NG-9974_*_**5**_1U.fastq  # for trimmed unpaired
NG-9974_*_**5**_2U.fastq # for trimmed unpaired

I also tried to rev and cut the sequence name as it was suggested, it always gave me the indicated error, could somebody help please to improve or correct my code.

Thank you in advance, Emad

ADD COMMENTlink modified 24 months ago by RamRS21k • written 24 months ago by emad.albarouki0
2

Try the same loop but add echo in front of the command, to check how it looks like without running the actual alignment. As such you can spot errors.

ADD REPLYlink written 24 months ago by WouterDeCoster38k

Thanks Wouter and Ram for your reply, it looks like the bowtie loop does not take the right paired sequence NG-9974__1_1P.fastq and NG-9974__1_2P.fastq as -1 and -2 paired sequences, and the unpaired sequences NG-9974__1_1U.fastq and NG-9974__1_2U.fastq as unpaired under -U arguments rather than take the individual sequence like NG-9974__1_1P.fastq and add the suffixes __1P.fastq, __2P.fastq, and __1U.fastq after the -1, -2 and -U arguments like this:

-1 NG-9974__1_1P.fastq__1P.fastq -2 NG-9974__1_1P.fastq__2P.fastq -U NG-9974__1_1P.fastq__1U.fastq

and

-1 NG-9974__1_2P.fastq__1P.fastq -2 NG-9974__1_2P.fastq__2P.fastq -U NG-9974__1_2P.fastq__1U.fastq

and

-1 NG-9974__2_1U.fastq__1P.fastq -2 NG-9974__2_1U.fastq__2P.fastq -U NG-9974__2_1U.fastq__1U.fastq

I could solve this problem by cutting the last 10 character of the sequences, but the problem still exists with bowtie error, although the alignment pairs and single unpaired looks good now!

ADD REPLYlink modified 23 months ago by RamRS21k • written 23 months ago by emad.albarouki0

Is there a verbose option to bowtie? Does it write a log?

ADD REPLYlink written 23 months ago by RamRS21k

bowtie alignment itself does have a verbose option, it has just in bowtie-build and bowtie-inspector options. Anyway, I could dig out the problem and it looks that the $ sign in front of the index argument made all the problems. It works now, thank you again! :)

ADD REPLYlink written 23 months ago by emad.albarouki0
2

Your bash loop looks for a ${i}_*_U.fastq file, but you have ${i}_*_1U.fastq and ${i}_*_2U.fastq files.

ADD REPLYlink written 24 months ago by RamRS21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1209 users visited in the last hour