Question

Loop over .sai files coupled with .fastq.gz

1

Entering edit mode

5.3 years ago

Bnf83 ▴ 150

Hi guys,

I'm totally new in the field of ChipSeq histon acetylation data analysis.

I have to analyse some ChipSeq data. As from many tutorials and on-line documentation I started generating .fastq.gz from raw sequenced data. Then the trimming with Trimmomatic and then the alignment using BWA align.

After bwa align I ended up with .sai files each for each lane, i.e. each for each .fastq.gz file.

The situation is this:

X1_10_S8_L001_R1_001.aligned.sai     
X1_10_S8_L001_R1_001.fastq.gz
X2_11_S8_L001_R1_001.aligned.sai     
X2_11_S8_L001_R1_001.fastq.gz
X3_13_S8_L001_R1_001.aligned.sai     
X3_13_S8_L001_R1_001.fastq.gz

I have thousands of files like this.

I would like to create .sam files to finally generate .bam files.

Is there a way to loop over all the "paired" files to generate the .sam files and then the bam files?

Thank you in advance for your help

B.

ChIP-Seq unix • 1.7k views

ADD COMMENT • link updated 5.3 years ago by Arup Ghosh 3.2k • written 5.3 years ago by Bnf83 ▴ 150

0

Entering edit mode

~~.sai are index files, if I recall correctly. You should also have the actual SAM files that these files are an index of. Could you edit your question and add the BWA command(s) that you used please?~~

Looks like I was super mistaken (ref: Bwa What Is In .Sai File )

If you're just looking for a loop, you should be able to find primers from searching the site. What is your sai -> sam command for one set of input files? Once you define that, you should be able to frame a loop with a tiny bit of effort.

ADD REPLY • link 5.3 years ago by Ram 43k

0

Entering edit mode

Dear Ram, I would like just to perform this: bwa sampe <in.db.fasta> <in1.sai> <in1.fq> > <out.sam> for the full list of "paired" *.sai, *.fastq.gz files.

ADD REPLY • link 5.3 years ago by Bnf83 ▴ 150

1

Entering edit mode

Do the following exercise:

Write down the exact command for 3-4 sets of input files
Observe exactly what changes in the command text in each line
Try writing a loop that prints out each command accurately, and address challenges individually (such as picking the right prefix for the input and output files)

This is a bash loop question, so resources such as https://wiki.bash-hackers.org/syntax/pe should be really helpful.

ADD REPLY • link 5.3 years ago by Ram 43k

score 2 · Accepted Answer · 2019-01-22

2

Entering edit mode

5.3 years ago

Arup Ghosh 3.2k

If your system supports parallel the try the following command with some modifications in the file name pattern.

parallel --verbose -j 20 'bwa mem -t 10  bwa_index {1}_1.fastq.gz {1}_2.fastq.gz|samtools sort -@5 -o {1}_sorted.bam -' ::: $(cat files.txt)

Change the number after -j to increase or decrease the number of files to be processed per batch.

files.txt contains all the file names.

Without parallel.

#!/bin/bash
for file in `cat files.txt`
do
bwa mem -t 10  bwa_index ${file}_1.fastq.gz ${file}_2.fastq.gz|samtools sort -@5 -o ${file}_sorted.bam -
done

*BWA Version: 0.7.17-r1194-dirty

ADD COMMENT • link 5.3 years ago by Arup Ghosh 3.2k

0

Entering edit mode

Unfortunately parallel is not supported. I checked now :-(

ADD REPLY • link 5.3 years ago by Bnf83 ▴ 150

0

Entering edit mode

parallel is not supported

by whom?

ADD REPLY • link 5.3 years ago by Ram 43k

0

Entering edit mode

I'm working on a computer cluster

ADD REPLY • link 5.3 years ago by Bnf83 ▴ 150

1

Entering edit mode

Talk to the sysadmin, I'm sure parallel is just a module load parallel away.

ADD REPLY • link 5.3 years ago by Ram 43k