Question: bwa for multiple fastq files
0
gravatar for dmotelb87
17 months ago by
dmotelb8710
dmotelb8710 wrote:

I would like to do bwa annotation for forward and reverse reads from different samples against a reference genome using for loap

I have different samples like:

A2_forward.fastq A2_reverse.fastq A3_forward.fastq A3_reverse.fastq A4_forward.fastq A4_reverse.fastq

Also, I need the output for annotation of each sample in separate, as I need to compare the effects of different treatments in different samples.

bwa aligner • 1.7k views
ADD COMMENTlink modified 17 months ago • written 17 months ago by dmotelb8710
1

Thanks Suraj. Yes. This what I mean. It is working now.

ADD REPLYlink written 17 months ago by dmotelb8710

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. Upvote|Bookmark|Accept

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLYlink written 17 months ago by WouterDeCoster40k
6
gravatar for MSM55
17 months ago by
MSM55120
Israel
MSM55120 wrote:

Do you men to say, you want to do mapping of different sample against your reference genome ? You can use following script for mapping multiple sample using bwa. (Note: save the below script in .sh format (say "mapping.sh") and make sure that are no other fastq file rather then your sample should be there in your working directory)

script for mapping using bwa mem

total_files=`find -name '*.fastq' | wc -l`
arr=( $(ls *.fastq) )
echo "mapping started" >> map.log
echo "---------------" >> map.log

for ((i=0; i<$total_files; i+=2))
{
sample_name=`echo ${arr[$i]} | awk -F "_" '{print $1}'`
echo "[mapping running for] $sample_name"
printf "\n"
echo "bwa mem -t 12 A1_denovo_assembly.fasta ${arr[$i]} ${arr[$i+1]} > $sample_name.sam" >> map.log
bwa mem -t 12 A1_denovo_assembly.fasta ${arr[$i]} ${arr[$i+1]} > $sample_name.sam
}
ADD COMMENTlink modified 17 months ago • written 17 months ago by MSM55120

Some elegant bash / awk here. +1

ADD REPLYlink written 5 months ago by ThePresident110

I am very new to bash but think this code is exactly what I need (same problem). Would it be possible to step through what the for loop is doing? Thank you

ADD REPLYlink written 8 weeks ago by e.taylorcox0

Follow along using the numbers below as line numbers to the script above.

  1. Counting number of files that end in *.fastq. If your files have different endings then substitute accordingly.
  2. Assigning the names of the files to an array called arr.
  3. Just printing a message to screen and append to log file called map.log.
  4. Just printing a message to screen and append to log file called map.log.
  5. A for loop to step through the array taking two file names at a time (this is a paired-end dataset so forward and reverse pair, will need to change depending on your use case) until you take all elements from array. Array elements start at position [0].
  6. sample_name extracts the actual name of the sample (stuff before first _ e.g. Sample1 from Sample1_fastq)
  7. Printing the name you are working with to screen with echo
  8. Printing the actula command you are working with. Name of output file created using $sample_name as $sample_name.sam.
  9. Run the actual bwa mem command.
ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by genomax69k

How to call input files from a folder with paired end input files for this array?

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by evelyn30

Changing the first two lines to may be all you need:

total_files=`find -name 'folder_name/*.fastq' | wc -l`
arr=( $(ls folder_name/*.fastq) )
ADD REPLYlink written 6 weeks ago by genomax69k

Thanks! But it gives error as Unix filenames usually don't contain slashes

ADD REPLYlink written 6 weeks ago by evelyn30

I tried this

total_files=`find -wholename './sample_data/*.fastq' | wc -l`
arr=( $(ls ./sample_data/*.fastq) )

But then bwa mem runs for two pairs of input files in the folder but output only one file named sample.bam However, I am expecting two output files, A1.bam and B1.bam

ADD REPLYlink written 6 weeks ago by evelyn30

I would try replacing this line instead:

total_files=`ls /sample_data/*.fastq | wc -l`

Keep an eye on

echo "[mapping running for] $sample_name"

to make sure sample names are being printed properly. You should get as many BAM files as there are pairs of sample files.

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by genomax69k

So I tried

total_files=`ls ./sample_data/*.fastq | wc -l`
arr=( $(ls ./sample_data/*.fastq) )
for ((i=0; i<$total_files; i+=2))
{
sample_name=`echo ${arr[$i]} | awk -F "_" '{print $1}'`
echo "[mapping running for] $sample_name"
printf "\n"
bwa mem -t 12 ref.fa ${arr[$i]} ${arr[$i+1]} | samtools sort -o output/$sample_name.sorted.bam

Echo says

[mapping running for] ./sample

And got one sorted.bam file not with the name of input file but as sample.sorted.bam

It is reading sample as a file name which is a problem.

ADD REPLYlink written 6 weeks ago by evelyn30

It looks like your samples names don't follow the format (Sample_R1.fastq) expected above. What do they look like?

ADD REPLYlink written 6 weeks ago by genomax69k

My sample names are different

1_R1.fastq 1_R2.fastq 2_R1.fastq 2_R2.fastq

But the same script runs if the same samples are in the directory itself rather than a folder in the same directory. But I want to have my input files in the folder and it does not work.

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by evelyn30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 713 users visited in the last hour