Question: QIIME 2 manifest file for multiple paired-end reads
7 months ago by
Bioinfonext150
Korea

Hi,

I have multiple paired-end read files from amplicon sequencing, and I am trying to write a bash script to generate a manifest file. The manifest file should contain information like this:

sample-id,filename,direction
a,a_1_R1_001.fastq.gz,forward
b,b_2_R1_001.fastq.gz,forward
c,c_3_R1_001.fastq.gz,forward

Location of the files: /mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s

The file names look like this:

Soil-33_S73_L001_R1_001.fastq Soil_9_S42_R1_001.fastq

Soil-33_S73_L001_R2_001.fastq Soil_9_S42_R2_001.fastq
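Going by the expected output further down, the sample ID appears to be everything before the `_R1_`/`_R2_` read tag. A minimal sketch of pulling that out with shell parameter expansion, assuming that naming convention holds for every file (the function name `sample_id_of` is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a sample ID from a FASTQ file name.
# Assumption: the ID is everything before the first "_R1_" or "_R2_".
sample_id_of() {
    local name
    name=$(basename "$1")      # drop any leading directory
    name=${name%%_R[12]_*}     # strip the longest suffix starting at _R1_ or _R2_
    printf '%s\n' "$name"
}

sample_id_of Soil_9_S42_R1_001.fastq          # prints Soil_9_S42
sample_id_of Soil-33_S73_L001_R2_001.fastq    # prints Soil-33_S73_L001
```

If the lane tag (`_L001`) should not be part of the ID, it would need to be stripped as a separate step.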

Script:

echo "sample-id,absolute-filepath,direction" > manifest.csv

raw_data='/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s/'

# Since the format asks to separate 'forward' and 'reverse', iterate over R1 first, then run the same loop for R2
for sampleID in $(ls ${raw_data}/*gz | cut -d'-' -f2-4 | sort | uniq)
do
    path=$(find $raw_data -name "*$sampleID*R1*")
    echo "$sampleID,$path,forward" >> manifest.csv
done

# Iterating for R2
for sampleID in $(ls ${raw_data}/*gz | cut -d'-' -f2-4 | sort | uniq)
do
    path=$(find $raw_data -name "*$sampleID*R2*")
    echo "$sampleID,$path,reverse" >> manifest.csv
done

But it shows this error:

find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `*/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_9_S42_R2_001.fastq.gz*R2*'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `*/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_9_S42_R2_001.fastq.gz*R2*''.
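A likely cause, offered as a reading of the output rather than a confirmed diagnosis: `cut -d'-'` prints any line that contains no `-` unchanged, so for names like Soil_9_S42_... the variable `$sampleID` ends up holding the whole path, slashes included, and `find -name` (which matches base names only) warns. The two paths below are shortened, hypothetical stand-ins for the real ones:

```shell
# No '-' in the line: cut passes the whole path through untouched.
printf '%s\n' '/data/soil_16s/Soil_9_S42_R1_001.fastq.gz' | cut -d'-' -f2-4
# prints /data/soil_16s/Soil_9_S42_R1_001.fastq.gz

# One '-' in the line: everything up to and including it is dropped.
printf '%s\n' '/data/soil_16s/Soil-6_S22_L001_R1_001.fastq.gz' | cut -d'-' -f2-4
# prints 6_S22_L001_R1_001.fastq.gz
```

In both cases the result still carries the read tag and extension, so a pattern such as `*$sampleID*R1*` has nothing left to match after the file name.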


The manifest.csv it produces looks like this:

6_S22_L001_R1_001.fastq.gz,,forward

6_S22_L001_R2_001.fastq.gz,,forward

7_S32_L001_R1_001.fastq.gz,,forward

7_S32_L001_R2_001.fastq.gz,,forward

/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_10_S52_R1_001.fastq.gz,,forward

/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_10_S52_R2_001.fastq.gz,,forward

/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_11_S62_R1_001.fastq.gz,,forward

/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_11_S62_R2_001.fastq.gz,,forward

But the output should look like this:

Soil_10_S52,/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_10_S52_R1_001.fastq.gz,forward
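One way to get rows of that shape, sketched under assumptions: the files end in `_R1_001.fastq.gz` / `_R2_001.fastq.gz`, the sample ID is the part of the file name before `_R1_`, and no file name contains a comma. `make_manifest` is a made-up name, and the header line is the one from the script in the question.

```shell
#!/usr/bin/env bash
# Sketch: print a manifest for every R1/R2 pair in a directory.
# Assumptions: names end in _R1_001.fastq.gz / _R2_001.fastq.gz and the
# sample ID is the part of the file name before "_R1_".
make_manifest() {
    local dir r1 r2 sample
    dir=${1%/}                                   # drop a trailing slash, if any
    echo "sample-id,absolute-filepath,direction"
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz
        sample=$(basename "$r1" _R1_001.fastq.gz)
        echo "$sample,$r1,forward"
        if [ -e "$r2" ]; then
            echo "$sample,$r2,reverse"
        else
            echo "warning: no R2 file for $sample" >&2
        fi
    done
}
```

It could be called as `make_manifest /mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s > manifest.csv`. The directory should be given as an absolute path so that the second column comes out absolute. Unlike the two-loop version, this writes the forward and reverse rows of each sample next to each other.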