Hi,
I have multiple paired-end read files from amplicon sequencing, and I am trying to write a bash
script to generate a manifest file.
The manifest file should contain information like this:
sample-id,filename,direction
a,a_1_R1_001.fastq.gz,forward
b,b_2_R1_001.fastq.gz,forward
c,c_3_R1_001.fastq.gz,forward
Location of the files: /mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s
The file names look like this:
Soil-33_S73_L001_R1_001.fastq Soil_9_S42_R1_001.fastq
Soil-33_S73_L001_R2_001.fastq Soil_9_S42_R2_001.fastq
script:
    echo "sample-id,absolute-filepath,direction" > manifest.csv
    raw_data='/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s/'
    # Since the format asks to separate 'forward' and 'reverse', iterate over R1 first, then run the same loop for R2
    for sampleID in $(ls ${raw_data}/*gz | cut -d'-' -f2-4 | sort | uniq)
    do
        path=$(find $raw_data -name "*$sampleID*R1*")
        echo "$sampleID,$path,forward" >> manifest.csv
    done
    # Iterating for R2
    for sampleID in $(ls ${raw_data}/*gz | cut -d'-' -f2-4 | sort | uniq)
    do
        path=$(find $raw_data -name "*$sampleID*R2*")
        echo "$sampleID,$path,reverse" >> manifest.csv
    done
but it gives this warning:
find: warning: Unix filenames usually don't contain slashes (though
pathnames do). That means that '-name
*/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_9_S42_R2_001.fastq.gz*R2*'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ
*/mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s//Soil_9_S42_R2_001.fastq.gz*R2*''.
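The warning seems to come from `cut -d'-'` being applied to the full path printed by `ls`. A path with no `-` in it passes through `cut` unchanged, so `$sampleID` still contains slashes when it reaches `find -name`. One possible way around this, as a rough sketch only, is to strip the directory first and derive the sample ID from the bare file name. The function name `make_manifest` is hypothetical, and the rule "sample ID = everything before `_R1_` or `_R2_`" is an assumption that may not match the IDs actually wanted:

```shell
#!/usr/bin/env bash
# Sketch only. make_manifest is a hypothetical helper name, and taking the
# sample ID as the file name up to _R1_/_R2_ is an assumption.

make_manifest() {
    local raw_data="${1%/}"   # drop one trailing slash to avoid '//' in paths
    local out="$2"
    local pair read_tag direction path file sample_id

    echo "sample-id,absolute-filepath,direction" > "$out"

    # All forward reads (R1) are written first, then all reverse reads (R2).
    for pair in R1:forward R2:reverse; do
        read_tag=${pair%%:*}      # R1 or R2
        direction=${pair##*:}     # forward or reverse
        for path in "$raw_data"/*_"${read_tag}"_*.fastq.gz; do
            [ -e "$path" ] || continue            # the glob matched nothing
            file=${path##*/}                      # file name without directory
            sample_id=${file%%_"${read_tag}"_*}   # text before _R1_ / _R2_
            printf '%s,%s,%s\n' "$sample_id" "$path" "$direction" >> "$out"
        done
    done
}
```

It could then be called as `make_manifest /mnt/scratch/users/3052771/Amplicon_data_july_2018/16S_Analysis/soil_16s manifest.csv`. Looping over the shell glob instead of parsing `ls` output means no `find` call is needed, and the paths stay absolute as long as the directory argument is absolute.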
The manifest.csv that my script produces looks like this:
but the output should look like this: