Question: cutadapt loop and paired-end reads
sbrown669 wrote (13 months ago):

My aim is to write a bash loop over the fastq files in a directory laid out like this: https://ibb.co/cMvx7a, where the paired read files are organised in the form:

1, 1, 2, 2, 3, 3, etc.

I don't know how to handle these paired files within a cutadapt loop. Something along these lines is what I'm looking for:

for file in folder
do
    cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
done

but I want to process each pair from the folder together, and I don't know how.

Thanks in advance.

h.mon wrote (13 months ago):

This is less a bioinformatics question than a shell scripting one.

This draft should help you to get started:

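for i in *_R1.fastq.gz
do
  # Strip the R1 suffix to get the sample name shared by both mates
  SAMPLE=$(echo "${i}" | sed 's/_R1\.fastq\.gz$//')
  # Print the pair; replace echo with the cutadapt call once the names look right
  echo "${SAMPLE}_R1.fastq.gz" "${SAMPLE}_R2.fastq.gz"
done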
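
The same pairing can also be done without sed, using shell parameter expansion, and with a check that the R2 mate actually exists. This is only a sketch: `list_pairs` is a hypothetical helper name, and the `_R1.fastq.gz` / `_R2.fastq.gz` suffixes are assumed to match the draft above.

```shell
#!/usr/bin/env bash
# Print "R1<TAB>R2" for every *_R1.fastq.gz in a directory that has a mate.
# list_pairs is a hypothetical helper; adjust the suffixes to your file names.
list_pairs() {
  local dir=$1 r1 r2 sample
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue          # the glob matched nothing
    sample=${r1%_R1.fastq.gz}         # strip the suffix without calling sed
    r2=${sample}_R2.fastq.gz
    if [ -e "$r2" ]; then
      printf '%s\t%s\n' "$r1" "$r2"
    else
      printf 'warning: no R2 mate for %s\n' "$r1" >&2
    fi
  done
}
```

Calling `list_pairs .` first is a cheap way to confirm the pairs look right before any trimming is run on them.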
montoya.oscar wrote (4 months ago):

Based on h.mon's hint, the following code removes three pairs of primers (three forward and three reverse) and their reverse complements (the same primer sequences read backwards, with each base replaced by its complement):

for i in *_R1_001.fastq.gz
do
  SAMPLE=$(echo "${i}" | sed 's/_R1_001\.fastq\.gz$//')
  echo "${SAMPLE}_R1_001.fastq.gz" "${SAMPLE}_R2_001.fastq.gz"
  cutadapt -m 10 -O 17 -e 0 -q 20,20 \
    -g "forwardPrimer1xxx" -g "forwardPrimer2xxx" -g "forwardPrimer3xxx" \
    -a "forwardPrimer1InverseSequencexxx" -a "forwardPrimer2InverseSequencexxx" -a "forwardPrimer3InverseSequencexxx" \
    -G "reversePrimer1xxx" -G "reversePrimer2xxx" -G "reversePrimer3xxx" \
    -A "reversePrimer1InverseSequencexxx" -A "reversePrimer2InverseSequencexxx" -A "reversePrimer3InverseSequencexxx" \
    -o "/path/to/write/output/${SAMPLE}_R1_001.fastq.gz" -p "/path/to/write/output/${SAMPLE}_R2_001.fastq.gz" \
    "${SAMPLE}_R1_001.fastq.gz" "${SAMPLE}_R2_001.fastq.gz"
done

The "xxx" suffix is meant to keep the primers from matching inner parts of the reads (see http://cutadapt.readthedocs.io/en/stable/recipes.html#avoid-internal-adapter-matches).

Get the reverse complements of your primers here: http://reverse-complement.com/.
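
If you would rather not paste primers into a website, a reverse complement (the sequence read backwards, with each base swapped for its complement) can probably be computed in the shell as well. A minimal sketch, assuming `rev` and `tr` are installed and that the primers contain only A, C, G and T; `revcomp` is a hypothetical helper name, and IUPAC ambiguity codes are not handled:

```shell
#!/usr/bin/env bash
# Reverse-complement a plain DNA sequence (A, C, G, T only; case is preserved).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp AACG   # prints CGTT
```

Characters outside A, C, G and T pass through `tr` unchanged, so degenerate primers need a longer translation table.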

If you only need to remove one set of primers (one forward and one reverse), remove the extra -g, -G, -a, and -A from the script as required.
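
For that single-primer-pair case, the loop might be reduced to something like the sketch below. The variable names `FWD`, `REV`, `OUTDIR` and `CUTADAPT`, the `trimmed` output directory and the `trim_all` function are all assumptions made for illustration; only the `-g`, `-G`, `-o` and `-p` options come from the script above. Setting `CUTADAPT=echo` gives a dry run that prints the arguments instead of trimming.

```shell
#!/usr/bin/env bash
# Trim one forward and one reverse primer from every read pair in the
# current directory. All names below are placeholders for illustration.
OUTDIR=${OUTDIR:-trimmed}        # hypothetical output directory
CUTADAPT=${CUTADAPT:-cutadapt}   # set CUTADAPT=echo for a dry run

trim_all() {
  : "${FWD:?set FWD to the forward primer sequence}"
  : "${REV:?set REV to the reverse primer sequence}"
  mkdir -p "$OUTDIR"
  for r1 in *_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue                     # the glob matched nothing
    r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz    # name of the R2 mate
    "$CUTADAPT" -g "$FWD" -G "$REV" \
      -o "$OUTDIR/$r1" -p "$OUTDIR/$r2" "$r1" "$r2"
  done
}
```

Run it as, for example, `FWD=... REV=... trim_all` from the directory holding the reads, after checking the dry-run output.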

To see how the sed command works, go to http://www.grymoire.com/Unix/Sed.html.
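
As a quick illustration of what the substitution in the loop does, here it is applied to a single made-up file name. `sample_name` and `sampleA` are hypothetical; the pattern assumes the `_R1_001.fastq.gz` suffix used in the script above.

```shell
#!/usr/bin/env bash
# s/PATTERN/REPLACEMENT/ with an empty replacement deletes the match.
# \. matches a literal dot and $ anchors the match at the end of the name.
sample_name() {
  printf '%s\n' "$1" | sed 's/_R1_001\.fastq\.gz$//'
}

sample_name sampleA_R1_001.fastq.gz   # prints sampleA
```

A name that does not end in that suffix is printed unchanged, which is worth keeping in mind if your files use a different naming scheme.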
