Question

cutadapt loop and paired-end reads

1

Entering edit mode

6.9 years ago

sbrown669 ▴ 20

My aim is to create a bash loop over my fastq files that are in a directory like so: https://ibb.co/cMvx7a whereby, paired read files are organised in the form:

1

2

3

etc..

I don't know a way of dealing with these paired files within the cutadapt loop Something along these lines is what I'm looking for:

for file in folder

do
    cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
done

but I want to do pairs together from the folder and I don't know how.

Thanks in advance.

cutadapt RNA-Seq bash paired-end loop • 11k views

ADD COMMENT • link updated 6.2 years ago by montoya.oscar ▴ 50 • written 6.9 years ago by sbrown669 ▴ 20

1

Entering edit mode

6.2 years ago

montoya.oscar ▴ 50

Based on h.mod's hint, the following code removes three pairs of primers (three forward and thre reverse) and their reverse complements (simply the same primers sequences but backwards):

for i in *_R1_001.fastq.gz
do
  SAMPLE=$(echo ${i} | sed "s/_R1_\001\.fastq\.gz//") 
  echo ${SAMPLE}_R1_001.fastq.gz ${SAMPLE}_R2_001.fastq.gz
cutadapt -m 10 -O 17 -e 0 -q 20,20 -g "forwardPrimer1xxx" -g "forwardPrimer2xxx" -g "forwardPrimer3xxx" -a "forwardPrimer1InverseSequencexxx" -a "forwardPrimer2InverseSequencexxx" -a "forwardPrimer3InverseSequencexxx" -G "reversePrimer1xxx" -G "reversePrimer2xxx" -G "reversePrimer3xxx" -A "reversePrimer1InverseSequencexxx" -A "reversePrimer2InverseSequencexxx" -A "reversePrimer3InverseSequencexxx" -o /path/to/write/output/${SAMPLE}_R1_001.fastq.gz -p /path/to/write/output/${SAMPLE}_R2_001.fastq.gz ${SAMPLE}_R1_001.fastq.gz ${SAMPLE}_R2_001.fastq.gz
done

The "xxx" avoids matching inner parts of the reads (http://cutadapt.readthedocs.io/en/stable/recipes.html#avoid-internal-adapter-matches).

Get the reverse complements of your orimers here http://reverse-complement.com/.

If you only need to remove one set of primers (one forward and one reverse), remove the extra -g, -G, -a, and -A from the script as required.

To see how the sed function works, go to http://www.grymoire.com/Unix/Sed.html.

ADD COMMENT • link 6.2 years ago by montoya.oscar ▴ 50

h.mon · Accepted Answer · 2017-05-22

4

Entering edit mode

6.9 years ago

h.mon 35k

This is not a bioinformatics question, it is more a shell scripting question.

This draft should help you to get started:

for i in *_R1.fastq.gz
do
  SAMPLE=$(echo ${i} | sed "s/_R1\.fastq\.gz//")
  echo ${SAMPLE}_R1.fastq.gz ${SAMPLE}_R2.fastq.gz
done

ADD COMMENT • link 6.9 years ago by h.mon 35k

0

Entering edit mode

Hi all. I am trying to generate a loop to use Cutadapt on multiple paired-end read files from Mi-Seq.

I am using the below script that has been sourced and modified from previous posts. I am using the following script below. I keep running into the error: IOError: [Errno 2] No such file or directory:

It is calling the file names, but once it initiates it comes back with IOError:[Errno 2] No such file or directory:

Any idea why this is happening??

for i in /path/to/file/*_R1_001.fastq;
do
  SAMPLE=$(echo ${i} | sed "s/_R1_\001\.fastq//") 
  echo ${SAMPLE}_R1_001.fastq ${SAMPLE}_R2_001.fastq
  cutadapt -a FWDPrimer...RCREVPrimer -A REVPrimer...RCFWPrimer \
    --discard-untrimmed -o Path/to/file/${SAMPLE}_R1_001.fastq \
    -p Path/to/file/${SAMPLE}_R2_001.fastq 
  ${SAMPLE}_R1_001.fastq 
  ${SAMPLE}_R2_001.fastq 

done

ADD REPLY • link updated 5.8 years ago by h.mon 35k • written 5.8 years ago by brijohn • 0

1

Entering edit mode

@ brijohn In addition, to h.mon points above, you also need to both fastqs in cutadapt command line. Currently, they are supplied as independent arguments to shell. Loop will fail with permission denied error. Try this and let me know:

for i in /path/to/file/*_R1_001.fastq;
do
  SAMPLE=$(echo ${i} | sed "s/_R1_\001\.fastq//") 
  echo ${SAMPLE}_R1_001.fastq ${SAMPLE}_R2_001.fastq
  cutadapt -a FWDPrimer...RCREVPrimer -A REVPrimer...RCFWPrimer \
    --discard-untrimmed -o  ${SAMPLE}_R1_001.fastq \
    -p ${SAMPLE}_R2_001.fastq  ${SAMPLE}_R1_001.fastq  ${SAMPLE}_R2_001.fastq 

done

ADD REPLY • link 5.8 years ago by cpad0112 21k

0

Entering edit mode

Path/to/file/${SAMPLE}_R1_001.fastq translates into Path/to/file//path/to/file/someFile_R1_001.fastq

The echo is there just for debugging purposes, it has no functional value.

ADD REPLY • link 5.8 years ago by h.mon 35k