Question: paired end reads cutadapt
anna (10) wrote, 5 months ago:

I have several paired-end read files. After running FastQC, I found that some pairs have one or more overrepresented sequences. Should I trim these sequences from both files (1_1 and 1_2), or only from the one in which they are overrepresented (1_2)? This is an example:

  1. File 1_1.fq: No Overrepresented sequences
  2. File 1_2.fq: two overrepresented sequences:
       AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTT
       TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

I'm considering the following command line:

cutadapt -b AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTT \
         -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
         -B AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTT \
         -B TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
         -o out_1.fastq \
         -p out_2.fq \
         1_1.fastq 1_2.fq

Please use the formatting bar (especially the code option) to present your post better. It is difficult to see what you are presenting.

Thank you!

written 5 months ago by genomax (87k)

I formatted OP's code this time. I think we should add splitting up long one-liners to the (upcoming) formatting manual.

written 5 months ago by RamRS (28k)

How many reads are affected (%)?

written 5 months ago by ATpoint (36k)
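
One rough way to estimate the percentage asked about above is to count the reads whose sequence line contains the overrepresented sequence. The sketch below is only an illustration: `pct_reads_with` is a hypothetical helper name, and it assumes an uncompressed FASTQ file with exactly four lines per record and an exact (no-mismatch) substring match.

```shell
# pct_reads_with SEQ FASTQ   (hypothetical helper, not part of any tool)
# Prints the percentage of reads whose sequence line contains SEQ.
# Assumes an uncompressed FASTQ with four lines per record.
pct_reads_with() {
    awk -v seq="$1" '
        NR % 4 == 2 {                      # second line of each record = bases
            total++
            if (index($0, seq) > 0) hits++ # exact substring match only
        }
        END {
            if (total == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * hits / total
        }
    ' "$2"
}
```

It could be called as `pct_reads_with <sequence> 1_2.fq`, with one of the overrepresented sequences reported by FastQC as the first argument. Because the match is exact, the result will tend to undercount reads that carry the sequence with sequencing errors.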
Kevin Blighe (63k) wrote, 5 months ago:

There have been a few threads on this topic already:

In conclusion, I would just remove the standard adapters that are known to Cutadapt from the sequences, filter / trim reads based on length and quality, and then proceed to alignment. My feeling is that the main thing affected by trimming and filtering reads is quality metrics such as percent alignment. Most 'junk' reads, including poly-A and poly-T, will not align anyway.

Kevin
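
A minimal sketch of what the approach described above might look like as a paired-end Cutadapt call, wrapped in a function so the thresholds are easy to change. Everything specific here is an assumption rather than part of the answer: `trim_pair` is a hypothetical helper name, the adapter is the common Illumina TruSeq prefix `AGATCGGAAGAGC` (replace it with whatever your library prep actually uses), and the quality cutoff and minimum length of 20 are arbitrary placeholders.

```shell
# Assumed values; adjust to the actual library prep and downstream aligner.
ADAPTER=AGATCGGAAGAGC   # common Illumina TruSeq adapter prefix (assumption)
MIN_QUAL=20             # 3' quality-trimming cutoff (assumption)
MIN_LEN=20              # discard pairs with a read shorter than this (assumption)

# trim_pair R1_IN R2_IN R1_OUT R2_OUT   (hypothetical helper name)
# -a / -A : 3' adapter for read 1 / read 2
# -q      : quality-trim read ends
# -m      : minimum read length after trimming
# -o / -p : output files for read 1 / read 2
trim_pair() {
    cutadapt \
        -a "$ADAPTER" -A "$ADAPTER" \
        -q "$MIN_QUAL" \
        -m "$MIN_LEN" \
        -o "$3" -p "$4" \
        "$1" "$2"
}

# Example call with the file names from the question:
# trim_pair 1_1.fastq 1_2.fq out_1.fastq out_2.fq
```

Passing both files in one call lets Cutadapt process the reads as pairs, so the two output files stay in sync even when a read is discarded.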


Thanks for your answer. I will consider removing the overrepresented sequences and comparing the "clean" data against the raw data, whose reads are already of very good quality.

written 5 months ago by anna (10)