Question: Trimming overrepresented sequences flagged by FastQC in paired-end RNA-Seq data
Shaurya Jauhari (India) wrote, 4 months ago:

I am analyzing paired-end RNA-Seq data. I have previously used cutadapt to trim overrepresented sequences identified in a FastQC report. This time, however, there is a slight twist.

One of the read files, I4_R1.fastq, shows the following:

>>Overrepresented sequences    warn
#Sequence    Count    Percentage    Possible Source
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG    36771 0.14170337546219017    TruSeq Adapter, Index 6 (100% over 49bp)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC    36534 0.14079005518304247    TruSeq Adapter, Index 6 (100% over 50bp)
>>END_MODULE
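To pull these rows out programmatically, here is a minimal Python sketch that parses the module as it appears in FastQC's fastqc_data.txt. It assumes the fields are tab-separated, as in that file; the class and function names are hypothetical. Note that the second sequence is simply the first one shifted by one base, so both rows describe the same adapter read-through.

```python
from __future__ import annotations

from dataclasses import dataclass

# Module text from the I4_R1.fastq report (assumed tab-separated, as in fastqc_data.txt).
R1_MODULE = """>>Overrepresented sequences\twarn
#Sequence\tCount\tPercentage\tPossible Source
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG\t36771\t0.14170337546219017\tTruSeq Adapter, Index 6 (100% over 49bp)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC\t36534\t0.14079005518304247\tTruSeq Adapter, Index 6 (100% over 50bp)
>>END_MODULE"""


@dataclass
class Overrep:
    sequence: str
    count: int
    percentage: float  # percent of total reads, not a fraction
    source: str


def parse_overrepresented(module_text: str) -> tuple[str, list[Overrep]]:
    """Return the module status (pass/warn/fail) and one record per data row."""
    lines = module_text.strip().splitlines()
    header = lines[0].split("\t")
    status = header[1] if len(header) > 1 else ""
    records = []
    for line in lines[1:]:
        if line.startswith("#") or line.startswith(">>"):
            continue  # skip the column header and >>END_MODULE
        seq, count, pct, source = line.split("\t")
        records.append(Overrep(seq, int(count), float(pct), source))
    return status, records


status, rows = parse_overrepresented(R1_MODULE)
for r in rows:
    print(f"{r.count:>7}  {r.percentage:.3f}%  {r.sequence[:20]}...  {r.source}")
```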

while the other, I4_R2.fastq, shows the following:

>>Overrepresented sequences    warn
#Sequence    Count    Percentage    Possible Source
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC    37938 0.14620061076077806    Illumina Single End PCR Primer 1 (100% over 50bp)
GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG    35122 0.1353486702287956    Illumina Single End PCR Primer 1 (100% over 50bp)
>>END_MODULE
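One way to see how the two reports relate is to compare the top R1 and R2 sequences base by base. The plain-Python sketch below (helper name hypothetical) shows that both start with the same 13 bases, AGATCGGAAGAGC, and diverge right after, which suggests each mate is reading into a different adapter sequence rather than into the same one.

```python
# Top overrepresented sequence from each FastQC report above.
R1_TOP = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG"
R2_TOP = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC"


def shared_prefix(a: str, b: str) -> str:
    """Return the longest common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


common = shared_prefix(R1_TOP, R2_TOP)
print(f"shared prefix: {common} ({len(common)} bp)")  # → shared prefix: AGATCGGAAGAGC (13 bp)
print(f"R1 continues: {R1_TOP[len(common):len(common) + 20]}")
print(f"R2 continues: {R2_TOP[len(common):len(common) + 20]}")
```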

It's hard to figure out what to pass to cutadapt's "-a" and "-A" options, because the sequences flagged for the forward (R1) and reverse (R2) reads are different. On top of that, can the two reads of the same pair really carry different adapters? Is this pointing to a fundamental flaw in the library preparation?
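In cutadapt's paired-end mode, -a gives the 3' adapter to remove from R1 and -A the 3' adapter to remove from R2, so the two options can take different sequences. Taking the adapter prefixes from the reports above, a run would look something like `cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o out.1.fastq -p out.2.fastq I4_R1.fastq I4_R2.fastq` (output names are placeholders). The Python sketch below only illustrates mate-specific 3' trimming with exact matching; it is a simplification, not cutadapt's algorithm, which also tolerates mismatches and indels.

```python
from typing import Tuple

# Adapter prefixes read off the FastQC reports (an assumption; check against your kit).
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # what would go to -a
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # what would go to -A


def trim_3prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the adapter and everything after it.

    Matches either a full adapter inside the read or an adapter prefix that
    runs off the 3' end, requiring at least min_overlap bases. Exact matching only.
    """
    for start in range(len(read)):
        remaining = len(read) - start
        if remaining < min_overlap:
            break
        overlap = min(remaining, len(adapter))
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start]
    return read


def trim_pair(r1: str, r2: str) -> Tuple[str, str]:
    """Trim each mate with its own adapter, analogous to -a for R1 and -A for R2."""
    return trim_3prime(r1, ADAPTER_R1), trim_3prime(r2, ADAPTER_R2)


# Hypothetical reads: short inserts followed by adapter read-through.
r1 = "ACGTTTGCAAGG" + ADAPTER_R1[:20]
r2 = "TTGCCAGTAC" + ADAPTER_R2 + "GGGG"
print(trim_pair(r1, r2))  # → ('ACGTTTGCAAGG', 'TTGCCAGTAC')
```

Using the wrong adapter for a mate leaves that read untouched here, which is why the R1 and R2 sequences belong to different options.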

Thanks in advance.

Tags: fastqc, rna-seq, paired-end

Use bbduk.sh from the BBMap suite. The BBMap software bundle ships a list of commonly used adapter/primer sequences in resources/adapters.fa. You can pass this file to bbduk.sh to scan for all of these contaminants at the same time.
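As an illustration of k-mer based adapter scanning, the general approach bbduk.sh takes with ktrim=r, here is a simplified Python sketch. The two-entry FASTA is a hypothetical stand-in for resources/adapters.fa (its headers are made up), and the sketch ignores bbduk features such as mink, hdist, tpe and tbo. An actual run would look something like `bbduk.sh in1=I4_R1.fastq in2=I4_R2.fastq out1=clean_R1.fastq out2=clean_R2.fastq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo`, with the output names as placeholders.

```python
# Hypothetical stand-in for resources/adapters.fa (headers are illustrative only).
ADAPTERS_FA = """>TruSeq_R1_adapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
>TruSeq_R2_adapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
"""
K = 23  # k-mer length, matching the k=23 commonly used with bbduk


def read_fasta(text):
    """Parse FASTA text into {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def build_kmer_index(refs, k):
    """Map every k-mer of every reference to its name (first reference wins on ties)."""
    index = {}
    for name, seq in refs.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], name)
    return index


def ktrim_right(read, index, k):
    """Cut the read at its leftmost k-mer hit; return (trimmed read, hit name or None)."""
    for i in range(len(read) - k + 1):
        hit = index.get(read[i:i + k])
        if hit is not None:
            return read[:i], hit
    return read, None


refs = read_fasta(ADAPTERS_FA)
index = build_kmer_index(refs, K)
read = "ACGTTTGCAAGGTTCA" + refs["TruSeq_R2_adapter"] + "GG"  # hypothetical read
print(ktrim_right(read, index, K))  # → ('ACGTTTGCAAGGTTCA', 'TruSeq_R2_adapter')
```

Because every adapter goes into one index, a single pass catches whichever adapter a read contains, which is the convenience of scanning the whole adapters.fa at once.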

Reply by genomax, 4 months ago.

Thank you for your reply. Conversely, is it also normal to see no overrepresented sequences at all?

Reply by Shaurya Jauhari, 4 months ago.

Depends on the dataset/experiment.
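One concrete factor: by default FastQC only lists a sequence in this module if it makes up more than 0.1% of the total, raises a warning when any sequence exceeds 0.1% and a failure above 1% (the limits can be changed in FastQC's limits.txt). An empty table therefore just means no single sequence crossed 0.1%. The small sketch below encodes those default thresholds; the function and constant names are hypothetical.

```python
# FastQC's documented default thresholds for this module (configurable via limits.txt).
WARN_PCT = 0.1
FAIL_PCT = 1.0


def overrep_status(percentages):
    """Return 'pass', 'warn' or 'fail' from the listed percentages (given in percent)."""
    top = max(percentages, default=0.0)
    if top > FAIL_PCT:
        return "fail"
    if top > WARN_PCT:
        return "warn"
    return "pass"


print(overrep_status([0.14170337546219017, 0.14079005518304247]))  # I4_R1 → warn
print(overrep_status([0.14620061076077806, 0.1353486702287956]))   # I4_R2 → warn
print(overrep_status([]))  # empty table → pass
```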

Reply by genomax, 4 months ago.