23 months ago
VenGeno
Hi,
I am working with an Arabidopsis dataset generated with the Illumina TruSeq library prep kit, and the reads are single-end. When I run FastQC, it reports the following:
Then I ran Trim Galore with the default settings and got the following:
This sample was multiplexed with TAACCG (indices) and ATTGGC (indices primer). I have two questions:
- The report shows an overrepresentation of Rubisco small subunit sequences. What is the best way to handle this?
- Is it OK to custom-trim the sequence ATAAAGTTTTGAGGTTTACACAAAAGCAAAGGGAAATTAACCGGTGAAGC?
Thanks in advance.
Strangely, none of the sequence sets I have come across before had such Rubisco overrepresentation, which is why I was worried. Regarding the indices primer, should I trim the whole sequence, including the "GTGAAGC" that appears downstream? Usually we get adapter-trimmed, clean sequences from the sequencing provider, except for this time.
Rubisco has only ~24K reads, so compared to the full dataset that is probably a small fraction of the data?
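If you want to check that fraction yourself, here is a minimal sketch. It assumes an uncompressed FASTQ with strict 4-line records (no wrapped sequence lines), and the function names, toy reads, and motif are made up for illustration, not taken from your data.

```python
import io

def fastq_sequences(handle):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def fraction_containing(handle, motif):
    """Return (hits, total, fraction) for reads that contain `motif`."""
    total = 0
    hits = 0
    for seq in fastq_sequences(handle):
        total += 1
        if motif in seq:
            hits += 1
    fraction = hits / total if total else 0.0
    return hits, total, fraction

# Toy in-memory FASTQ (hypothetical reads, only for demonstration).
demo = io.StringIO(
    "@r1\nACGTACGTAA\n+\nIIIIIIIIII\n"
    "@r2\nTTTTGAGGTT\n+\nIIIIIIIIII\n"
    "@r3\nGGGGCCCCAA\n+\nIIIIIIIIII\n"
    "@r4\nAATTTTGAGG\n+\nIIIIIIIIII\n"
)
hits, total, fraction = fraction_containing(demo, "TTTTGAGG")
print(f"{hits} of {total} reads ({fraction:.1%}) contain the motif")
```

For a real file you would pass an opened file handle instead of the toy `io.StringIO` object. An exact substring match will miss reads with mismatches, so treat the number as a rough lower bound.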
Since this sample was prepared with a standard Illumina kit, the index sequences are not going to be present in the main reads. Index reads are sequenced separately and are only used during demultiplexing. At that time, the index sequences are transferred to the FASTQ header of the demultiplexed data.
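To see the index recorded in your own headers, something like the sketch below may help. It assumes the headers follow the CASAVA 1.8+ layout, where the part after the space is `read:filtered:control:index`. Headers rewritten by a provider or downloaded from an archive may look different. The example header is invented, and `index_from_header` is just an illustrative helper name.

```python
def index_from_header(header):
    """Return the index field of a CASAVA 1.8+ style header, or None if absent."""
    parts = header.strip().split(" ")
    if len(parts) < 2:
        return None  # no comment part after the read name
    fields = parts[1].split(":")
    if len(fields) < 4:
        return None  # comment does not have read:filtered:control:index
    return fields[3] or None

# Hypothetical header, using the index reported in the question.
example = "@SIM:1:FCX:1:15:6329:1045 1:N:0:TAACCG"
print(index_from_header(example))
```

If this returns the expected index for your reads, demultiplexing has already consumed the index read and there is nothing to trim from the main read for it.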
Thank you, GenoMax. As you suggested, I will proceed without trimming the polyT, Rubisco, and index primer sequences detected above.