Illumina FASTQ FILTERING
9 weeks ago

Hello peers,

I need some help regarding QC of NGS Data.

I have raw NGS data (more than 100 samples) in FASTQ format, and I have trimmed the adapter sequences using Trimmomatic v0.39. The adapter sequences used for trimming were:

>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT

>PrefixPE/2 
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

When I ran FastQC afterwards, the adapter sequences had been removed. But when I checked for the index sequences, they were still present in the reads.

For example, here is the description line of one read from the forward read file (Read 1) of one sample:

@F00740:29:HCHTCDRXX:1:1101:10574:1016 1:N:0:TCCGGAGA+GGCTCTGA

As shown in the FASTQ description line, the dual index TCCGGAGA+GGCTCTGA (8 bp each) was used. These index sequences are still found in the trimmed reads. I checked this using:
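To make the layout of that description line explicit, here is a minimal Python sketch that splits it into its colon-separated fields. The function name `parse_illumina_header` and the dictionary keys are illustrative choices, not part of any tool, and the sketch assumes the header layout of the example line above (seven fields, a space, then four fields ending in the index pair).

```python
def parse_illumina_header(line):
    """Split a FASTQ description line laid out like the example above."""
    text = line.strip()
    if not text.startswith("@"):
        raise ValueError("not a FASTQ description line")
    # The part before the space identifies the read; the part after it
    # carries read number, filter flag, control bits and the index pair.
    read_id, _, comment = text[1:].partition(" ")
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read_number, filtered, control, index = comment.split(":")
    # Dual indexes are joined by "+"; index2 is empty for single-index runs.
    index1, _, index2 = index.partition("+")
    return {
        "instrument": instrument,
        "flowcell": flowcell,
        "lane": lane,
        "read_number": read_number,
        "index1": index1,
        "index2": index2,
    }


header = "@F00740:29:HCHTCDRXX:1:1101:10574:1016 1:N:0:TCCGGAGA+GGCTCTGA"
parsed = parse_illumina_header(header)
print(parsed["index1"], parsed["index2"])
```

The index pair here comes from the description line only; the sketch does not look at the read sequence at all.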

grep "^[^@]" 2B_TruSeqCD_BHCHTCDRXX_R1.fastq| grep GGCTCTGA

Output:

Searching GGCTCTGA Index

AND

grep "^[^@]" 2B_TruSeqCD_BHCHTCDRXX_R1.fastq| grep TCCGGAGA

Output:

Getting TCCGGAGA index
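One caveat about the grep checks above: the pattern "^[^@]" only drops lines that start with "@", so the "+" separator lines and the quality lines are searched too. A stricter check, sketched here in Python under the assumption of plain four-line FASTQ records, looks only at the sequence line of each record and also records where in the read the match starts. The function name `index_positions` and the toy records are hypothetical.

```python
import io
from collections import Counter


def index_positions(fastq_handle, index_seq):
    """Count, per start position, reads whose sequence contains index_seq."""
    positions = Counter()
    for line_number, line in enumerate(fastq_handle):
        # In a four-line FASTQ record the sequence is the second line.
        if line_number % 4 != 1:
            continue
        start = line.find(index_seq)
        if start != -1:
            positions[start] += 1
    return positions


# Two made-up records, used only to exercise the function.
toy_fastq = io.StringIO(
    "@r1\nAAGGCTCTGATT\n+\nFFFFFFFFFFFF\n"
    "@r2\nACGTACGTACGT\n+\nFFFFFFFFFFFF\n"
)
hits = index_positions(toy_fastq, "GGCTCTGA")
print(dict(hits))
```

On a real file, pass an open file handle instead of the `io.StringIO` object. Matches scattered over many start positions would look like ordinary sequence rather than a tag attached at a fixed place in the read.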

Since these index sequences are not of biological origin, how do I get rid of them? Or can I still go ahead with alignment? Should I trim these sequences off?

Your expert advice (with or without supporting articles/documentation) will be appreciated.

Thank you!

NGS Genomics • 335 views
9 weeks ago
ATpoint 69k

The index sequences are stored in a separate FASTQ file, which the sequencing provider often does not even distribute because you don't need it. What you grep is just part of a normal read: any given combination of 8 nucleotides can occur by chance in the genome. You're fine, proceed with alignment.
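A rough back-of-the-envelope version of the "by chance" argument, as a Python sketch. It assumes bases are independent and equally likely, which real genomes are not, and the read length of 150 is a hypothetical value since the thread does not state one. The function name `expected_hits_per_read` is made up for illustration.

```python
def expected_hits_per_read(read_length, kmer_length=8):
    """Expected occurrences of one fixed k-mer in a random read.

    Assumes independent, uniformly distributed bases (a simplification).
    """
    # Number of start positions at which a k-mer fits inside the read.
    windows = max(read_length - kmer_length + 1, 0)
    # Each window matches a fixed k-mer with probability 1 / 4**k.
    return windows / 4 ** kmer_length


# Hypothetical read length; substitute the real one for your run.
per_read = expected_hits_per_read(150)
print(f"expected hits per read: {per_read:.5f}")
print(f"expected hits per million reads: {per_read * 1_000_000:.0f}")
```

With 8 bp there are 4^8 = 65,536 possible sequences, so under these assumptions a 150 bp read has 143 windows and roughly 0.2% of reads would contain any particular 8-mer somewhere. A grep over millions of reads is therefore expected to return hits even when no index was ever sequenced as part of the read.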

9 weeks ago
GenoMax 125k

Index reads are sequenced independently of the main reads in Illumina sequencing. Read order is Read 1 --> Index 1 (if present) --> Index 2 (if present) --> Read 2. The index reads are used during demultiplexing to bin the main reads (R1 and R2) into sample-specific files. In this process the index sequences are transferred to the headers of the binned reads (they can also be recovered as separate files, since they may be needed in that form for rare protocols).

You do not need to do anything about, or worry about, the index sequences other than using them for sample identification.
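As an illustration of using the header indexes for sample identification, here is a small Python sketch that tallies the index field across all description lines of a demultiplexed file. It assumes four-line FASTQ records whose description line ends in ":<index>" as in the example from the question; the function name `tally_header_indexes` and the toy records are hypothetical.

```python
import io
from collections import Counter


def tally_header_indexes(fastq_handle):
    """Count how often each index string appears in the description lines."""
    counts = Counter()
    for line_number, line in enumerate(fastq_handle):
        # The description line is the first line of each four-line record.
        if line_number % 4 == 0:
            # The index pair is whatever follows the last colon.
            counts[line.rstrip("\n").rsplit(":", 1)[-1]] += 1
    return counts


# Made-up records that reuse the header layout from the question.
toy_fastq = io.StringIO(
    "@F00740:29:HCHTCDRXX:1:1101:10574:1016 1:N:0:TCCGGAGA+GGCTCTGA\n"
    "ACGT\n+\nFFFF\n"
    "@F00740:29:HCHTCDRXX:1:1101:10575:1016 1:N:0:TCCGGAGA+GGCTCTGA\n"
    "TTGA\n+\nFFFF\n"
)
index_counts = tally_header_indexes(toy_fastq)
print(index_counts.most_common(3))
```

In a correctly demultiplexed per-sample file, one index pair (possibly with a few near-variants, depending on the mismatch setting used during demultiplexing) would be expected to dominate the tally.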
