I'm analysing data from a multiplexed RNA-seq experiment where the library was generated using the Illumina TruSeq Universal Adapter and the Illumina TruSeq Index Adapters (see page 12 of this letter from Illumina).
From the FastQC report I can see slight adapter contamination, and I'd like to remove all adapters from the FASTQ files before proceeding with the analysis. I'm using Scythe (https://github.com/vsbuffalo/scythe), which needs the adapter sequences in FASTA format. This is my FASTA file:
>TruSeq_Universal_Adapter
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq_Universal_Adapter_RC
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
>TruSeq_Index_Adapter
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>TruSeq_Index_Adapter_RC
CAAGCAGAAGACGGCATACGAGAT
My questions are:
- For the index adapter, I'm using the invariable part of the sequence at the 5' end of the 6-base barcode (TruSeq_Index_Adapter) and the reverse complement of the sequence at the 3' end of the 6-base barcode (TruSeq_Index_Adapter_RC). Is this correct?
- Do I need to include the adapters' reverse complements at all, or is this already taken care of when the forward sequence is specified?
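For reference, this is roughly how the `_RC` entries can be checked against their forward sequences. It is only a minimal sketch: the helper `reverse_complement` and the variable names are my own and are not part of Scythe, and it handles only the unambiguous bases A, C, G and T. It says nothing about whether Scythe itself searches both orientations.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
# Assumption: the adapter sequences contain no IUPAC ambiguity codes such as N.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Sequences copied from the FASTA file in the question.
universal = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
universal_rc = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"
index_rc = "CAAGCAGAAGACGGCATACGAGAT"

# Does the _RC record really match the reverse complement of the forward record?
print(reverse_complement(universal) == universal_rc)

# Recover the forward-strand sequence that TruSeq_Index_Adapter_RC was derived from.
print(reverse_complement(index_rc))
```

Applying the function twice returns the original sequence, which is a quick sanity check that the helper itself behaves as intended.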