Question

Illumina adapter identification

9.4 years ago

Jonathan Crowther ▴ 210

Hi guys,

I have a question about adapter identification.

I have the raw fastq.tar.gz files from an RNAseq experiment.

I am trying to replicate the pre-processing of a service provider as a learning experience. I have run FastQC and see that the quality is good and generally everything seems correct. The overrepresented sequences section does not list any sequences, so all looks well. Now I am trying to trim off the adapters, but alas I do not know which adapters were used, so I thought I would provide illumina_adapters.fa to cutadapt (version 1.4.1) using the following command line:

cutadapt -b file:illumina_adapters.fa -m 15 -O 10 -e 0.1 Sample_file.fastq -o trimmed_Sample_file.fastq

I am using the same parameters as the service provider. However, when I run it this way I trim approximately 3,000 reads, whereas the service provider trims only 500 reads.

Using the sequences

P5 - AATGATACGGCGACCACCGA
Reverse complement P7 - TCGTATGCCGTCTTCTGCTTG

I am able to pull out what I think are adapter dimers. If this is the case, am I correct in thinking I should be able to find the adapter sequence?

grep 'P5 Sequence' Sample_file.fastq

GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACAAAGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACAGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACTGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACTGT
CTAAAGCTTCACACTTGATC-AATGATACGGCGACCACCGA-ACCCACTTTGC

grep 'P7 Sequence' Sample_file.fastq

CAAATGTATTTTAATAAGGTGATG-TCGTATGCCGTCTTCTGCTTG-AAAAAA
CTAAAGCTTCACACTTGATCAGGGATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAC
GTCGATGAGAGCCCAGAAATGTGAGAAAA-TCGTATGCCGTCTTCTGCTTG-A

Which part would be the adapter in this case?

Thanks in advance

adapters illumina RNA-Seq • 6.8k views

updated 2.2 years ago by Ram 43k • written 9.4 years ago by Jonathan Crowther ▴ 210

Ram · Answer 1 · 2014-12-04

I am assuming that you used the Illumina platform.

In general, for RNA-seq, there is no need to remove adapter sequences. For small RNA, as the sequence is only around 22 nucleotides long, the adapter gets sequenced along with the small RNA itself.

But in the case of RNA-seq, it is much less likely that the adapter gets sequenced, as the transcripts will be longer than 100 base pairs. The adapter gets sequenced only when the input DNA fragment is shorter than the read length.
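
To make the read-length argument concrete, here is a minimal Python sketch. It assumes single-end reads that start at the first base of the insert, with the adapter following the insert directly. The function name `adapter_bases_in_read` and the example lengths are made up for illustration.

```python
def adapter_bases_in_read(insert_length: int, read_length: int) -> int:
    """Number of read cycles that run past the insert into the adapter.

    Assumption: the read covers `read_length` bases starting at the first
    base of the insert, and the adapter follows the insert directly.
    """
    if insert_length < 0 or read_length < 0:
        raise ValueError("lengths must be non-negative")
    return max(0, read_length - insert_length)


# Hypothetical example: a 22 nt small RNA sequenced with 50-cycle reads.
print(adapter_bases_in_read(22, 50))
# Hypothetical example: a 200 bp RNA-seq fragment with 100-cycle reads.
print(adapter_bases_in_read(200, 100))
```

With these numbers, the small-RNA read runs 28 cycles into the adapter, while the longer RNA-seq fragment yields no adapter bases at all.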

Anyway, it's good to remove adapters to be sure. The service provider will have a list of the adapters used for multiplexing your libraries, so you need to contact them for that list.

Edit: FastQC will report a list of overrepresented/adapter sequences.
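
If FastQC reports nothing, one rough cross-check is to count directly how many reads contain each candidate adapter sequence. Below is a minimal Python sketch, assuming an uncompressed FASTQ file with four lines per record. The function name `count_adapter_hits` and the `candidates` dictionary are illustrative; the two sequences are the ones quoted in the question.

```python
from collections import Counter
from typing import Dict, Iterable


def count_adapter_hits(fastq_lines: Iterable[str], candidates: Dict[str, str]) -> Counter:
    """Count reads that contain each candidate sequence as an exact substring.

    Assumption: four lines per FASTQ record, so the sequence is line 2 of 4.
    Exact matching only: adapters containing sequencing errors are missed.
    """
    hits = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # skip header, separator and quality lines
            continue
        seq = line.strip().upper()
        for name, adapter in candidates.items():
            if adapter in seq:
                hits[name] += 1
    return hits


candidates = {
    "P5": "AATGATACGGCGACCACCGA",
    "P7_revcomp": "TCGTATGCCGTCTTCTGCTTG",
}

# Tiny in-memory example; for a real file, pass open("Sample_file.fastq").
example = [
    "@read1", "CTAAAGCTTCACACTTGATCAATGATACGGCGACCACCGAACCCACTTTGC", "+", "I" * 51,
    "@read2", "ACGTACGTACGTACGTACGT", "+", "I" * 20,
]
print(count_adapter_hits(example, candidates))
```

Because this only finds exact matches, the counts will not line up exactly with cutadapt, which tolerates mismatches according to its `-e` setting. It is meant as a sanity check, not a replacement for trimming.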