I have a question about adapter identification.
I have the raw fastq.tar.gz files from an RNAseq experiment.
I am trying to replicate a service provider's pre-processing as a learning experience. I have run FastQC and the quality is good and everything generally looks correct. The over-represented sequences section does not list any sequences, so all looks well. Now I am trying to trim off the adapters, but I do not know which adapters were used, so I thought I would provide illumina_adapters.fa to cutadapt (version 1.4.1) using the following command line:
cutadapt -b file:illumina_adapters.fa -m 15 -O 10 -e 0.1 Sample_file.fastq -o trimmed_Sample_file.fastq
I am using the same parameters as the service provider. However, when I run it this way I trim approximately 3,000 reads, whereas the service provider trims only 500 reads.
P5 - AATGATACGGCGACCACCGA
Reverse complement P7 - TCGTATGCCGTCTTCTGCTTG
Using these sequences I am able to pull out what I think are adapter dimers. If this is the case, am I correct in thinking I should be able to find the adapter sequence?
grep 'P5 Sequence' Sample_file.fastq
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACAAAGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACAGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACTGT
GCTTCTGTAATTGAAAACCTAGAT-AATGATACGGCGACCACCGA-ACACTGT
CTAAAGCTTCACACTTGATC-AATGATACGGCGACCACCGA-ACCCACTTTGC

grep 'P7 Sequence' Sample_file.fastq
CAAATGTATTTTAATAAGGTGATG-TCGTATGCCGTCTTCTGCTTG-AAAAAA
CTAAAGCTTCACACTTGATCAGGGATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAA
CTAAAGCTTCACACTTGATCAGGTATC-TCGTATGCCGTCTTCTGCTTG-AAC
GTCGATGAGAGCCCAGAAATGTGAGAAAA-TCGTATGCCGTCTTCTGCTTG-A
Which part would be the adapter in this case?
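For anyone who wants to reproduce the dash-marked reads above, here is a minimal Python sketch of one way to split each read at the first exact match of a given sequence. It is only an illustration, not what cutadapt does internally: it ignores mismatches and partial overlaps (so the -e and -O settings have no equivalent here), and it assumes an uncompressed four-line-per-record FASTQ. The function names (fastq_sequences, split_at, mark) are made up for this example.

```python
from typing import Iterator, Optional, Tuple

# Sequences quoted in the post.
P5 = "AATGATACGGCGACCACCGA"
P7_RC = "TCGTATGCCGTCTTCTGCTTG"


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of an uncompressed FASTQ."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def split_at(read: str, query: str) -> Optional[Tuple[str, str, str]]:
    """Split a read into (before, match, after) at the first exact match.

    Returns None when the query does not occur in the read.
    """
    start = read.find(query)
    if start == -1:
        return None
    end = start + len(query)
    return read[:start], read[start:end], read[end:]


def mark(read: str, query: str) -> Optional[str]:
    """Return the read with dashes around the first exact match, or None."""
    parts = split_at(read, query)
    return None if parts is None else "-".join(parts)


# Hypothetical usage on the file from the post:
# for seq in fastq_sequences("Sample_file.fastq"):
#     marked = mark(seq, P5)
#     if marked is not None:
#         print(marked)
```

Looking at the length of the "before" and "after" parts across many reads shows whether the matched sequence sits at a consistent position in the read.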
Thanks in advance