Question: cutadapt and paired-end reads when you don't know your adapter sequences...
0
2.6 years ago by
beegrackle90
United States
beegrackle90 wrote:

Hi all, I have some amplicon sequencing data from a company that has been endlessly frustrating to work with. It is a long story, but they basically dumped my sequencing data on a server without any information about the adapter sequences, etc., and then skedaddled.

So I went and looked at the sequences and found what I thought were the adapter sequences by eye, then checked that those adapter sequences were indeed in (almost) every read, right after the primer reverse complement at the 3' end of the purported amplified sequence. I did this separately for R1 and R2 because, after the first parts of the adapter sequences (which are identical, and also match Illumina adapter sequences I found online), the adapter sequences appear to diverge.
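The check described above (is the candidate adapter present in almost every read?) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the adapter string would be whatever candidate was found by eye, and the match is exact, so reads with sequencing errors inside the adapter will be missed.

```python
def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes plain 4-line records (no wrapped sequences).
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(sequences, adapter):
    """Return the fraction of reads containing `adapter` as an exact substring."""
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


# Hypothetical usage (file name and adapter are placeholders):
# with open("sample_R1.fastq") as handle:
#     print(adapter_fraction(read_fastq_sequences(handle), "CANDIDATEADAPTER"))
```

Running it once per read file (R1 and R2, each with its own candidate) gives a quick sanity check before any trimming is attempted.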

Because many of my actual amplified sequences were so short, the reads contain the adapter sequences followed by poly-A tails and a ton of noise at the 3' end, so I didn't want to merge my paired-end reads until I'd trimmed them.

I used cutadapt to trim my sequences, and no matter what I do, I get uneven results for R1 and R2. I have tried using only the identical beginning part of the adapter for both, using the full 'adapters' (different for R1 and R2; I put this in quotation marks because it's just me eyeballing what I think they are), using 50% or 75% of the full 'adapters', and even using the reverse complements of the primers. In every case I get a different distribution of sequence lengths for R1 and R2 after trimming. I would expect some small differences, but in all of my fastq files the R2 sequences are longer after trimming.
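One way to put numbers on the R1/R2 mismatch described above is to tabulate, per read pair, how much longer R2 is than R1 after trimming. The sketch below assumes the two inputs are in the same order (mate i of R1 pairs with mate i of R2); the function name is made up for this example.

```python
from collections import Counter


def pair_length_differences(r1_sequences, r2_sequences):
    """Count len(R2) - len(R1) for each read pair.

    Both inputs are iterables of sequence strings in matching order.
    A positive key means R2 was left longer than R1 for that pair.
    """
    return Counter(len(r2) - len(r1) for r1, r2 in zip(r1_sequences, r2_sequences))


# Hypothetical usage with two already-trimmed reads per file:
# diffs = pair_length_differences(["ACGT", "ACGTAC"], ["ACGTAC", "ACGTAC"])
# for delta, count in sorted(diffs.items()):
#     print(delta, count)
```

If the insert is shorter than the read length, both mates should end at the same insert boundary, so a histogram concentrated away from zero suggests the adapter is being missed on one of the two reads.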

Has anyone dealt with this before, or can anyone suggest a better strategy? I'm thinking I might just lop off the ends of my sequences to get rid of some of the noise, merge the paired ends, and then try trimming.

sequencing next-gen • 1.8k views
modified 2.6 years ago by YaGalbi1.4k • written 2.6 years ago by beegrackle90
2
2.6 years ago by
h.mon28k
Brazil
h.mon28k wrote:

You can use BBDuk (from BBTools) with the flags tbo (trim adapters based on where paired reads overlap) and tpe (when kmer right-trimming, trim both reads to the minimum length of either).
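To see why overlap-based trimming works without knowing the adapter, here is a minimal Python sketch of the idea. It is not BBDuk's implementation: it uses exact matching only, and the function names and the min_overlap default are assumptions made for this illustration. When the insert is shorter than the read length, R1 starts with the insert and R2 starts with its reverse complement, so the first L bases of R1 equal the reverse complement of the first L bases of R2, where L is the insert length. Everything past L in either read is adapter or noise.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_insert_length(r1, r2, min_overlap=10):
    """Return the insert length implied by the read overlap, or None.

    Tries the longest candidate first and requires an exact match,
    so reads with sequencing errors in the insert are not detected.
    """
    max_len = min(len(r1), len(r2))
    for length in range(max_len, min_overlap - 1, -1):
        if r1[:length] == reverse_complement(r2[:length]):
            return length
    return None


def trim_pair_by_overlap(r1, r2, min_overlap=10):
    """Trim both reads to the inferred insert length; leave them unchanged if none is found."""
    length = infer_insert_length(r1, r2, min_overlap)
    if length is None:
        return r1, r2
    return r1[:length], r2[:length]
```

Because both mates are cut at the same inferred boundary, this approach produces equal lengths for R1 and R2 in a trimmed pair, which is the behaviour the question was missing.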

written 2.6 years ago by h.mon28k

Thanks - that worked out really well!

written 2.6 years ago by beegrackle90

Stay with BBTools and use bbmap.sh to align your data. You will be pleased with the results :)

modified 2.6 years ago • written 2.6 years ago by genomax74k
1
2.6 years ago by
YaGalbi1.4k
Biocomputing, MRC Harwell Institute, Oxford, UK
YaGalbi1.4k wrote:

Don't FastQC + MultiQC both return overrepresented sequences? That will tell you the adapter sequence exactly.

modified 2.6 years ago • written 2.6 years ago by YaGalbi1.4k

Unfortunately, no. Instead, all my overrepresented sequences are my forward primer (or reverse primer) followed by a 10-bp sequence, which is rather dodgy, I admit.

written 2.6 years ago by beegrackle90

I wouldn't be surprised if the company didn't provide a QC report with the data they gave you... money back... not good enough at all.

written 2.6 years ago by YaGalbi1.4k