Question: Trim read-specific adapter in paired-end reads
Asked 19 months ago by Nicolas Rosewick (Belgium, Brussels):

Hi,

I have a specific enriched DNA-seq library to analyze (2x76 bp, sequenced on a NextSeq 500).

The library is structured as follows:

R1                                                  R2
==============>-----------------<===========#####@@@@@

=== : DNA fragment (should correctly align to the genome)
### : barcode
@@@ : some random sequence we introduce to increase the library complexity

Important things to know:

  • The barcode and the random sequence always have the same length (12 and 14 bases, respectively).
  • Each read pair has a different barcode (only PCR duplicates should share the same barcode and read sequences).

My goal is to remove the barcode and the random sequence from R2, but also from R1, because R1 and R2 can overlap if the sequenced DNA fragment is short (less than 2x76 = 152 bp).

Example of R1 and R2 overlapping. In this case, R1 contains sequence from the barcode:

R1 =====================>
                    ||||
R2      <===========#####@@@@@

Is there a tool that handles such cases? My first idea would be to write an R script that extracts the barcode and random sequence and aligns them locally against R1.


Not what you are asking for, but chances are that you don't actually have to remove this and can just align it, and it will get soft-clipped.

written 19 months ago by WouterDeCoster

Yes, I know, but it would be nice to have clean reads for further analysis ;)

written 19 months ago by Nicolas Rosewick

I think you can use cutadapt. If I'm not mistaken, it will remove the #### and the following nucleotides from R1.

written 19 months ago by Asaf

Yes, but in this case each read will have a different adapter to trim.

written 19 months ago by Nicolas Rosewick

You can give only the #### sequence as input to cutadapt, allow it to match anywhere along the read, and have it removed together with the sequence that follows it.

written 19 months ago by Asaf

Yes, but each read will have a different #### sequence.

written 19 months ago by Nicolas Rosewick

Oh, I skipped this part when first reading :). Chances are good you'll end up coding it.

written 19 months ago by Asaf
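
For anyone who does end up coding it, here is a minimal Python sketch of the idea from the question: take the read-specific tag from R2 and look for its reverse complement in R1. It is not a tested tool and rests on several assumptions. It assumes that R2 starts with the 26-base tag (14 random bases followed by the 12-base barcode, as the diagram suggests), that the tag appears in R1 as its reverse complement, and that a simple mismatch count is good enough in place of a true local alignment. The function names, `min_overlap` and `max_mismatch_rate` are made up for illustration.

```python
# Sketch only: sequences are plain strings; quality strings would need
# to be trimmed to the same coordinates.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_pair(r1, r2, tag_len=26, min_overlap=5, max_mismatch_rate=0.1):
    """Trim the read-specific tag from R2 and any read-through from R1.

    Assumption: the first `tag_len` bases of R2 are random sequence plus
    barcode, and a short fragment makes R1 run into their reverse complement.
    """
    tag = r2[:tag_len]
    r2_trimmed = r2[tag_len:]
    tag_rc = reverse_complement(tag)

    # Scan R1 left to right for the first position where the reverse
    # complement of the tag (or a prefix of it, near the read end) starts.
    for start in range(len(r1) - min_overlap + 1):
        overlap = min(tag_len, len(r1) - start)
        window = r1[start:start + overlap]
        allowed = int(max_mismatch_rate * overlap)
        if hamming(window, tag_rc[:overlap]) <= allowed:
            # Drop the tag and everything after it (e.g. adapter).
            return r1[:start], r2_trimmed

    # No read-through detected: R1 is left untouched.
    return r1, r2_trimmed
```

With a short `min_overlap`, a few bases at the end of R1 will match the tag by chance, so some reads lose a handful of genuine bases; raising `min_overlap` trades that for leaving short tag remnants in place. The sketch also does not handle R2 reading through the fragment into whatever sits on the R1 side.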