Split reads (FASTQ) by read barcode sequence (no barcode in RG)
22 months ago
emmanouil.a ▴ 110

Hi,

I have a basic custom targeted-PCR library with a single initial barcode (the IonTorrent barcode). For example, I amplified two regions. Reads from both regions start with a common "linker" sequence, followed by a "region barcode", like this:

AAAAAAAAnBBBBBBBnnnnnnnnnn (first region: AAA... = linker, BBB... = region1_barcode, n = nucleotide)

AAAAAAAAnCCCCCCCnnnnnnnnnnn (second region: AAA... = linker, CCC... = region2_barcode, n = nucleotide)

Is there any tool that can help me split the reads into two FASTQ files by region barcode? I want to do this before mapping with BWA. Because of multimapping, I cannot map all reads against a FASTA containing both contigs, nor map all reads against each single-contig FASTA.

No barcode is reported in the read group (RG), and I do not want to use something like "grep" because my region barcode (BBB... / CCC...) could contain a mismatch due to PCR or sequencing error.

Many thanks

next-gen sequence • 596 views
22 months ago
GenoMax 121k

You can use bbduk.sh from the BBMap suite in filter-only mode. You will need to set the value of k to less than length(BBBB)/2; hdist=1 allows one mismatch per k-mer. A guide for BBDuk is available.

Single-end data:

bbduk.sh in=your.fq literal=BBBBBBBB_sequence k=5 outm=interesting_seq.fq hdist=1

Paired-end data:

bbduk.sh in1=your_R1.fq in2=your_R2.fq literal=BBBBBBB_sequence k=5 hdist=1 outm=stdout.fq | reformat.sh in=stdin.fq int=t out1=interesting_R1.fq out2=interesting_R2.fq
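
If you would rather get both regions in a single pass, here is a minimal awk-based sketch for single-end reads. It is not part of BBMap. It assumes the region barcode always starts at a fixed offset from the 5' end of the read (9 bases here: 8 linker bases plus one n) and that errors are substitutions only. Indels, which are common in IonTorrent data, would shift the barcode and leave the read unassigned. The function name `split_by_barcode`, the example barcodes and the output file names are made up for illustration.

```shell
# split_by_barcode FASTQ OFFSET BARCODE1 BARCODE2 MAX_MISMATCHES PREFIX
# Hypothetical helper: writes PREFIX_region1.fq, PREFIX_region2.fq and
# PREFIX_unassigned.fq from a 4-line-per-record, single-end FASTQ file.
split_by_barcode() {
  awk -v off="$2" -v b1="$3" -v b2="$4" -v maxmm="$5" -v prefix="$6" '
    # Hamming distance between tag and barcode; a tag that is too short
    # (truncated read) gets a distance larger than any allowed threshold.
    function hamming(tag, bc,    i, d) {
      if (length(tag) != length(bc)) return length(bc) + 1
      d = 0
      for (i = 1; i <= length(bc); i++)
        if (substr(tag, i, 1) != substr(bc, i, 1)) d++
      return d
    }
    {
      rec[(NR - 1) % 4] = $0              # buffer the 4 lines of one record
      if (NR % 4 == 0) {
        # rec[1] is the sequence line; OFFSET is 0-based, awk substr is 1-based
        d1 = hamming(substr(rec[1], off + 1, length(b1)), b1)
        d2 = hamming(substr(rec[1], off + 1, length(b2)), b2)
        if (d1 <= maxmm && d1 < d2)      out = prefix "_region1.fq"
        else if (d2 <= maxmm && d2 < d1) out = prefix "_region2.fq"
        else                             out = prefix "_unassigned.fq"  # no hit or tie
        print rec[0] "\n" rec[1] "\n" rec[2] "\n" rec[3] > out
      }
    }
  ' "$1"
}

# Example call (placeholder barcodes, at most one mismatch allowed):
# split_by_barcode your.fq 9 GGTTCCA CTAGCAT 1 split
```

Reads that match neither barcode within the threshold, or that match both equally well, end up in the unassigned file, so it is worth checking how many reads land there before mapping.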

Many thanks, I will try!
