Question

How can I extract the sequencing reads containing a specific linker/tag?

1


3.9 years ago

naeem40thju ▴ 10

I have two FASTQ files (read.R1.fastq and read.R2.fastq) generated from paired-end sequencing. One file (read.R1.fastq) contains an 18-nucleotide linker/tag. How can I extract the reads containing the linker (allowing 3 or 4 mutations within it) from read.R1.fastq, together with their corresponding reads from read.R2.fastq, and save the extracted reads into two separate files? Is it also possible to produce a single file after extraction that contains the full-length read sequences and their associated information (ID, quality scores, etc.)?

Thanks in advance.

sequencing next-gen • 1.3k views

ADD COMMENT • link updated 3.9 years ago by GenoMax 141k • written 3.9 years ago by naeem40thju ▴ 10

0


How can I extract the reads containing the linker (allowing 3/4 mutations inside it) from read.R1.fastq and its corresponding reads from read.R2.fastq and save the extracted reads into two separate files?

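Use cutadapt for this. You can set a maximum error rate (-e) or, if the positions of variation are known, encode them in the linker sequence with IUPAC wildcard characters (cutadapt matches wildcards such as N rather than full regular expressions).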

Is it possible to prepare a single file after extraction which will contain the full-length sequence of reads and their information (such as ID, quality score, etc.)?

If you want to merge the overlapping R1 and R2 reads into single sequences while retaining quality scores etc., try PANDAseq. If you only want to interleave the two files, try BBMap.
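
To make the interleaving option concrete, here is a minimal Python sketch of what an interleaved file looks like: each R1 record is followed directly by its R2 mate, with IDs and quality strings kept intact. This is only an illustration, not what BBMap does internally. It assumes uncompressed FASTQ files with exactly four lines per record and mates listed in the same order in both files; the function names `read_fastq` and `interleave` are made up for this example.

```python
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    with open(path) as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                return
            yield tuple(record)


def interleave(r1_path, r2_path, out_path):
    """Write R1 and R2 records alternately into one file.

    Assumes the mates appear in the same order in both inputs.
    Returns the number of pairs written.
    """
    pairs = 0
    with open(out_path, "w") as out:
        for rec1, rec2 in zip(read_fastq(r1_path), read_fastq(r2_path)):
            for record in (rec1, rec2):
                out.write("\n".join(record) + "\n")
            pairs += 1
    return pairs
```

Calling `interleave("read.R1.fastq", "read.R2.fastq", "read.interleaved.fastq")` would produce one file in which every pair sits on eight consecutive lines. Note that this keeps both reads as they are; it does not merge overlapping mates into one sequence.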

ADD REPLY • link 3.9 years ago by cpad0112 21k

score 1 · Answer 1 · 2020-06-04

1


3.9 years ago

Ido Tamir 5.2k

You can use cutadapt for this: search for the linker while allowing the appropriate number of mismatches, set --action=none so that the reads are not trimmed, and save the reads with and without the linker into separate files.
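
For anyone who wants to see what that filtering step amounts to, here is a minimal Python sketch. It is not cutadapt and does not reproduce cutadapt's alignment: it is a plain sliding-window comparison that counts substitutions only (no insertions or deletions). It assumes uncompressed FASTQ files with four lines per record, mates in the same order in both files, and an uppercase linker. The function names and the output file naming scheme are made up for illustration.

```python
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    with open(path) as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                return
            yield tuple(record)


def has_linker(sequence, linker, max_mismatches):
    """Return True if some window of the read matches the linker with at
    most max_mismatches substitutions (Hamming distance, no indels)."""
    k = len(linker)
    for start in range(len(sequence) - k + 1):
        mismatches = 0
        for a, b in zip(sequence[start:start + k], linker):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window is too different, try the next one
        else:
            return True  # window finished without exceeding the limit
    return False


def split_pairs_by_linker(r1_path, r2_path, linker, max_mismatches, out_prefix):
    """Split read pairs by whether R1 carries the linker.

    Pairs with the linker go to <prefix>.linker.R1/R2.fastq, all others to
    <prefix>.nolinker.R1/R2.fastq. Reads are written untrimmed.
    Returns (pairs_with_linker, pairs_without_linker).
    """
    outputs = {
        (kind, mate): open(f"{out_prefix}.{kind}.{mate}.fastq", "w")
        for kind in ("linker", "nolinker")
        for mate in ("R1", "R2")
    }
    counts = {"linker": 0, "nolinker": 0}
    try:
        for rec1, rec2 in zip(read_fastq(r1_path), read_fastq(r2_path)):
            kind = "linker" if has_linker(rec1[1], linker, max_mismatches) else "nolinker"
            outputs[(kind, "R1")].write("\n".join(rec1) + "\n")
            outputs[(kind, "R2")].write("\n".join(rec2) + "\n")
            counts[kind] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts["linker"], counts["nolinker"]
```

A call such as `split_pairs_by_linker("read.R1.fastq", "read.R2.fastq", linker, 3, "extracted")`, with `linker` set to the real 18-nt sequence, would write four files. For real data sets a dedicated tool will be much faster and can also handle indels; this sketch is only meant to show the logic.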

ADD COMMENT • link 3.9 years ago by Ido Tamir 5.2k

score 1 · Answer 2 · 2020-06-04

1


3.9 years ago

GenoMax 141k

Actually, three programs from BBTools can do all of these operations. Use bbduk.sh with literal=real_linker_seq hdist=4 (up to 4 mismatches) to look for the linker tag; you can either keep the matching reads or filter them out. Use bbmerge.sh to merge the reads (if they are actually designed to be merged). Use reformat.sh to interleave the reads (if that is desired). And much more.