Question: How can I extract the sequencing reads containing a specific linker/tag?
naeem40thju wrote (4 months ago):

I have two FASTQ files (read.R1.fastq and read.R2.fastq) from paired-end sequencing. The reads in one file (read.R1.fastq) contain an 18-nucleotide linker/tag. How can I extract the reads that contain the linker (allowing 3 or 4 mismatches within it) from read.R1.fastq, together with their mates from read.R2.fastq, and save the extracted reads into two separate files? After extraction, is it also possible to produce a single file containing the full-length sequence of each read pair along with its information (ID, quality scores, etc.)?

Thanks in advance.


> How can I extract the reads containing the linker (allowing 3/4 mutations inside it) from read.R1.fastq and its corresponding reads from read.R2.fastq and save the extracted reads into two separate files

Use cutadapt for this. You can set a maximum error rate for the linker match (-e), or, if the positions of variation are known, mark them with wildcard bases (N) in the linker sequence.
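To make the idea of a maximum error rate concrete, here is a minimal Python sketch of approximate linker search (not cutadapt's actual code). It finds the smallest edit distance between the whole linker and any stretch of the read, so a substitution, an inserted base or a deleted base each count as one error; cutadapt likewise allows indels by default. Treating N in the linker as a wildcard stands in for marking known variable positions. The function names, the example linker and the rounding of rate × length down to a whole number of errors are assumptions for illustration.

```python
def min_errors(linker: str, read: str) -> int:
    """Smallest edit distance between the whole linker and any substring of the read
    (semi-global alignment). An 'N' in the linker matches any base at no cost."""
    m = len(linker)
    # prev[i]: best cost of aligning linker[:i] ending at the previous read position;
    # before any read base is consumed, that costs i deletions.
    prev = list(range(m + 1))
    best = prev[m]
    for base in read:
        cur = [0] * (m + 1)  # cur[0] = 0: the linker may start anywhere in the read
        for i in range(1, m + 1):
            ref = linker[i - 1]
            cost = 0 if ref == "N" or ref == base else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra base in the read (insertion)
                         cur[i - 1] + 1)      # linker base missing from the read (deletion)
        best = min(best, cur[m])
        prev = cur
    return best


def has_linker(read: str, linker: str, max_error_rate: float = 0.2) -> bool:
    # Assumption of this sketch: allowed errors = rate * linker length, rounded down.
    max_errors = int(max_error_rate * len(linker))
    return min_errors(linker, read) <= max_errors


# Hypothetical 18-nt linker; substitute the real one.
LINKER = "ACGTACGTTGCATGCAAC"
print(has_linker("TT" + "TCGTAAGTTGGATGCAAC" + "TT", LINKER))  # 3 substitutions, prints True
```

With an 18-nt linker, a rate of 0.2 allows 3 errors and 0.25 allows 4 in this sketch, which covers the 3 to 4 mismatches asked about.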

> Is it possible to prepare a single file after extraction which will contain the full-length sequence of reads and their information (such as ID, quality score, etc.)?

If you want to merge the R1 and R2 reads into single sequences while retaining quality scores etc., try PANDAseq. If you want to interleave them into a single file, try BBMap.
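Interleaving just means writing each R1 record immediately followed by its R2 mate in one file, keeping IDs and quality strings untouched. A minimal Python sketch, assuming both files list the mates in the same order and use plain 4-line records (no wrapped sequence lines); the function names and the output file name are hypothetical:

```python
from itertools import islice, zip_longest
from typing import Iterator, List, TextIO


def read_fastq(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as [header, sequence, '+', quality] (4-line records assumed)."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record


def interleave(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write each R1 record followed by its R2 mate; return the number of pairs."""
    pairs = 0
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        out.write("\n".join(rec1) + "\n")
        out.write("\n".join(rec2) + "\n")
        pairs += 1
    return pairs


# Usage with the file names from the question ("interleaved.fastq" is hypothetical):
# with open("read.R1.fastq") as r1, open("read.R2.fastq") as r2, \
#         open("interleaved.fastq", "w") as out:
#     interleave(r1, r2, out)
```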

(reply by cpad0112, 4 months ago)
Ido Tamir wrote (4 months ago):

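You can use cutadapt for this: search for the linker while allowing the appropriate number of mismatches, set --action=none so the reads are not trimmed, and write the reads with and without the linker to separate files (for example with --untrimmed-output and --untrimmed-paired-output).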
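The routing logic itself (check R1, then send the whole pair to either the "with linker" or the "without linker" pair of files, unmodified) can be sketched in plain Python. This is a simplified stand-in for the cutadapt run, not its implementation: it only counts substitutions (Hamming distance), whereas cutadapt also counts insertions and deletions, and it assumes the linker appears in forward orientation in R1 and that R1/R2 mates are in the same order. All names are hypothetical.

```python
from itertools import islice, zip_longest
from typing import Iterator, List, TextIO


def read_fastq(handle: TextIO) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records as [header, sequence, '+', quality]."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record


def contains_linker(seq: str, linker: str, max_mismatches: int) -> bool:
    """True if some window of seq differs from linker at <= max_mismatches positions
    (substitutions only, no insertions or deletions)."""
    k = len(linker)
    for start in range(len(seq) - k + 1):
        mismatches = 0
        for a, b in zip(seq[start:start + k], linker):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # the window never exceeded the mismatch limit
            return True
    return False


def split_pairs(r1: TextIO, r2: TextIO, linker: str, max_mismatches: int,
                with1: TextIO, with2: TextIO,
                without1: TextIO, without2: TextIO) -> int:
    """Route each pair by whether R1 carries the linker; return the number of hits."""
    hits = 0
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if contains_linker(rec1[1].upper(), linker.upper(), max_mismatches):
            out1, out2 = with1, with2
            hits += 1
        else:
            out1, out2 = without1, without2
        # Records are written unchanged, in the spirit of --action=none.
        out1.write("\n".join(rec1) + "\n")
        out2.write("\n".join(rec2) + "\n")
    return hits
```

Keeping the mate pairing is the important design point: the decision is made on R1 only, but both records of the pair always go to the matching pair of output files, so the two outputs stay in sync.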

genomax wrote (4 months ago):

Actually, three programs from BBTools can do all of these operations. Use bbduk.sh with literal=real_linker_seq hdist=4 (Hamming distance, i.e. up to 4 substitutions) to look for the linker tag; you can then either keep the matching reads or filter them out. Use bbmerge.sh to merge the reads (if the library was actually designed so that the pairs overlap). Use reformat.sh to interleave the reads (if that is desired). The suite can do much more as well.
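For the "single full-length sequence" part of the question, merging only works when R1 and R2 overlap. The toy Python sketch below shows the basic idea: reverse-complement R2, look for the longest overlap with the 3' end of R1 within a mismatch limit, and in the overlap keep the base with the higher quality character. It is not how bbmerge.sh or PANDAseq work internally (they score overlaps using qualities, handle inserts shorter than the read length and recompute overlap qualities); the function names and the min_overlap and max_mismatches defaults are arbitrary assumptions.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(seq1: str, qual1: str, seq2: str, qual2: str,
               min_overlap: int = 10,
               max_mismatches: int = 1) -> Optional[Tuple[str, str]]:
    """Merge a pair whose R1 3' end overlaps the reverse complement of R2.
    Returns (sequence, quality), or None when no acceptable overlap exists."""
    rc_seq2 = reverse_complement(seq2)
    rc_qual2 = qual2[::-1]  # qualities follow the reversed bases
    n1 = len(seq1)
    # Try the longest possible overlap first, then shorter ones.
    for olen in range(min(n1, len(rc_seq2)), min_overlap - 1, -1):
        tail1, head2 = seq1[n1 - olen:], rc_seq2[:olen]
        if sum(a != b for a, b in zip(tail1, head2)) > max_mismatches:
            continue
        bases, quals = [], []
        for i in range(olen):
            q1, q2 = qual1[n1 - olen + i], rc_qual2[i]
            # Phred+33 characters sort in the same order as the scores they encode.
            if q1 >= q2:
                bases.append(tail1[i])
                quals.append(q1)
            else:
                bases.append(head2[i])
                quals.append(q2)
        return (seq1[:n1 - olen] + "".join(bases) + rc_seq2[olen:],
                qual1[:n1 - olen] + "".join(quals) + rc_qual2[olen:])
    return None
```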


Thank you very much!

(reply by naeem40thju)

Please upvote useful answers and accept one as the best answer, rather than thanking in a comment.

(reply by Ido Tamir)