Question: Extract specific reads from FASTQ files based on subsequence
1
3.1 years ago by
Paul1.1k
European Union
Paul1.1k wrote:

Dear all,

I have FASTQ files in which each read starts with a 7-nucleotide tag, and I would like to extract the reads that carry one specific tag.

I would like to search the first 15 nucleotides of each read and, if the tag matches, write that read to a new FASTQ file.

Thank you for any ideas or help.

bash awk fastq • 4.2k views
modified 16 months ago by springpython0 • written 3.1 years ago by Paul1.1k

Are there any other tools for the same question?

modified 16 months ago • written 16 months ago by springpython0
1

Why? Are all of these tools inadequate?

written 16 months ago by harold.smith.tarheel4.1k
1

There's a wonderful quote in Jurassic Park: "...were so preoccupied with whether they could, they didn't stop to think if they should."

written 16 months ago by Ram15k
1

Nice :) Jurassic Park has a lot of apropos quotables for genetic research...

Personally, I like "Life will find a way"

modified 16 months ago • written 16 months ago by Brian Bushnell15k
1

"Life ... uhh ... finds a way" :)

written 16 months ago by Ram15k

My meme was more eloquent ;)

modified 16 months ago • written 16 months ago by Brian Bushnell15k

We need an /r/biostars

written 16 months ago by Ram15k

I am actually stuck on Python code for the same question, hence I asked for suggestions.

written 16 months ago by springpython0

Don't add answers unless you're answering the original question. Use Add Reply or Add Comment instead. Please read https://www.biostars.org/t/how-to/

I'm moving this to a comment on the top-level post now.

written 16 months ago by Ram15k
3
3.1 years ago by
Ram15k
New York
Ram15k wrote:

You can use Heng Li's bioawk:

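To check whether the tag occurs within the first 15 bases (the tag is passed in with -v, because a shell variable is not expanded inside the single-quoted awk program, and awk's substr counts from 1):

bioawk -c fastx -v tag="$TAG" 'substr($seq,1,15) ~ tag { print "@"$name"\n"$seq"\n+\n"$qual }' reads.fq.gz

To require that the first 7 bases equal your tag:

bioawk -c fastx -v tag="$TAG" 'substr($seq,1,7) == tag { print "@"$name"\n"$seq"\n+\n"$qual }' reads.fq.gz

The explicit print writes four-line FASTQ records; header comments are not carried over.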
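
For comparison, the same filter can be sketched in plain Python using only the standard library, which may help anyone attempting this in Python as asked in the comments. This is an illustration rather than a replacement for bioawk: it assumes strict four-line FASTQ records (no wrapped sequences), and the names read_fastq, has_tag and extract_tagged are made up for this example.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, seq, plus, qual) from a four-line-per-record FASTQ file."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:  # end of file (or a truncated last record)
                break
            yield tuple(record)


def has_tag(seq, tag, window=15):
    """True if tag occurs anywhere within the first `window` bases of seq."""
    return tag in seq[:window]


def extract_tagged(in_path, out_path, tag, window=15):
    """Write matching records to out_path and return how many were kept."""
    kept = 0
    with open(out_path, "w") as out:
        for header, seq, plus, qual in read_fastq(in_path):
            if has_tag(seq, tag, window):
                out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                kept += 1
    return kept
```

Calling extract_tagged("reads.fq.gz", "tagged.fq", "ACGTACG") would keep reads with the tag somewhere in the first 15 bases; passing window=7 requires the tag at the very start, provided the tag is 7 bases long.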
modified 16 months ago • written 3.1 years ago by Ram15k
2
3.1 years ago by
michael.ante2.5k
Austria/Vienna
michael.ante2.5k wrote:

In order to stay old-school, you can use the FastX toolkit's barcode splitter.

You need a text file with your tag (let it be myTag.txt) and then you can run (with N mismatches):

cat sequence.fastq | /usr/local/bin/fastx_barcode_splitter.pl -Q33 --bcfile myTag.txt --bol --mismatches $N --prefix out --suffix .txt

To remove the tag sequence afterwards, you can use fastx_trimmer from the same toolkit.

When using the FASTX tools on Phred+33 encoded data, do not forget to add the -Q33 option.
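
Trimming the tag amounts to dropping the same number of leading characters from both the sequence line and the quality line. A minimal Python sketch of that idea, assuming a 7-base tag at the very start of the read; trim_tag is a hypothetical helper for illustration and not part of the FASTX toolkit:

```python
def trim_tag(seq, qual, tag_len=7):
    """Drop the first tag_len bases together with their quality characters."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Slice both strings identically so that bases and qualities stay aligned.
    return seq[tag_len:], qual[tag_len:]


trimmed_seq, trimmed_qual = trim_tag("ACGTACGTTGCA", "IIIIIIIHHHHH")
print(trimmed_seq, trimmed_qual)
```

Keeping the two slices in lockstep is the important part: trimming only the sequence leaves a record whose quality string is longer than its read, which most downstream tools reject.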

modified 16 months ago by Ram15k • written 3.1 years ago by michael.ante2.5k
2
3.1 years ago by
Walnut Creek, USA
Brian Bushnell15k wrote:

You can also do this with BBDuk:

bbduk.sh -Xmx1g in=reads.fq outm=filtered.fq k=7 mm=f rcomp=f restrictleft=15 literal=ACGTACG

Here outm receives the reads that contain the tag, whereas out would receive the reads that do not. If you want multiple tags, list them separated by commas; to allow a 1 bp mismatch, set hdist=1.
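
To make the k-mer idea concrete, here is a rough Python sketch of matching within the leftmost 15 bases with an optional mismatch allowance. It only mimics the behaviour described above (substitutions only, no reverse complement) and says nothing about how BBDuk is actually implemented; hamming and tag_in_left_window are names invented for this example.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def tag_in_left_window(seq, tags, window=15, max_mismatches=0):
    """True if any tag matches a substring lying entirely within the
    first `window` bases, allowing up to max_mismatches substitutions."""
    left = seq[:window]
    for tag in tags:
        k = len(tag)
        # Slide a k-long window over the left end of the read.
        for start in range(len(left) - k + 1):
            if hamming(left[start:start + k], tag) <= max_mismatches:
                return True
    return False
```

With max_mismatches=1 a read carrying ACGTTCG would match the tag ACGTACG, which is the kind of tolerance hdist=1 is meant to give.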

modified 16 months ago by Ram15k • written 3.1 years ago by Brian Bushnell15k

"out" or "outm" ? https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/

written 5 months ago by tsy199009290
3
3.1 years ago by
5heikki7.3k
Finland
5heikki7.3k wrote:

Probably not the fastest option:

grep -B1 -A2 "^AGATCGG" file.fq | grep -v "^--$" > out.fq

Find lines that begin with "AGATCGG", grab one line before each hit and two lines after each hit, remove lines that are "--".

Note that it's possible (but extremely unlikely) for a quality line to begin with "AGATCGG", in which case the above command would corrupt the output file. The likelihood is probably very close to zero, but if you're processing a file with a googolplex lines, maybe it could happen.
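
The quality-line pitfall goes away if only the second line of each four-line record is tested. A small Python sketch of that, assuming unwrapped four-line records; reads_starting_with and the two toy records are invented for illustration:

```python
def reads_starting_with(lines, tag):
    """Yield four-line records whose sequence line (line 2) starts with tag."""
    lines = list(lines)
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if seq.startswith(tag):  # the quality line is never inspected
            yield header, seq, plus, qual


records = [
    "@r1", "AGATCGGAAGAGC", "+", "IIIIIIIIIIIII",
    "@r2", "TTTTTTTTTTTTT", "+", "AGATCGGIIIIII",  # quality string mimics the tag
]
hits = list(reads_starting_with(records, "AGATCGG"))
print([header for header, _, _, _ in hits])
```

A purely line-based match would also fire on the quality line of the second record, while the record-aware version keeps only the first read.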

modified 16 months ago by Ram15k • written 3.1 years ago by 5heikki7.3k