Question: How to extract a set of reads from a FASTQ file (or from FASTA and QUAL files) based on a list of IDs?
Asked by Biomonika (Noolean), State College, PA, USA, 7.9 years ago:

I have a list of IDs and two read files, one male and one female. I want to retrieve the sequences and quality scores for every ID in my list.

I have:

- a list of IDs
- 2 FASTQ files (or, alternatively, 2 FASTA files and 2 QUAL files)

and I want to get:

- extracted_sequences_based_on_ids.fastq (or extracted_sequences_based_on_ids.fasta and extracted_sequences_based_on_ids.qual); the output can also be two separate files that I merge afterwards
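For the FASTA + QUAL variant, a minimal Python sketch could walk the two files side by side. It is not from the thread, and the function names (fasta_like_records, record_id, extract_fasta_qual) are hypothetical. It assumes that the .fasta and .qual files list the same reads in the same order and that a read ID is the header text after '>' up to the first whitespace. Wrapped sequence and quality lines are joined, so the output is unwrapped.

```python
"""Sketch: extract matching records from paired FASTA and QUAL files.

Assumptions (not from the original thread): both files list the same reads
in the same order, and the read ID is the header text after '>' up to the
first whitespace. Wrapped lines are joined, so output records are unwrapped.
"""
from itertools import zip_longest
from typing import Iterable, Iterator, List, Optional, Set, TextIO, Tuple


def fasta_like_records(handle: Iterable[str], joiner: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, body) pairs; body lines are joined with `joiner`."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, joiner.join(chunks)
            header, chunks = line, []
        elif line.strip() and header is not None:  # skip blanks and text before the first header
            chunks.append(line.strip())
    if header is not None:
        yield header, joiner.join(chunks)


def record_id(header: str) -> str:
    """Header text after '>' up to the first whitespace."""
    fields = header[1:].split()
    return fields[0] if fields else ""


def extract_fasta_qual(ids: Set[str], fasta_in: TextIO, qual_in: TextIO,
                       fasta_out: TextIO, qual_out: TextIO) -> int:
    """Copy records whose ID is in `ids` to both outputs; return the count written."""
    written = 0
    # Sequence lines are concatenated directly; quality values keep a space separator.
    pairs = zip_longest(fasta_like_records(fasta_in, ""),
                        fasta_like_records(qual_in, " "))
    for seq_rec, qual_rec in pairs:
        if seq_rec is None or qual_rec is None:
            raise ValueError("FASTA and QUAL files contain different numbers of records")
        (seq_header, seq), (qual_header, qual) = seq_rec, qual_rec
        rid = record_id(seq_header)
        if rid != record_id(qual_header):
            raise ValueError(f"record order differs: {rid} vs {record_id(qual_header)}")
        if rid in ids:
            fasta_out.write(f"{seq_header}\n{seq}\n")
            qual_out.write(f"{qual_header}\n{qual}\n")
            written += 1
    return written
```

Calling extract_fasta_qual once for the male pair and once for the female pair, reusing the same two output handles, would collect all matches into a single .fasta/.qual pair.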

Yes, I could write a script that looks up the sequence and quality scores for each ID and writes the results to a new file, but I would rather not reinvent the wheel if existing tools already do this easily :) (and the answer might be handy for other Biostars users as well).
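The script described above can be sketched in a few lines of Python. This is a minimal illustration, not an existing tool, and the names extract_reads.py, load_ids, fastq_records, read_id and extract are hypothetical. It assumes unwrapped four-line FASTQ records and takes the read ID to be the header text after '@' up to the first whitespace. Loading the IDs into a set means each read is checked once while the files are streamed, instead of scanning the files once per ID.

```python
"""Sketch: copy FASTQ records whose read IDs appear in an ID list.

Assumptions (not from the original thread): every record is exactly four
lines (header, sequence, '+', quality), and the read ID is the header text
after '@' up to the first whitespace.
"""
import sys
from typing import Iterable, Iterator, List, Set, TextIO, Tuple


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Collect one ID per line (first field, leading '@' removed) into a set."""
    ids: Set[str] = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            ids.add(fields[0].lstrip("@"))
    return ids


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not qual:
            raise ValueError(f"truncated FASTQ record: {header.strip()}")
        yield (header.rstrip("\r\n"), seq.rstrip("\r\n"),
               plus.rstrip("\r\n"), qual.rstrip("\r\n"))


def read_id(header: str) -> str:
    """Header text after '@' up to the first whitespace."""
    fields = header[1:].split()
    return fields[0] if fields else ""


def extract(ids: Set[str], inputs: Iterable[TextIO], out: TextIO) -> int:
    """Write every record whose ID is in `ids` to `out`; return the count written."""
    written = 0
    for handle in inputs:
        for header, seq, plus, qual in fastq_records(handle):
            if read_id(header) in ids:  # set lookup: average O(1) per read
                out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                written += 1
    return written


def main(argv: List[str]) -> None:
    if len(argv) < 2:
        print("usage: extract_reads.py ids.txt reads.fastq [more.fastq ...]",
              file=sys.stderr)
        return
    with open(argv[0]) as id_file:
        wanted = load_ids(id_file)
    total = 0
    for path in argv[1:]:
        with open(path) as handle:
            total += extract(wanted, [handle], sys.stdout)
    print(f"wrote {total} records for {len(wanted)} requested IDs", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python extract_reads.py ids.txt male.fastq female.fastq > extracted_sequences_based_on_ids.fastq` would write the matches from both files into one output. Multi-line (wrapped) FASTQ is not handled by this sketch.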

Thanks a lot.

Tags: fasta read extraction 454 fastq

Duplicated post: http://www.biostars.org/post/show/10353/how-to-efficiently-parse-a-huge-fastq-file/

Reply by Leszek, 7.9 years ago
Answer by Ole Kristian Tørresen, Oslo, 7.9 years ago:

I think Heng Li's seqtk will do what you need: https://github.com/lh3/seqtk

Extract sequences with names in file name.lst, one sequence name per line:

seqtk subseq in.fq name.lst > out.fq

Thank you very much! It worked.

Reply by Biomonika (Noolean), 7.9 years ago

Hi, I am trying to extract sequences from a FASTQ file using seqtk's "subseq" command, but the output file contains only the first sequence and none of the others. I wonder whether my name.lst file is not in the format seqtk expects. name.lst contains one sequence name per line with no other symbols, whereas in the FASTQ file each sequence name starts with @. Should I add @ in front of each sequence name, or could the problem be something else?

Any suggestions are welcome. Thanks,

Chih-Ming

Reply by ymwur, 7.2 years ago
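The thread leaves this question unanswered. One thing worth ruling out is a formatting mismatch between name.lst and the read names. The sketch below is a hypothetical helper (clean_names.py and normalize_names are illustrative names, not part of seqtk) that rewrites a name list so each line holds only a bare name: it keeps the first whitespace-delimited field, strips a leading '@' or '>', drops blank lines and stray '\r' characters, and removes duplicates. It assumes that reads are matched by the name after '@' up to the first whitespace, which is how seqtk's kseq parser stores read names; whether this resolves the specific problem above is not established in the thread.

```python
"""Sketch (hypothetical helper, not part of seqtk): clean up a read-name list.

Assumption: reads are matched by the name after '@' up to the first
whitespace, so list entries should be bare names on their own lines.
"""
import sys
from typing import Iterable, List


def normalize_names(lines: Iterable[str]) -> List[str]:
    """Return bare read names: first field only, no leading '@'/'>', no blanks or repeats."""
    seen = set()
    names: List[str] = []
    for line in lines:
        fields = line.split()  # split() also discards '\r' and surrounding whitespace
        if not fields:
            continue  # blank line
        name = fields[0].lstrip("@>")  # keep only the first field, without '@' or '>'
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def main(argv: List[str]) -> None:
    if len(argv) != 1:
        print("usage: clean_names.py name.lst > name.clean.lst", file=sys.stderr)
        return
    with open(argv[0]) as handle:
        for name in normalize_names(handle):
            print(name)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python clean_names.py name.lst > name.clean.lst` followed by `seqtk subseq in.fq name.clean.lst > out.fq`.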
Answer by Martin A Hansen, Denmark, 7.9 years ago:

This can be done with Biopieces and is in fact covered in the Howto.


Thanks a lot. I didn't know about Biopieces (it looks very interesting), but unfortunately I wasn't able to install it because of a problem with the Perl module Inline.

Reply by Biomonika (Noolean), 7.9 years ago

Try the Biopieces Google Group for help.

Reply by Martin A Hansen, 7.2 years ago
Powered by Biostar version 2.3.0