Question: (Closed) How to extract FASTQ sequences for given FASTQ IDs from a FASTQ file
Raghav (Allahabad, India), 6.4 years ago, wrote:

I have a text file that contains FASTQ IDs, and another file, my original FASTQ file, which is approximately 14 GB. Is there an efficient program that could help me extract the FASTQ records matching my IDs? The IDs look like this:

@lcl|SRR681003.3 SN603:5:1101:47.10:122.20 length=100

@lcl|SRR681003.14 SN603:5:1101:57.10:114.60 length=100

@lcl|SRR681003.16 SN603:5:1101:72.70:115.10 length=100

@lcl|SRR681003.19 SN603:5:1101:54.80:117.50 length=100

@lcl|SRR681003.22 SN603:5:1101:50.60:119.00 length=100

It is very easy to extract FASTA sequences for given IDs using fastacmd, but I have no idea how to extract the FASTQ records for the desired IDs.
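Below is a minimal streaming sketch in Python. It assumes every record is the usual unwrapped 4 lines (header, sequence, '+', quality). It also assumes a record should be kept when the first whitespace-delimited token of its header, without the leading '@', equals the first token of some line in the ID file. The function names and the script name extract_fastq.py are hypothetical. The approach reads the 14 GB file once and keeps only the wanted IDs in memory, as a set.

```python
import sys
from typing import Iterable, Iterator, Set

def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect read names from the ID file.

    Each name is the first whitespace-delimited token of a line, without a
    leading '@'. Blank lines are skipped.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # an empty line would otherwise become an empty ID
        name = fields[0]
        ids.add(name[1:] if name.startswith("@") else name)
    return ids

def extract_records(fastq_lines: Iterable[str], ids: Set[str]) -> Iterator[str]:
    """Yield whole 4-line records whose header name is in ids.

    Assumes unwrapped records: header, sequence, '+', quality.
    """
    it = iter(fastq_lines)
    for header in it:
        # Pull the remaining three lines of this record.
        record = [header] + [next(it, "") for _ in range(3)]
        fields = header.split()
        name = fields[0][1:] if fields else ""
        if name in ids:
            yield "".join(record)

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract_fastq.py id.txt in.fastq > out.fastq
    with open(sys.argv[1]) as id_file:
        wanted = read_ids(id_file)
    with open(sys.argv[2]) as fastq:
        for record in extract_records(fastq, wanted):
            sys.stdout.write(record)
```

The sketch matches on the bare read name rather than on substrings of the header. With that rule, an ID such as lcl|SRR681003.3 does not accidentally pull in lcl|SRR681003.30.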

(written 6.4 years ago by Raghav; modified by brentp)

Dear Raghvendra, this question has already been asked and answered on this site: "How to efficiently parse a huge fastq file?", "How to extract a subset of reads in fastq using an ID list?", "How to extract set of reads from fastq (or eventually fasta and qual) based on list of ids?", and "extracting a subset of sequences from a FASTQ file (BioPython speed)". You should be able to find the information you are looking for easily by searching for "fastq extract".

As an experienced user, you are encouraged to use the site search. I will close this question to avoid further duplication and confusion. I hope you understand my motivation for this decision. Please feel free to post a comment if you disagree or want to refine your question.

(written 6.4 years ago by Michael Dondrup; modified 6.4 years ago)

Dear Sir, I understand and realize that I must avoid duplicate questions like this, which create confusion for new users. Thank you, sir; I will keep it in mind.

(written 6.4 years ago by Raghav)

Don't worry, I believe this has happened to almost everyone.

(written 6.4 years ago by Michael Dondrup)

Do you need a system to extract those IDs often (like fastacmd, in which case you need an index), or do you just need to grep the file once?

(written 6.4 years ago by Pierre Lindenbaum)
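For the "extract often" case, one option is to index the file once and then seek directly to each record. The sketch below assumes unwrapped 4-line records; build_offset_index and fetch_record are hypothetical names. The index maps each read name to the byte offset of its header line. The file is opened in binary mode so that offsets returned by tell() can be passed back to seek(). For a 14 GB file, an in-memory dict holding every read name may need a lot of RAM, so persisting the index to disk would be the natural next step.

```python
from typing import BinaryIO, Dict

def build_offset_index(fastq: BinaryIO) -> Dict[bytes, int]:
    """Map each read name (first header token, without '@') to the byte
    offset of its header line. Assumes unwrapped 4-line records."""
    index: Dict[bytes, int] = {}
    fastq.seek(0)
    while True:
        offset = fastq.tell()
        header = fastq.readline()
        if not header:
            break  # end of file
        fields = header.split()
        if fields:
            index[fields[0][1:]] = offset
        for _ in range(3):
            fastq.readline()  # skip sequence, '+' and quality lines
    return index

def fetch_record(fastq: BinaryIO, index: Dict[bytes, int], name: bytes) -> bytes:
    """Jump straight to a record via the index and return its 4 lines."""
    fastq.seek(index[name])
    return b"".join(fastq.readline() for _ in range(4))
```

Hypothetical usage: with open("che.fastq", "rb") as fq, call idx = build_offset_index(fq) once, then fetch_record(fq, idx, b"lcl|SRR681003.3") for each lookup.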

Dear Sir, I want to grep it, using awk or the grep shell command.

(written 6.4 years ago by Raghav)

There is also a solution using grep: A: How to efficiently parse a huge fastq file? I hope it works.

(written 6.4 years ago by Michael Dondrup)

Sir, for a single read it works perfectly well: grep -e "@SRR681003.7 SN603:5:1101:70.90:105.60 length=100" -A 3 che.fastq > output.fastq. But when I give it the whole id.txt, as in grep -f id.txt -A 3 che.fastq > out.fastq, it dumps the entire file unchanged. I think I am missing something in the matching part. Please help me out, sir. Thank you.

(written 6.4 years ago by Raghav)
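Here is a hedged guess about the grep -f behaviour. grep treats every line of id.txt as a pattern, and an empty line (for example a trailing blank line) is an empty pattern, which matches every line of the FASTQ file. Conversely, Windows line endings in id.txt would make the patterns fail to match at all. In addition, with -A 3, GNU grep prints "--" separator lines between non-adjacent groups unless --no-group-separator is given, and those lines would break the FASTQ output. The Python sketch below uses hypothetical helper names to show the empty-pattern effect and a cleanup of the ID list.

```python
from typing import Iterable, List

def clean_patterns(lines: Iterable[str]) -> List[str]:
    """Strip line endings (including Windows carriage returns) and drop
    blank or whitespace-only lines, so no empty pattern remains."""
    cleaned = []
    for line in lines:
        pattern = line.rstrip("\r\n")
        if pattern.strip():
            cleaned.append(pattern)
    return cleaned

def matches_any(line: str, patterns: List[str]) -> bool:
    """Fixed-string substring test, roughly what grep -F -f does per line."""
    return any(p in line for p in patterns)
```

Because the empty string is a substring of every string, matches_any(line, [""]) is true for any line, which mirrors grep printing everything.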

Please post this comment to the original answer. I am sure you will get some code in response; if not, send me a PM via the site.

(written 6.4 years ago by Michael Dondrup; modified 6.4 years ago)
brentp (Salt Lake City, UT), 6.4 years ago, wrote:

lh3's seqtk can do this. See here: https://github.com/lh3/seqtk

With the example:

Extract sequences with names in file name.lst, one sequence name per line:

seqtk subseq in.fq name.lst > out.fq

(written 6.4 years ago by brentp)
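As far as I know, seqtk subseq matches against read names, meaning the header text after '@' up to the first whitespace. A list of full header lines such as "@lcl|SRR681003.3 SN603:5:1101:47.10:122.20 length=100" therefore probably needs trimming to "lcl|SRR681003.3" first. Below is a small Python sketch for that conversion. headers_to_names and the script name headers_to_names.py are hypothetical; id.txt and name.lst are the file names used in this thread.

```python
import sys
from typing import Iterable, Iterator

def headers_to_names(lines: Iterable[str]) -> Iterator[str]:
    """Reduce '@name description' header lines to bare read names.

    Keeps the first whitespace-delimited token and drops one leading '@';
    blank lines are skipped.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        yield name[1:] if name.startswith("@") else name

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python headers_to_names.py id.txt name.lst
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for name in headers_to_names(src):
            dst.write(name + "\n")
```

The resulting name.lst can then be passed to seqtk subseq in.fq name.lst > out.fq as shown in the answer.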

Dear Sir, it is a good tool :) Thank you.

(written 6.4 years ago by Raghav)

Great information, thank you. Although this task is simple, good tools simplify it further! It is worth looking around here.

(written 4.5 years ago by pengchy; modified 4.5 years ago)
The thread is closed. No new answers may be added.

Powered by Biostar version 2.3.0