Question: How to extract a random subset of records from a FASTQ file?
7.7 years ago by
tri0
tri0 wrote:

Hi,

I am trying to write a small program that extracts records from a FASTQ file (approx. 14 million records) and writes them to a new, smaller FASTQ file (1.4 million records). First, I create a list of random numbers. Then I scan through the original file with a counter; whenever the counter value is in the list of random numbers, that record is written to the target file. The problem is how to write this program using a generator expression, since the full result cannot be loaded into memory before it is written to the file.

from Bio import SeqIO
count =0
rd = [56, 12, 5, 6, 3]  # generated by a function

def inc() :
  global count
  count +=1

input_seq_iterator = SeqIO.parse(open("C:\\Python\\Doc\\ls_orchid.fastq-sanger", "rU"), "fastq-sanger")
short_seq_iterator = (record for record in input_seq_iterator 
                      if count in rd and inc() )

output_handle = open("C:\\Python26\\Doc\\selected.fasta", "w")
SeqIO.write(short_seq_iterator, output_handle, "fasta")
output_handle.close()

This program does not produce any output.

Thanks

7.7 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

You can do this:

from Bio import SeqIO

count = 0
for record in SeqIO.parse(open("C:\\Python\\Doc\\ls_orchid.fastq-sanger", "rU"), "fastq-sanger"):
  if count in rd:
    print ">" + record.id
    print str(record.seq)
  count += 1
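
The same loop can also be written as a generator expression, which is what the question asks for. The sketch below is written for Python 3 and pairs each record with its position using `enumerate`, so no global counter is needed. `select_records` is a hypothetical helper name, and plain strings stand in for the parsed records so that the example runs without Biopython.

```python
def select_records(records, wanted):
    """Lazily yield the records whose 0-based position is in `wanted`."""
    return (record for index, record in enumerate(records) if index in wanted)

# Stand-in data: any iterable works, including the iterator from SeqIO.parse.
records = iter(["rec0", "rec1", "rec2", "rec3", "rec4", "rec5", "rec6"])
rd = [5, 6, 3]

selected = list(select_records(records, rd))
print(selected)  # ['rec3', 'rec5', 'rec6']
```

With Biopython, the generator can be passed straight to the writer, for example `SeqIO.write(select_records(SeqIO.parse(in_path, "fastq"), rd), out_path, "fastq")`, assuming a Biopython version in which `SeqIO.parse` and `SeqIO.write` accept file names. `in_path` and `out_path` are placeholders. Records are pulled from the input one at a time as they are written, so the full selection never has to sit in memory.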

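It might be faster to use a dictionary for the random numbers instead of a list, because a membership test on a list scans its elements one by one while a dictionary lookup takes roughly constant time: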

from Bio import SeqIO
rd = dict([(x,True) for x in rd])

count = 0
for record in SeqIO.parse(open("C:\\Python\\Doc\\ls_orchid.fastq-sanger", "rU"), "fastq-sanger"):
  if rd.has_key(count):
    print ">" + record.id
    print str(record.seq)
  count += 1
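
In Python 3, `dict.has_key` no longer exists, and a `set` expresses the same constant-time membership test more directly. The sketch below also shows one possible way to draw the random record numbers without repeats using `random.sample`. The helper name `pick_indices`, the seed, and the small totals are assumptions chosen for illustration.

```python
import random

def pick_indices(total, k, seed=None):
    """Return a set of k distinct 0-based positions drawn from range(total)."""
    rng = random.Random(seed)
    return set(rng.sample(range(total), k))

# Small numbers for illustration; the question would use
# about 14 million records and 1.4 million picks.
wanted = pick_indices(total=100, k=10, seed=1)

kept = 0
for count in range(100):   # stands in for looping over the parsed records
    if count in wanted:    # average constant-time lookup in a set
        kept += 1
print(kept)  # 10
```

Passing a fixed `seed` makes the selection reproducible; leaving it as `None` gives a different subset on each run.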