Is There Any Tool For Converting Fasta To Reads Of Certain Length?
1
0
Entering edit mode
11.0 years ago

Hi there!

Is there any tool for converting fasta to reads of certain length? Possibly with simulation of random fragmenting and unequal coverage (=based on random choice). The best would be if reads kept identification of sequence they came from like so that

>my_header
ACTGTTTACGTGGT

would lead into:

>my_header_1
TGTT
>my_header_2
GTGG
>my_header_3
TTTA

Thanks a lot :-)

fasta reads • 2.2k views
ADD COMMENT
1
Entering edit mode

duplicate of many questions related to samtools/wgsim ... wgsim

ADD REPLY
0
Entering edit mode

Well I did not know wgsim and wasn't successful with my google search of "fasta to reads", "sequence to reads convert" and other combinations.. thanks.

ADD REPLY
1
Entering edit mode

Search for "reads simulator". There are plenty of programs like that (hard to recommend a single one), simulating not only lengths, but also errors (depending on the sequencing platform you want to emulate). Some also allow for specifying coverage model.

ADD REPLY
1
Entering edit mode
11.0 years ago

You can try this bit of code that I wrote and modify it to suit your needs:

#!/usr/bin/python

"""Digest sequences from a fasta file in sequences of a specified length.

Usage:
    %program <input_file> <fragment_length> <move> <output_file>

input_file: a fasta file
fragment_length: length of digested fragments
move: number of non-overlapping nucleotides between two consecutive fragments"""


import sys

try:
    from Bio import SeqIO
except:
    print "This program requires the Biopython library"
    sys.exit(0)

try:
    in_file = open(sys.argv[1], "rU")
    nb_nuc = int(sys.argv[2])
    move = int(sys.argv[3])
    out_file = sys.argv[4]
except:
    print __doc__
    sys.exit(0)

sequences = ([seq.id, seq.seq.tostring()] 
             for seq in SeqIO.parse(in_file, 'fasta'))

def digest(seq, nb_nuc):
    fragments = []
    n = seq[0]
    n_counter = 0
    s = seq[1]
    start = nb_nuc/2
    while len(s) > nb_nuc/2:
        frag = s[:nb_nuc - start]
        start -= move
        if start < 0:
            start = 0
            s = s[move:]
        n_counter +=1
        fragments.append([n + "_" + str("%06i" % n_counter), frag])
    return fragments

with open(out_file, "w") as out_file:
    for seq in sequences:
        fragments = digest(seq, nb_nuc)
        for frag in fragments:
            out_file.write(">" + frag[0] + "\n" + frag[1] + "\n")
ADD COMMENT

Login before adding your answer.

Traffic: 1415 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6