Question

Perl Script: Break Contigs Into Overlapping Sequences

11.7 years ago

biolab ★ 1.4k

Dear All,

I am a Perl beginner. I have a FASTA file with many contig sequences and need to break these contigs into 2 kb fragments that overlap by 100 bp. Could anyone help me write a Perl script for this when you have spare time? I will greatly appreciate your help. MANY THANKS!

Biolab

perl • 5.2k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 11.7 years ago by biolab ★ 1.4k

What have you written so far? You're not likely to get many people willing to just randomly write scripts for you.

ADD REPLY • link 11.7 years ago by Devon Ryan 105k

I am learning Perl and am a true beginner. Could you suggest some possible approaches to my question? THANKS A LOT! Biolab

ADD REPLY • link 11.7 years ago by biolab ★ 1.4k

Well, how about I just give you a possible workflow. You'll probably want to use BioPerl to make life easier.

for contig in file {
    contig_length = length of contig
    start_position = 0 #0-based here; BioPerl's subseq() is 1-based, so add 1 when calling it
    while(start_position < contig_length) {
        stop_position = start_position+1999
        if(stop_position >= contig_length) stop_position = contig_length-1
        sequence = extract subsequence given start/stop_position
        write sequence to file
        start_position += 1900
    }
}

ADD REPLY • link 11.7 years ago by Devon Ryan 105k

Thank you very much!!

ADD REPLY • link 11.7 years ago by biolab ★ 1.4k

No problem. Keep in mind that this general workflow might not produce exactly what you want at the end of a contig. You might want to break out of the loop if the remaining subsequence is very short, since otherwise the last subsection could be contained entirely within the next-to-last subsection.
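
The end-of-contig case can be sketched in plain Perl (no BioPerl) as follows. The helper name `window_coords` and the 3,850 bp example contig are assumptions made for illustration. The idea is to stop as soon as a window reaches the last base, because any later window would fall entirely inside it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 0-based, inclusive [start, stop] windows
# for a contig of the given length.
sub window_coords {
    my ($contig_length, $size, $overlap) = @_;
    die "overlap must be smaller than window size\n" if $overlap >= $size;
    my $step = $size - $overlap;      # distance between window starts
    my @windows;
    my $start = 0;
    while ($start < $contig_length) {
        my $stop = $start + $size - 1;
        $stop = $contig_length - 1 if $stop >= $contig_length;
        push @windows, [$start, $stop];
        # Stop once a window reaches the contig end; a further window
        # would lie entirely inside this one.
        last if $stop == $contig_length - 1;
        $start += $step;
    }
    return @windows;
}

# Example: a 3,850 bp contig, 2 kb windows, 100 bp overlap.
print join('-', @$_), "\n" for window_coords(3850, 2000, 100);
```

For this contig the sketch yields two windows, 0-1999 and 1900-3849. Without the early exit, the loop would emit a third window, 3800-3849, that is already covered by the second.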

ADD REPLY • link 11.7 years ago by Devon Ryan 105k

This question has been asked before because I remember answering it. Search!

ADD REPLY • link 11.7 years ago by Martin A Hansen 3.0k

score 3 · Answer 1 · 2014-01-03

11.5 years ago

brentp 24k

Not Perl, but see pyfasta: https://pypi.python.org/pypi/pyfasta/#command-line-interface. You can use it from the command line as:

$ pyfasta split -n 1 -k 1000 -o 200 original.fasta

ADD COMMENT • link 11.5 years ago by brentp 24k

this one is really interesting. thanks!

ADD REPLY • link 11.5 years ago by Pavel Senin ★ 1.9k

score 0 · Answer 2 · 2015-04-07

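#!/usr/bin/perl
use strict;
use warnings;

my $FRAG_LEN = 2000;                  # fragment length (2 kb)
my $OVERLAP  = 100;                   # overlap between neighbouring fragments
my $STEP     = $FRAG_LEN - $OVERLAP;  # distance between fragment starts

my $path = shift @ARGV or die "Usage: $0 <contigs.fasta>\n";

# Read one FASTA record at a time by using ">" as the record separator.
$/ = ">";
open my $fasta_file, '<', $path or die "Cannot open $path: $!\n";
my $omitted = <$fasta_file>;          # discard the empty chunk before the first ">"
while (<$fasta_file>) {
    s/\r//g;                          # strip every carriage return, not just the first
    chomp;                            # remove the trailing ">"
    my ($id, $seq) = /(.+?)\n(.+)/s or next;   # header line, then sequence lines
    $seq =~ s/\n//g;                  # join the sequence lines
    break_into_segments($id, $seq);
}
close $fasta_file;

# Print 2 kb fragments, each starting 1,900 bp after the previous one.
# The final fragment holds whatever remains of the contig.
sub break_into_segments {
    my ($id, $seq) = @_;
    my $i = 0;
    while (length($seq) > $FRAG_LEN) {
        $i++;
        my $extr_seq = substr($seq, 0, $FRAG_LEN);
        $seq = substr($seq, $STEP);   # keep the last 100 bp as the next overlap
        print ">${id}_$i\n$extr_seq\n";
    }
    $i++;
    print ">${id}_$i\n$seq\n";
}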
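
A quick way to sanity-check fragments like the ones printed above is to confirm that each fragment begins with the last bases of the one before it. This is only a sketch: the helper name `overlaps_ok` and the toy sequences (10-base fragments with a 3-base overlap, standing in for 2,000 and 100) are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every fragment starts with the last
# $overlap bases of the fragment before it, 0 otherwise.
sub overlaps_ok {
    my ($overlap, @fragments) = @_;
    for my $n (1 .. $#fragments) {
        my $prev = $fragments[$n - 1];
        return 0 if length($prev) < $overlap;
        my $tail = substr($prev, -$overlap);             # end of previous fragment
        my $head = substr($fragments[$n], 0, $overlap);  # start of this fragment
        return 0 if $tail ne $head;
    }
    return 1;
}

# Toy data: each fragment starts with the last 3 bases of the previous one.
my @toy = ('ACGTACGTAC', 'TACGGGTTTA', 'TTAAC');
print overlaps_ok(3, @toy) ? "overlaps ok\n" : "overlap mismatch\n";
```

With real output, the fragments for one contig would be collected in order and passed in with an overlap of 100.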