Perl Stript:Break Contigs Into Overlapping Sequences
2
0
Entering edit mode
9.1 years ago
biolab ★ 1.4k

Dear All,

I am a perl beginner. I have a fasta file with many contig sequences, and need to break these contigs into 2kb overlapping fragments (with overlap length of 100bp). Could anyone help to write a perl script for me, when you have spare time? I will greatly appreciate your help. MANY THANKS!

Biolab

perl bioinformatics • 4.0k views
2
Entering edit mode

What have you written so far? You're not likely to get many people willing to just randomly write scripts for you.

0
Entering edit mode

I am learning perl and am a true starter. Could you give some possible solutions to solving my question? THANKS A LOT! Biolab

0
Entering edit mode

Well, how about I just give you a possible work-flow. You'll probably want to use bioperl to make life easier.

for contig in file
contig_length = length of contig
start_position = 0 #I assume bioperl is 0-based
while(start_position < contig_length) {
stop_position = start_position+1999
if(stop_position >= contig_length) stop_position = contig_length-1
sequence = extract subsequence given start/stop_position
write sequence to file
start_position += 1900
}
}

0
Entering edit mode

Thank you very much!!

0
Entering edit mode

No problem. Keep in mind that that general workflow might not produce exactly what you want at the end of a contig. You might want to just break out of a loop if the length of the subsequence is very short, since otherwise the last subsection could be contained entirely within the next to last subsection.

0
Entering edit mode

This question have been asked before because I remember answering it. Search!

3
Entering edit mode
8.9 years ago
brentp 24k

not perl, but see pyfasta: https://pypi.python.org/pypi/pyfasta/#command-line-interface you can use it from the command-line as:

$pyfasta split -n 1 -k 1000 -o 200 original.fasta  ADD COMMENT 0 Entering edit mode this one is really interesting. thanks! ADD REPLY 0 Entering edit mode 7.7 years ago biolab ★ 1.4k #!/usr/bin/perl use strict; use warnings;$/ = ">";
open my $fasta_file, '<',$ARGV[0] or die $!; my$omitted = <$fasta_file>; while (<$fasta_file>) {
s/\r//;
chomp;
my ($id,$seq) = /(.+?)\n(.+)/s or next;
$seq =~ s/\n//g; break_into_segments($id, $seq); } close$fasta_file;

sub break_into_segments {
my ($id,$seq) = @_;
my $seq_len = length$seq;
my $i; while ($seq_len > 2000) {
$i++; my$extr_seq = substr($seq, 0, 2000);$seq = substr($seq,1900);$seq_len -= 1900;
print ">$id\_$i\n$extr_seq\n"; }$i++;
print ">$id\_$i\n\$seq\n";
}