Bioinformatics Perl Help
5.1 years ago
ksi216 ▴ 90

Hi, I'm having trouble with this task: Write a Perl script that will generate a new output file (“task1 output.txt”) which contains the sequence name, length, and GC-content for each sequence. There should be a header line which identifies the contents of the columns (so the first line in the output file should be “SeqName Length GC-Content” or something similar). The GC-content of a sequence is defined as the percentage of bases that are G or C (from 0% to 100%), and a high GC-content is associated with coding sequences.

Seq1
ACGT

Then your output file should look like:

SeqName Length GC-Content
Seq1 4 50

I can do the in and out for the file handles, but I'm confused as to what to put in my while loop. And how will it know what to match in the file?

Perl Bioinformatics • 1.4k views

You need to show the code so we can see the issues you are having. We can tell you what to do but that probably won't help much.

use strict;
use warnings;

open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while(<IN>) {{
    if($_ =~ m/>/){
        print OUT "$_"
    }
    {
        if($_ =~ tr/AGTC/AGTC/){
            print OUT
        }
    }
}
}

1. Why the double {{ for the while body?
2. What is the purpose of the tr operation?

To count the length of each sequence?


You're using a tr operation to count length? Why?


What do you suggest?


The length() function sounds appropriate here.
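
To make the difference concrete, here is a minimal sketch (the sequence string is a made-up example, not taken from the poster's file). `length()` returns the total number of characters in a string, while `tr` with an empty replacement list counts only the characters it lists, which is closer to what a G/C count needs:

```perl
use strict;
use warnings;

# Hypothetical example sequence; not from the original input file.
my $seq = "ACGTGGCC";

# length() gives the total number of characters in the string.
my $len = length($seq);

# tr/GC// with an empty replacement list leaves $seq unchanged
# and returns how many characters were G or C.
my $gc = ($seq =~ tr/GC//);

print "length=$len gc_count=$gc\n";
```

Note that `tr/AGTC/AGTC/` also returns a count, but it only equals the length when the line contains nothing except those four letters, so `length()` is the more direct choice.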


Thanks a lot, all, for your help. I got it.


I thought the double {{ was necessary for multiple tasks within the loop?


Who says so? That makes no sense in any programming language.


Thank you, I'll remove it. I'm learning as I go here.

5.1 years ago
Ram 32k

As mentioned, GC content is the percentage of bases in a sequence that are G or C. The percentage is easy to calculate with basic math once you have the count of bases that are G or C and the total sequence length for each sequence. Your loop will iterate through the file, executing its code block for each sequence it finds.

I cannot provide code, and I hope others do not either. That would cripple learning.


Perhaps you should try preparing a script to do this task for a file with just 1 sequence.

After you prepare the correct output for this 1 sequence, expand the script to do this task in a loop for more sequences.
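
As a rough sketch of that first step, the single-sequence case could look like the following. The record is hard-coded from the example in the question, and the tab-separated columns are an assumption, since the task only asks for a header line "or something similar":

```perl
use strict;
use warnings;

# Single hard-coded record, taken from the example in the question.
my ($name, $seq) = ("Seq1", "ACGT");

my $len = length($seq);          # total number of bases
my $gc  = ($seq =~ tr/GCgc//);   # count of G and C, either case

# Guard against an empty sequence to avoid dividing by zero.
my $pct = $len ? 100 * $gc / $len : 0;

# Tab-separated columns are an assumption, not part of the task text.
print "SeqName\tLength\tGC-Content\n";
print join("\t", $name, $len, $pct), "\n";
```

Once this prints the expected line for one sequence, the same three calculations can move inside a loop that supplies a new `$name` and `$seq` on each pass.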

open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while(<in>){
    if( =~ m/>/) print OUT "$_n/"
}


I'm confused by the while loop: it only handles one sequence and won't process the rest of the file.


one hint is that "IN" is not the same thing as "in".
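
To illustrate the hint: Perl filehandle names are case-sensitive, so reading from `<in>` does not read from a handle opened as `IN`. The sketch below uses a lexical handle and an in-memory string (both are illustrative choices so the example runs without a file on disk), and it assumes FASTA-style input where header lines start with `>`, as the posted code implies:

```perl
use strict;
use warnings;

# Illustrative in-memory input so the sketch runs without a file on disk.
my $data = ">Seq1\nACGT\n>Seq2\nGGCC\n";

# The handle opened here is $in; the same name must be used when reading.
open(my $in, '<', \$data) or die "Can't open input: $!";

my @names;
while (my $line = <$in>) {   # runs once per line until end of input
    chomp $line;
    push @names, $line if $line =~ /^>/;   # header lines start with '>'
}
close($in);

print "$_\n" for @names;
```

With a lexical handle such as `$in`, a misspelled name is caught at compile time under `use strict`, which is one reason it is often preferred over bareword handles like `IN`.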


open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while(<in>){
    if( =~ m/>/) print OUT "$_n/"
}

I'm getting a syntax error from this.


Please stop pasting the same code over and over again. Chris has already pointed out the primary problem with that piece of code.


I corrected it so that it's <in>


Sure, it's just that I need guidance with the while loop. I've posted my code below.