Bioinformatics Perl Help
8.1 years ago
ksi216 ▴ 80

Hi,

I'm having trouble with this task:

Write a Perl script that generates a new output file (“task1 output.txt”) containing the sequence name, length, and GC-content of each sequence. There should be a header line that identifies the contents of the columns (so the first line of the output file should be “SeqName Length GC-Content” or something similar). The GC-content of a sequence is defined as the percentage of bases that are G or C (from 0% to 100%), and a high GC-content is associated with coding sequences.

For example, if your input file contains:

>Seq1
ACGT

Then your output file should look like:

SeqName   Length   GC-Content
Seq1      4        50

I can open the input and output filehandles, but I'm confused about what to put in my while loop, and how it will know what to match in the file.


You need to show the code so we can see the issues you are having. We can tell you what to do but that probably won't help much.

use strict;
use warnings;

open(IN, "/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while (<IN>) {{
    if ($_ =~ m/>/) {
        print OUT "$_"
    }
    {
        if ($_ =~ tr/AGTC/AGTC/) {
            print OUT
        }
    }
}}
  1. Why the double {{ for the while body?
  2. What is the purpose of the tr operation?

To count the length of each sequence?


You're using a tr operation to count length? Why?


What do you suggest?


The length() function sounds appropriate here.
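For illustration, a minimal sketch of length() used alongside a counting tr for one sequence string. It assumes the sequence has already been read into a variable with its trailing newline removed (chomp), and seq_stats is a hypothetical helper name, not something from the assignment:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (length, number of G/C bases) for one sequence.
sub seq_stats {
    my ($seq) = @_;
    my $len = length($seq);           # total number of characters (bases)
    my $gc  = ($seq =~ tr/GCgc//);    # empty replacement list: counts G/C without changing $seq
    return ($len, $gc);
}

my ($len, $gc) = seq_stats("ACGT");
print "length=$len gc_count=$gc\n";   # prints "length=4 gc_count=2"
```

A tr with an empty replacement list only counts characters, which is also why tr/AGTC/AGTC/ in the earlier code returns a count of A, G, T and C characters rather than a length.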


Thanks a lot, all, for your help. I got it.


I thought the double {{ was necessary for multiple tasks within the loop?


Who says so? That makes no sense in any programming language.
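To illustrate the point, a small sketch showing that one pair of braces holds a while body with several statements; an extra inner { } only creates a nested bare block and is not needed. The two input lines are made up for the example:

```perl
use strict;
use warnings;

my @lines = (">Seq1", "ACGT");   # made-up example input
my ($headers, $seq_lines) = (0, 0);

my $i = 0;
while ($i < @lines) {            # a single pair of braces holds the whole loop body
    my $line = $lines[$i];
    if ($line =~ m/^>/) {        # header line
        $headers++;
    }
    else {                       # anything else is treated as sequence
        $seq_lines++;
    }
    $i++;
}
print "headers=$headers sequence_lines=$seq_lines\n";   # prints "headers=1 sequence_lines=1"
```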


Thank you, I'll remove it. I'm learning as I go here.

8.1 years ago
Ram 43k

As mentioned, GC content is the percentage of bases in a sequence that are G or C. Once you have, for each sequence, the number of G and C bases and the total sequence length, the percentage is basic arithmetic. Your loop will iterate through the file line by line, so its code block has to handle both the header line and the sequence line(s) of each record.

I cannot provide code, and I hope others do not either. That would cripple learning.
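The percentage step described above is GC% = 100 × (number of G and C bases) / length. A minimal sketch, assuming both counts are already known; gc_percent is a hypothetical helper name, and rounding to two decimals is an assumption (the assignment does not specify a precision):

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of bases that are G or C (0 to 100).
sub gc_percent {
    my ($gc_count, $length) = @_;
    return 0 if $length == 0;               # avoid dividing by zero on an empty sequence
    my $pct = 100 * $gc_count / $length;
    return 0 + sprintf("%.2f", $pct);       # round to 2 decimals; adding 0 drops trailing zeros
}

print gc_percent(2, 4), "\n";   # prints 50
print gc_percent(1, 3), "\n";   # prints 33.33
```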


Perhaps you should try writing a script that does this task for a file with just one sequence.

Once you get the correct output for that one sequence, expand the script to do the task in a loop over multiple sequences.
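One way the loop could look once the single-sequence version works: collect sequence lines until the next header, then compute the stats for the finished record. This is a sketch under stated assumptions: the input is FASTA with ">" header lines, the name is the first word after ">", a sequence may span several lines, and columns are tab-separated. gc_table is a hypothetical name, and the demo reads a made-up in-memory string; a real script would open the input and output files instead.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle and
# return a list of [name, length, GC%] rows, one per sequence.
sub gc_table {
    my ($fh) = @_;
    my (@rows, $name, $seq);

    my $flush = sub {                      # finish the current record, if any
        return unless defined $name;
        my $len = length($seq);
        my $gc  = ($seq =~ tr/GCgc//);
        my $pct = $len ? 0 + sprintf("%.2f", 100 * $gc / $len) : 0;
        push @rows, [$name, $len, $pct];
    };

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m/^>(\S+)/) {         # header line: start a new record
            my $new_name = $1;
            $flush->();
            ($name, $seq) = ($new_name, '');
        }
        elsif (defined $name) {            # sequence line: append to current record
            $line =~ s/\s+//g;             # drop stray whitespace (e.g. \r)
            $seq .= $line;
        }
    }
    $flush->();                            # the last record has no following header
    return @rows;
}

# Demo with made-up input held in a string (in-memory filehandle).
my $fasta = ">Seq1\nACGT\n>Seq2\nGGGC\nAT\n";
open(my $in, '<', \$fasta) or die "Can't open in-memory file: $!";
my @rows = gc_table($in);
close($in);

print join("\t", qw(SeqName Length GC-Content)), "\n";
print join("\t", @$_), "\n" for @rows;
```

For the example above this prints a header line followed by Seq1 (length 4, GC 50) and Seq2 (length 6, GC 66.67).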

open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while(<in>){if( =~ m/>/)
    print OUT "$_n/"
    }

I'm confused by the while loop: it only processes one sequence and won't do the rest of the file.


One hint: "IN" is not the same thing as "in".
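To expand on the hint: Perl filehandle names are case-sensitive, so <in> reads from a handle called in that was never opened, and the loop body never runs. Separately, if( =~ m/>/) has nothing on the left of =~, and Perl requires braces around an if body, which accounts for the syntax error. A sketch of the header-matching loop using a lexical filehandle, so the same variable is used to open and to read; the in-memory input is made up so the example runs on its own:

```perl
use strict;
use warnings;

# Made-up input so the example is self-contained; in the real script this
# would be something like: open(my $in, '<', $path) or die "Can't open $path: $!";
my $data = ">Seq1\nACGT\n>Seq2\nGGCC\n";
open(my $in, '<', \$data) or die "Can't open input: $!";

my @headers;
while (my $line = <$in>) {        # reads every line, not just the first
    chomp $line;
    if ($line =~ m/^>(\S+)/) {    # header line: capture the sequence name
        push @headers, $1;
    }
}
close($in);

print "$_\n" for @headers;        # prints Seq1 and Seq2 on separate lines
```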


open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while(<in>){if( =~ m/>/)
    print OUT "$_n/"
    }

I'm getting a syntax error from this.


Please stop pasting the same code over and over again. Chris has already pointed out the primary problem with that piece of code.


I corrected it so that it's <in>.


Sure, it's just that I need guidance with the while loop. I've posted my code below.
