Question: Bioinformatics Perl Help
ksi216 wrote:

Hi, I'm having trouble with this task: Write a Perl script that will generate a new output file (“task1 output.txt”) which contains the sequence name, length, and GC-content for each sequence. There should be a header line which identifies the contents of the columns (so the first line in the output file should be “SeqName Length GC-Content” or something similar). The GC-content of a sequence is defined as the percentage of bases that are G or C (from 0% to 100%), and a high GC-content is associated with coding sequences.

Seq1 ACGT

Then your output file should look like:

SeqName Length GC-Content
Seq1 4 50

I can do the input and output file handles, but I'm confused about what to put in my while loop. How will it know what to match in the file?

bioinformatics perl • 1.1k views
modified 3.0 years ago by RamRS • written 3.0 years ago by ksi216

You need to show the code so we can see the issues you are having. We can tell you what to do but that probably won't help much.

written 3.0 years ago by SES
use strict;
use warnings;

open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while(<IN>) {{if($_ =~ m/>/){
    print OUT "$_"
    }
    {if($_ =~ tr/AGTC/AGTC/){
    print OUT 
}
}
}
}
modified 3.0 years ago by RamRS • written 3.0 years ago by ksi216
  1. Why the double {{ for the while body?
  2. What is the purpose of the tr operation?
written 3.0 years ago by RamRS

To count the length of each sequence?

written 3.0 years ago by ksi216

You're using a tr operation to count length? Why?

written 3.0 years ago by RamRS

What do you suggest?

written 3.0 years ago by ksi216

The length() function sounds appropriate here.
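
As a minimal sketch of that suggestion (the variable names and the sample line are illustrative and not taken from the homework file): length() returns the number of characters in a string, so a line read from a file handle should have its trailing newline removed with chomp before it is measured.

```perl
use strict;
use warnings;

# Hypothetical sequence line as it would arrive from <IN>, newline included.
my $line = "ACGT\n";

chomp $line;                # remove the trailing newline before measuring
my $len = length($line);    # number of characters left in the string

print "$len\n";
```

The tr/AGTC/AGTC/ in the posted code also returns a count, but only of the characters listed, so any other symbol in the sequence (an N, or lowercase letters) would be left out of the total.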

written 3.0 years ago by RamRS

Thanks a lot, all, for your help. I got it.

written 3.0 years ago by ksi216

I thought the double {{ was necessary for multiple tasks within the loop?

written 3.0 years ago by ksi216

Who says so? That makes no sense in any programming language.

written 3.0 years ago by RamRS

Thank you, I'll remove it. I'm learning as I go here.

written 3.0 years ago by ksi216
RamRS wrote:

As mentioned, GC content is the percentage of bases in a sequence that are G or C. The percentage is easy to calculate with basic math once you have, for each sequence, the count of bases that are G or C and the total sequence length. Your loop will iterate through the file, executing its code block for each sequence it finds.
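
Written out for the example sequence ACGT from the question, with n_G and n_C as the base counts and L as the sequence length (symbols chosen here only for illustration), the arithmetic is:

```latex
\[
\mathrm{GC\%} = 100 \times \frac{n_G + n_C}{L},
\qquad
\text{ACGT: } 100 \times \frac{1 + 1}{4} = 50
\]
```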

I cannot - and I hope others do not too - provide code. That would cripple learning.

written 3.0 years ago by RamRS

Perhaps you should try preparing a script to do this task for a file with just 1 sequence.

After you prepare the correct output for this 1 sequence, expand the script to do this task in a loop for more sequences.
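
One possible shape for that two-step approach, as a rough sketch only: the routine name seq_stats and the sample records are made up for illustration, and reading the real input file is deliberately left out. The first part handles a single sequence; the second part reuses it unchanged inside a loop.

```perl
use strict;
use warnings;

# Step 1: handle ONE sequence. Returns (length, GC percentage).
sub seq_stats {
    my ($seq) = @_;
    my $len = length $seq;
    my $gc  = ($seq =~ tr/GCgc//);            # tr returns how many characters matched
    my $pct = $len ? 100 * $gc / $len : 0;    # guard against an empty sequence
    return ($len, $pct);
}

# Step 2: call the same routine in a loop over several sequences.
# These records are hypothetical stand-ins for what would be read from a file.
my %seqs = (Seq1 => 'ACGT', Seq2 => 'GGCCA');

my @rows = ("SeqName\tLength\tGC-Content");
for my $name (sort keys %seqs) {
    my ($len, $pct) = seq_stats($seqs{$name});
    push @rows, join("\t", $name, $len, $pct);
}

print "$_\n" for @rows;
```

Keeping the per-sequence work in its own routine means the single-sequence version can be checked by hand before the loop is added around it.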

written 3.0 years ago by Cytosine440
open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!";
open(OUT, ">/home/ki/Downloads/HW3final.txt");
while(<in>){if( =~ m/>/)
    print OUT "$_n/"
    }

I'm confused by the while loop: it only works for one sequence and won't process the rest of the file.

modified 3.0 years ago by RamRS • written 3.0 years ago by ksi216

one hint is that "IN" is not the same thing as "in".
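
To illustrate the hint with a small self-contained sketch (the in-memory data below is made up and stands in for the real input file): file handle names in Perl are case-sensitive, so reading from a handle called in after opening one called IN reads nothing at all.

```perl
use strict;
use warnings;

# In-memory stand-in for the input file; the contents are hypothetical.
my $data = "line one\nline two\n";
open(IN, '<', \$data) or die "Can't open in-memory file: $!";

my $lower = 0;
{
    no warnings 'unopened';   # silence "readline() on unopened filehandle"
    $lower++ while <in>;      # "in" was never opened, so nothing is read
}

my $upper = 0;
$upper++ while <IN>;          # "IN" is the handle that open() created
close(IN);

print "in: $lower lines, IN: $upper lines\n";
```

With warnings left on, Perl reports the read from the unopened handle, which is a quick way to spot this kind of mismatch.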

written 3.0 years ago by Chris Miller

open(IN,"/home/ki/Downloads/HW3_Sequences.txt") or die "Cant open file: $!"; open(OUT, ">/home/ki/Downloads/HW3final.txt"); while(<in>){if( =~ m/>/) print OUT "$_n/" }

I'm getting a syntax error from this.

written 3.0 years ago by ksi216

Please stop pasting the same code over and over again. Chris has answered on what the primary problem with that piece of code is.

written 3.0 years ago by RamRS

I corrected it so that it's <in>

written 3.0 years ago by ksi216

Sure, it's just that I need guidance with the while loop. I've posted my code below.

written 3.0 years ago by ksi216