<reporter name="GT_Hg_chr17_69994042-69994101_138264" systematic_name="chr17:69994042-69994101" active_sequence="AGCCACCCAACAGAAGCAAAAGACAACTAAGGCAGCAAATACAAGCCTACAATATATCCA" start_coord="0">
<feature number="12921">
<position x="1.3931461995545655" y="0.0" units="mm"/></feature>
<gene systematic_name="chr17:69994042-69994101" primary_name="chr17:69994042-69994101" description="Unknown"></gene>
</reporter>
Hi all, I have a file like the one above, about 20 MB in size, containing many records in the same format.
if (/<reporter name="([^"]*)"\s+systematic_name="([^"]*)"\s+active_sequence="([^"]*)"/) {
    # captures: reporter name, genomic location, probe sequence
    my ($ID, $Loc, $Seq) = ($1, $2, $3);
    print OUT "$ID\t$Loc\t$Seq\n";
}
The problem is that I can only match up to "active_sequence", because everything up to that point is on a single line. What I want is to read the file from
<reporter> to </reporter>
as a single record, match against it, and then do the same for every following record.
How do I do this in Perl?
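One literal way to read from <reporter> to </reporter> in one go is to set Perl's input record separator $/ to the closing tag, so each read returns a whole record. Below is only a minimal sketch under the assumption that every record is laid out exactly like the sample; read_reporters is a made-up helper name, and the patterns will break if attribute order or quoting changes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [name, location, sequence, description]
# row for each <reporter>...</reporter> record read from the filehandle.
sub read_reporters {
    my ($fh) = @_;
    local $/ = "</reporter>";    # each read now ends at a closing tag
    my @rows;
    while (my $record = <$fh>) {
        # Attributes on the opening tag (assumes this exact attribute order).
        next unless $record =~
            /<reporter\s+name="([^"]*)"\s+systematic_name="([^"]*)"\s+active_sequence="([^"]*)"/;
        my ($id, $loc, $seq) = ($1, $2, $3);
        # The description sits on a later line of the same record.
        my ($desc) = $record =~ /<gene\b[^>]*\bdescription="([^"]*)"/;
        push @rows, [$id, $loc, $seq, defined $desc ? $desc : ''];
    }
    return @rows;
}

# Demo on the record from the post, read through an in-memory filehandle.
my $sample = <<'XML';
<reporter name="GT_Hg_chr17_69994042-69994101_138264" systematic_name="chr17:69994042-69994101" active_sequence="AGCCACCCAACAGAAGCAAAAGACAACTAAGGCAGCAAATACAAGCCTACAATATATCCA" start_coord="0">
<feature number="12921">
<position x="1.3931461995545655" y="0.0" units="mm"/></feature>
<gene systematic_name="chr17:69994042-69994101" primary_name="chr17:69994042-69994101" description="Unknown"></gene>
</reporter>
XML

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
print join("\t", @$_), "\n" for read_reporters($fh);
close $fh;
```

For the real file, open it by name instead of using the in-memory string. Only one record is held in memory at a time, so the 20 MB size should not matter.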
Rule of thumb: never parse HTML or XML with regular expressions. Use an appropriate XML parsing module instead.
I need the whole record because I also want to match and print the values of "x", "y" and description, separated by "\t".
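For the x, y and description values, a parser-based version might look like the sketch below. It uses XML::LibXML from CPAN (not part of core Perl). Two things here are assumptions rather than facts from the post: that the real file wraps all records in a single root element (called <reporters> here, a hypothetical name), and the helper name reporter_rows.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module, must be installed separately

# Hypothetical helper: returns one
# [name, location, sequence, x, y, description] row per <reporter> element.
sub reporter_rows {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my @rows;
    for my $rep ($doc->findnodes('//reporter')) {
        push @rows, [
            $rep->getAttribute('name'),
            $rep->getAttribute('systematic_name'),
            $rep->getAttribute('active_sequence'),
            $rep->findvalue('./feature/position/@x'),
            $rep->findvalue('./feature/position/@y'),
            $rep->findvalue('./gene/@description'),
        ];
    }
    return @rows;
}

# Assumption: <reporters> stands in for whatever root element the real file has.
my $xml = <<'XML';
<reporters>
<reporter name="GT_Hg_chr17_69994042-69994101_138264" systematic_name="chr17:69994042-69994101" active_sequence="AGCCACCCAACAGAAGCAAAAGACAACTAAGGCAGCAAATACAAGCCTACAATATATCCA" start_coord="0">
<feature number="12921">
<position x="1.3931461995545655" y="0.0" units="mm"/></feature>
<gene systematic_name="chr17:69994042-69994101" primary_name="chr17:69994042-69994101" description="Unknown"></gene>
</reporter>
</reporters>
XML

# Tab-separated output, one line per reporter.
print join("\t", @$_), "\n" for reporter_rows($xml);
```

To read from disk instead of a string, load_xml also accepts location => $path. If the file really has no root element, the parser will reject it as not well-formed and the records would need wrapping first.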