Question

conservative position extraction

0

Entering edit mode

4.0 years ago

gatiyatov • 0

I have kind of fastq files with multiple records:

>ID some information
--A-TGTGAC
0100111100 etc.

Where the 2nd line is a consensus sequence (gap or nucleotide), and 3rd is (now binary) a conservative. How to parse this file and extract position with the "1" score? Pure Python code is too complicated. Biopython works with only Phred score.

sequence consensus conservative fastq parsing • 632 views

ADD COMMENT • link updated 4.0 years ago by JC 13k • written 4.0 years ago by gatiyatov • 0

score 0 · Answer 1 · 2020-05-05

Perl (because Python seems complicated):

#!/usr/bin/perl
use strict;
use warnings;

my $nl = 0;
my @sq = '';
while (<>) {
    $nl++;
    if ($nl == 1) {
       print;
    }
    elsif ($nl == 2) {
        chomp;
        @sq = split(//, $_);
    elsif ($nl == 3) {
        chomp;
        my @cn = split(//, $_);
        if ($#sq == $#cn) {
            for (my $i=0; $i<=$#cn; $i++) {
                 print $sq[$i] if ($cn[$i] == 1);
             }
            print "\n";
        }
        else { die "line2 and line3 have different lenght\n"; }
        $nl = 0;
    }
}

run as:

perl getCons.pl < FASTA_IN > FASTA_OUT