conservative position extraction
1
0
Entering edit mode
2.0 years ago
gatiyatov • 0

I have kind of fastq files with multiple records:

  1. >ID some information
  2. --A-TGTGAC
  3. 0100111100 etc.

Where the 2nd line is a consensus sequence (gap or nucleotide), and 3rd is (now binary) a conservative. How to parse this file and extract position with the "1" score? Pure Python code is too complicated. Biopython works with only Phred score.

sequence consensus conservative fastq parsing • 392 views
ADD COMMENT
0
Entering edit mode
2.0 years ago
JC 13k

Perl (because Python seems complicated):

#!/usr/bin/perl
use strict;
use warnings;

my $nl = 0;
my @sq = '';
while (<>) {
    $nl++;
    if ($nl == 1) {
       print;
    }
    elsif ($nl == 2) {
        chomp;
        @sq = split(//, $_);
    elsif ($nl == 3) {
        chomp;
        my @cn = split(//, $_);
        if ($#sq == $#cn) {
            for (my $i=0; $i<=$#cn; $i++) {
                 print $sq[$i] if ($cn[$i] == 1);
             }
            print "\n";
        }
        else { die "line2 and line3 have different lenght\n"; }
        $nl = 0;
    }
}

run as:

perl getCons.pl < FASTA_IN > FASTA_OUT

ADD COMMENT

Login before adding your answer.

Traffic: 1462 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6