Question

Data management in R, perl or python

4.1 years ago

MSRS ▴ 590

Hi, thank you in advance for helping with my problem.

My data is in the following format:

A1P
D4M
N6G
A1F
D4S
N6L
A1C

I want the output to be:

A1P/F/C
D4M/S
N6G/L

Is there an R package or R code that can do this? Perl or Python code would also be great. Thank you very much, and sorry for taking up your valuable time.

R perl python • 1.5k views

updated 4.1 years ago by mohammadhassanj ▴ 260 • written 4.1 years ago by MSRS ▴ 590

Please add more details to your question. What do "/F/C" and "/L" stand for?

4.1 years ago by Arup Ghosh 3.2k

They want the suffix after the first two characters collected and appended. This is a basic programming question, and they should show at least some effort in some language.

4.1 years ago by Ido Tamir 5.2k

They are letters of the English alphabet. Basically, this will be used to format amino acid (single-letter code) and nucleotide data.

4.1 years ago by MSRS ▴ 590

Watch out for residue positions > 9: they make an entry longer than three characters (e.g. A12P), and the scripts below that assume a fixed two-character prefix will fail. JC's solution handles that case best.
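A minimal Python sketch of a position-agnostic version. It assumes every entry is one letter, a position of one or more digits, and one letter (e.g. `A12P`). A regular expression replaces the fixed two-character slice, and a plain dict keeps the keys in first-seen order (Python 3.7+). The function name `collapse_variants` is hypothetical, not from any answer below.

```python
import re

# One letter, a position of one or more digits, one letter (e.g. "A12P").
VARIANT = re.compile(r"^([A-Za-z]\d+)([A-Za-z])$")


def collapse_variants(lines):
    """Group suffixes by their letter+position prefix, keeping first-seen order."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        match = VARIANT.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        key, suffix = match.groups()
        groups.setdefault(key, []).append(suffix)
    return [key + "/".join(suffixes) for key, suffixes in groups.items()]


if __name__ == "__main__":
    data = ["A1P", "D4M", "N6G", "A1F", "D4S", "N6L", "A1C", "A12P", "A12K"]
    for row in collapse_variants(data):
        print(row)
```

With the example data plus two `A12` entries, this prints A1P/F/C, D4M/S, N6G/L and A12P/K. A fixed `line[:2]` slice would instead file `A12P` under `A1`.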

4.1 years ago by Ram 44k

Yeah, I was thinking the OP could have a position > 9 in the input.

4.1 years ago by JC 13k

4.1 years ago

Pierre Lindenbaum 164k

sed 's/^\(..\)/\1\t/' input.txt | datamash  -t $'\t' -s -g 1  collapse 2 
A1  P,F,C
D4  M,S
N6  G,L
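For readers without `datamash`, this is a rough Python equivalent of `-s -g 1 collapse 2`. It sorts the rows by key, groups consecutive rows with `itertools.groupby`, and collapses each group. It assumes the same fixed two-character key as the `sed` step. It joins with `/` by default to match the requested format, instead of datamash's comma. The function name `sort_group_collapse` is illustrative.

```python
from itertools import groupby
from operator import itemgetter


def sort_group_collapse(lines, sep="/"):
    """Sort by key, group, and collapse values (like `datamash -s -g 1 collapse 2`)."""
    # Split each non-blank line into (first two characters, rest), like the sed step.
    rows = [(line[:2], line[2:]) for line in (raw.strip() for raw in lines) if line]
    # groupby only merges adjacent equal keys, hence the sort.
    # Python's sort is stable, so values keep their input order within a key.
    rows.sort(key=itemgetter(0))
    return [key + sep.join(value for _, value in group)
            for key, group in groupby(rows, key=itemgetter(0))]
```

The sort is needed for the same reason as datamash's `-s` flag: grouping only merges consecutive lines that share a key. The hash-based Perl and Python answers below skip the sort.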

4.1 years ago by Pierre Lindenbaum 164k

score 4 · Accepted Answer · 2020-10-12

4.1 years ago

JC 13k

Perl:

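#!/usr/bin/perl
use strict;
use warnings;

my %data;     # key (e.g. "A1") => collected suffixes (e.g. "P/F/C")
my @order;    # keys in the order they are first seen
while (<>) {
    chomp;
    # letter + position (one or more digits) + new residue, e.g. "A12P"
    if (m/(\w\d+)(\w)/) {
        my ($key, $new) = ($1, $2);
        if (exists $data{$key}) {
            $data{$key} .= "/$new";
        }
        else {
            $data{$key} = $new;
            push @order, $key;
        }
    }
}
# print in first-seen order; iterating the hash with each() gives an unpredictable order
for my $key (@order) {
    print "$key$data{$key}\n";
}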

Test:

$ perl comb.pl < list.txt
A1P/F/C
D4M/S
N6G/L

4.1 years ago by JC 13k

Thank you, JC. Excellent!

4.1 years ago by MSRS ▴ 590

score 2 · Accepted Answer · 2020-10-12

4.1 years ago

mohammadhassanj ▴ 260

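Python solution:

from collections import defaultdict

result = defaultdict(str)
with open("input.txt") as infile:
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # key = first two characters, suffix = last character
        result[line[:2]] = "/".join([result[line[:2]], line[-1]])
# "w" overwrites output.txt on each run ("a" would append duplicate rows)
with open("output.txt", "w") as outfile:
    for first, second in result.items():
        # second starts with "/" (joined onto an empty string), so drop it
        outfile.write(first + second[1:] + "\n")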

4.1 years ago by mohammadhassanj ▴ 260

Thank you for sharing your scripts.

4.1 years ago by MSRS ▴ 590

By the way, you don't need to bookmark every answer. You can bookmark the top level post, and that way you'll have access to all the answers.

4.1 years ago by Ram 44k

Sorry about that. I will follow your advice. Thank you very much for the correction.

4.1 years ago by MSRS ▴ 590

Don't worry about it: it's not a "don't do this", just a "you don't need to". The bookmarks section can get cluttered easily.

4.1 years ago by Ram 44k