Question: Data management in R, perl or python
0
4 months ago by
Shaminur150
Dhaka University
Shaminur150 wrote:

Hi, thank you for answering my problem.

My data is in the format below:

A1P
D4M
N6G
A1F
D4S
N6L
A1C

The output should be:

A1P/F/C
D4M/S
N6G/L

Is there an R package or R code available for this? Perl or Python code would also be great. Thank you very much. Sorry for wasting your valuable time.

python R perl • 242 views
modified 4 months ago by mohammadhassanj130 • written 4 months ago by Shaminur150
1

Please add more details to your question. What do "/F/C" and "/L" stand for?

written 4 months ago by Arup Ghosh2.8k
1

They want the suffixes that follow the first two characters collected and appended. This is a basic programming question for which they should show at least some effort in some language.

written 4 months ago by Ido Tamir5.1k

English alphabet! Basically, it will be used for formatting amino acid (single-letter code) and nucleotide data.

written 4 months ago by Shaminur150
1

I'd look out for residue positions >9 - that will result in total length being >3, and scripts below that don't account for it will fail. JC's solution will work best in that case.
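
To make the point about positions >9 concrete, here is a minimal Python sketch (not taken from any answer in this thread). It compares grouping by the first two characters with grouping by a letter-digits-letter pattern. The function names, the pattern and the sample records are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Letter, one or more digits, letter (e.g. "A1P" or "A12F").
PATTERN = re.compile(r"([A-Za-z]\d+)([A-Za-z])")


def group_fixed_width(records):
    """Group by the first two characters; only safe for single-digit positions."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[:2]].append(rec[-1])
    return {key: "/".join(vals) for key, vals in groups.items()}


def group_by_pattern(records):
    """Group by letter plus the full position, whatever its digit count."""
    groups = defaultdict(list)
    for rec in records:
        match = PATTERN.fullmatch(rec)
        if match:  # records that do not fit the pattern are skipped
            groups[match.group(1)].append(match.group(2))
    return {key: "/".join(vals) for key, vals in groups.items()}


records = ["A1P", "A12F", "A1C", "A12S"]  # made-up input with a position >9
print(group_fixed_width(records))  # {'A1': 'P/F/C/S'}  (A1 and A12 wrongly merged)
print(group_by_pattern(records))   # {'A1': 'P/C', 'A12': 'F/S'}
```

With fixed-width slicing, A12 is silently folded into A1, so the script does not crash but the result is wrong.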

written 4 months ago by Ram32k
1

yeah, I was thinking the OP could have a position >9 in the inputs

written 4 months ago by JC12k
4
4 months ago by
JC12k
Mexico
JC12k wrote:

Perl:

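#!/usr/bin/perl
use strict;
use warnings;

my %data;     # key (letter + position, e.g. "A1") -> joined suffixes
my @order;    # keys in order of first appearance
while (<>) {
    chomp;
    # Capture letter + position (one or more digits), then the suffix letter
    if (m/(\w\d+)(\w)/) {
        my $key = $1;
        my $new = $2;
        if (defined $data{$key}) {
            $data{$key} .= "/$new";
        }
        else {
            $data{$key} = $new;
            push @order, $key;
        }
    }
}
# Hash iteration order is not guaranteed in Perl, so print in first-seen order
foreach my $key (@order) {
    print "$key$data{$key}\n";
}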

Test:

$ perl comb.pl < list.txt
A1P/F/C
D4M/S
N6G/L
written 4 months ago by JC12k

Thank you, JC. Excellent!

written 4 months ago by Shaminur150
2
4 months ago by
mohammadhassanj130 wrote:

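Python solution:

from collections import defaultdict

result = defaultdict(list)  # key such as "A1" -> list of suffix letters
with open("input.txt") as infile:
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything except the last character is the key, so positions >9 also work
        result[line[:-1]].append(line[-1])
# Mode "w" rather than "a", so rerunning does not append duplicate lines
with open("output.txt", "w") as outfile:
    for key, suffixes in result.items():
        outfile.write(key + "/".join(suffixes) + "\n")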
written 4 months ago by mohammadhassanj130

Thank you for sharing your scripts.

written 4 months ago by Shaminur150

By the way, you don't need to bookmark every answer. You can bookmark the top level post, and that way you'll have access to all the answers.

written 4 months ago by Ram32k

Sorry for that. I will follow your instruction. Thank you very much for the correction.

written 4 months ago by Shaminur150

Don't worry about it - it's not a "Don't do this", it's just "you don't need to". Our bookmarks section can get cluttered easily.

written 4 months ago by Ram32k
2
4 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:
$ sed 's/^\(..\)/\1\t/' input.txt | datamash -t $'\t' -s -g 1 collapse 2
A1  P,F,C
D4  M,S
N6  G,L
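
The datamash output above is tab-separated with comma-joined values, while the question asked for the form A1P/F/C. Here is a minimal Python sketch of one way to bridge the two, assuming each line contains exactly one tab. The helper name to_slash_format is hypothetical and not part of datamash.

```python
def to_slash_format(line):
    """Turn 'A1<TAB>P,F,C' into 'A1P/F/C'."""
    key, values = line.rstrip("\n").split("\t")
    return key + values.replace(",", "/")


# Lines as datamash would emit them with a tab separator (sample from this thread)
datamash_output = ["A1\tP,F,C", "D4\tM,S", "N6\tG,L"]
for line in datamash_output:
    print(to_slash_format(line))
# A1P/F/C
# D4M/S
# N6G/L
```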
written 4 months ago by Pierre Lindenbaum134k