Question

How to find UniProt IDs shared across 7 metagenomic assemblies?

Asked 5.7 years ago by manpy64

Dear all, I am new to bioinformatics and I am stuck in my project. I have Prokka results from 7 metagenomic assemblies of different varieties of the Azolla fern, and I have collected the list of UniProt IDs from every bin of these 7 assemblies. Now I want to find out which UniProt IDs are shared across the different metagenomic assemblies. Each bin of each assembly contains several thousand UniProt IDs, so I cannot check this manually. My second question is how I can use these UniProt IDs in R or for a PCA. Any help or guidance will be highly appreciated. Thanks, manpy

Tags: R, Assembly, gene • 1.1k views

Written 5.7 years ago by manpy64; last updated 5.5 years ago by Biostar.

What format is the result data in? Is it columnar, or just a plain-text collection of UniProt IDs in one or more files?

Reply by GenoMax, 5.7 years ago

It is a single column listing all the UniProt IDs. I used grep to extract only the UniProt IDs from the GFF files of my Prokka results.

Reply by manpy64, 5.7 years ago
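The grep step mentioned in the reply above (pulling UniProt IDs out of Prokka GFF files) could be sketched in Python as follows. This is a minimal sketch, assuming Prokka writes the matched accession as UniProtKB:<accession> somewhere in column 9 of each feature line (for example in the inference attribute); check a few lines of your own GFF before relying on that pattern. The function name uniprot_ids_from_gff and the demo line are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: the accession appears as "UniProtKB:<accession>" in column 9
# (e.g. inside the inference attribute). Verify this on your own files.
UNIPROT_RE = re.compile(r"UniProtKB:([A-Z0-9]+)")


def uniprot_ids_from_gff(lines: Iterable[str]) -> List[str]:
    """Return the UniProt accessions found in GFF feature lines, in file order."""
    ids: List[str] = []
    for line in lines:
        # GFF3 allows a trailing sequence section after "##FASTA"; stop there.
        if line.startswith("##FASTA"):
            break
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column feature line
        ids.extend(UNIPROT_RE.findall(fields[8]))
    return ids


if __name__ == "__main__":
    demo = [  # made-up, Prokka-style lines
        "##gff-version 3\n",
        "contig_1\tProdigal\tCDS\t1\t300\t.\t+\t0\t"
        "ID=X_00001;inference=ab initio prediction:Prodigal,"
        "similar to AA sequence:UniProtKB:P0AGB6;product=demo protein\n",
        "contig_1\tProdigal\tCDS\t400\t900\t.\t-\t0\t"
        "ID=X_00002;product=hypothetical protein\n",
        "##FASTA\n",
        ">contig_1\n",
    ]
    print(uniprot_ids_from_gff(demo))  # ['P0AGB6']
```

Keeping the IDs in file order (rather than de-duplicating here) leaves the choice of counting or collapsing them to the next step.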

Are these in separate files? You can sort and uniq the files to get non-redundant lists. The next step would be to use join (or comm) on the sorted lists to identify shared IDs. Give that a try and post if you run into problems.

Reply by GenoMax, 5.7 years ago
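The sort/uniq then join/comm approach suggested above can be sketched in Python with sets: building a set plays the role of sort -u, and intersecting sets plays the role of comm -12, except that it extends to any number of lists instead of exactly two. The helper names and the toy assembly data are hypothetical; shared_by_at_least is an extra convenience for IDs shared by some, but not necessarily all, assemblies.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, Set


def unique_ids(ids: Iterable[str]) -> Set[str]:
    """Non-redundant set of IDs, the equivalent of `sort -u` (blanks dropped)."""
    return {i.strip() for i in ids if i.strip()}


def shared_by_all(lists: Dict[str, Iterable[str]]) -> Set[str]:
    """IDs present in every list, like chaining `comm -12` over sorted files."""
    sets = [unique_ids(v) for v in lists.values()]
    return reduce(set.intersection, sets) if sets else set()


def shared_by_at_least(lists: Dict[str, Iterable[str]], k: int) -> Set[str]:
    """IDs found in at least k of the lists (k = len(lists) matches shared_by_all)."""
    counts = Counter(i for v in lists.values() for i in unique_ids(v))
    return {i for i, n in counts.items() if n >= k}


if __name__ == "__main__":
    # Hypothetical toy input: assembly name -> UniProt IDs (duplicates allowed).
    assemblies = {
        "assembly_A": ["Q9RGS8", "P0AGB6", "P0AGB6", "Q5SJ80"],
        "assembly_B": ["P0AGB6", "Q9RGS8", "O07610"],
        "assembly_C": ["Q9RGS8", "P0AGB6", "P45857", "O07610"],
    }
    print(sorted(shared_by_all(assemblies)))          # ['P0AGB6', 'Q9RGS8']
    print(sorted(shared_by_at_least(assemblies, 2)))  # ['O07610', 'P0AGB6', 'Q9RGS8']
```

Unlike comm and join, sets do not require sorted input, so the sorting step is only needed if you want sorted output.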

They are in separate files, one for each bin of each metagenomic assembly. OK, I will try. Thank you so much, sir.

Reply by manpy64, 5.7 years ago
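Since the IDs sit in one file per bin per assembly, a small loader can keep each bin separate while still allowing a per-assembly view. This sketch assumes a hypothetical layout of one directory per assembly containing one .txt file per bin, with one UniProt ID per line; adjust the glob pattern to your real file names. The names load_bins and per_assembly are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Dict, Set, Tuple

BinKey = Tuple[str, str]  # (assembly, bin)


def load_bins(root: Path, pattern: str = "*.txt") -> Dict[BinKey, Set[str]]:
    """Read every bin file under root/<assembly>/ into a set of IDs.

    Hypothetical layout: one directory per assembly, one file per bin,
    one UniProt ID per line.
    """
    bins: Dict[BinKey, Set[str]] = {}
    for path in sorted(root.glob(f"*/{pattern}")):
        with path.open() as fh:
            bins[(path.parent.name, path.stem)] = {ln.strip() for ln in fh if ln.strip()}
    return bins


def per_assembly(bins: Dict[BinKey, Set[str]]) -> Dict[str, Set[str]]:
    """Union of the bins of each assembly (the per-bin sets are not modified)."""
    merged: Dict[str, Set[str]] = {}
    for (assembly, _bin), ids in bins.items():
        merged.setdefault(assembly, set()).update(ids)
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for asm, bin_name, text in [("asm1", "bin1", "P0AGB6\nQ9RGS8\n"),
                                    ("asm1", "bin2", "O07610\n"),
                                    ("asm2", "bin1", "P0AGB6\n")]:
            (root / asm).mkdir(exist_ok=True)
            (root / asm / f"{bin_name}.txt").write_text(text)
        bins = load_bins(root)
        print(sorted(bins))  # [('asm1', 'bin1'), ('asm1', 'bin2'), ('asm2', 'bin1')]
        print({a: sorted(s) for a, s in per_assembly(bins).items()})
        # {'asm1': ['O07610', 'P0AGB6', 'Q9RGS8'], 'asm2': ['P0AGB6']}
```

Keying by (assembly, bin) keeps the organism-level resolution the bins provide, and the per-assembly union is derived from it rather than replacing it.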

I am using the command below, but I still get several thousand IDs; it only numbers them by how many times each ID is repeated in the genome. I run this command separately for every bin of each metagenomic assembly, because each bin corresponds to a specific organism (bacterium); that is why I am not combining all the UniProt IDs of a metagenomic assembly.

  cat /path/to/list_of_uniprot_ids | sort -d | uniq --count

The output looks like this:

  1 Q9RGS8
  1 P0AGB6
  1 Q5SJ80
  1 P45857
  1 O07610
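The count column in output like the above is how often each ID occurs within one bin; the number of rows is the number of distinct IDs. A minimal Python equivalent of sort | uniq -c, with hypothetical toy data, makes that distinction explicit.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def uniq_count(ids: Iterable[str]) -> List[Tuple[int, str]]:
    """Mimic `sort | uniq -c`: (count, ID) pairs, IDs in code-point order.

    Code-point order matches `LC_ALL=C sort`; `sort -d` in another locale
    may order some IDs differently, but the counts are the same.
    """
    counts = Counter(i.strip() for i in ids if i.strip())
    return [(counts[k], k) for k in sorted(counts)]


if __name__ == "__main__":
    one_bin = ["Q9RGS8", "P0AGB6", "Q5SJ80", "P0AGB6", "P45857"]  # toy data
    rows = uniq_count(one_bin)
    for n, acc in rows:
        print(f"{n:>4} {acc}")
    # Each row is one distinct ID; the count is how often it occurs in this bin.
    print(len(rows), "distinct IDs")  # 4 distinct IDs
```

For comparing bins or assemblies, usually only whether an ID is present matters, so these counts can be collapsed to a set per bin.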

My question is: how can I make these tables presentable for my report, and how can I use them for PCA?

Reply by manpy64, 5.7 years ago (updated 5.7 years ago by GenoMax)
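One common way to make the per-bin lists comparable, both as a report table and as PCA input, is a presence/absence matrix: one row per bin (or assembly), one column per UniProt ID, 1 if present and 0 if absent. The sketch below builds that matrix, writes it as a tab-separated table, and computes first-principal-component scores using only the standard library (power iteration on the small samples-by-samples Gram matrix). It is an illustration under these assumptions, not a tuned implementation; the function names and sample names are hypothetical. In R, the same TSV could instead be read with read.delim(..., row.names = 1) and passed to prcomp().

```python
import csv
import io
import math
from typing import Dict, List, Set, Tuple


def presence_matrix(samples: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Rows = samples (bins or assemblies), columns = every ID seen; 1 = present."""
    names = sorted(samples)
    ids = sorted(set().union(*samples.values()))
    rows = [[1 if i in samples[n] else 0 for i in ids] for n in names]
    return names, ids, rows


def to_tsv(names: List[str], ids: List[str], rows: List[List[int]]) -> str:
    """Tab-separated table with a header row, e.g. for a report or for R."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + ids)
    for name, row in zip(names, rows):
        writer.writerow([name] + row)
    return buf.getvalue()


def pc1_scores(rows: List[List[int]], iters: int = 200) -> List[float]:
    """Score of each sample on the first principal component (sign is arbitrary).

    Columns are mean-centred, then power iteration runs on the small
    samples x samples Gram matrix G = Xc Xc^T; the PC1 scores are
    sqrt(lambda1) * u1 for its top eigenpair (lambda1, u1).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    xc = [[r[j] - means[j] for j in range(p)] for r in rows]
    g = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(n)] for i in range(n)]
    u = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, non-constant start vector
    lam = 0.0
    for _ in range(iters):
        v = [sum(g[i][k] * u[k] for k in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in v))  # tends to lambda1 (G is PSD)
        if lam == 0.0:
            return [0.0] * n  # no variation (or an unlucky start vector)
        u = [x / lam for x in v]
    return [math.sqrt(lam) * x for x in u]


if __name__ == "__main__":
    samples = {  # hypothetical bins -> UniProt IDs
        "asm1_bin1": {"P0AGB6", "Q9RGS8", "Q5SJ80"},
        "asm2_bin1": {"P0AGB6", "Q9RGS8", "Q5SJ80"},
        "asm3_bin4": {"O07610", "P45857"},
        "asm4_bin2": {"O07610", "P45857"},
    }
    names, ids, rows = presence_matrix(samples)
    print(to_tsv(names, ids, rows), end="")
    # asm1/asm2 get one sign on PC1, asm3/asm4 the opposite sign.
    for name, score in zip(names, pc1_scores(rows)):
        print(name, round(score, 3))
```

Working on the Gram matrix keeps the eigenproblem as small as the number of bins, even when there are thousands of ID columns; for real analyses a library PCA (R's prcomp, or scikit-learn's PCA) is the more practical choice.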