Question: I want to make a reference table so that I can annotate antibiotic resistance gene hits with their antibiotic resistance gene category names
2.3 years ago, ghanbari.msc wrote:

Hi

I am new to this, so apologies in advance if the question is trivial. I want to make a reference table so that I can annotate antibiotic resistance gene hits with their antibiotic resistance gene category names, using the following commands:

1) Make a reference database for annotating the ARO group numbers with the antibiotic resistance groups

cat ./aro.obo | tr "\n" "@" | sed 's/@@/\n/g' | grep -v format-version | grep -v Typedef | sed 's/\[Term\]@id\:\s//g' | sed 's/@.*@is_a/\tis_a/' | grep is_a | sed 's/@relationship.*//' | sed 's/is_a.*\!\s//' | sed 's/ /_/g' > ./ARO_numbers_and_AR_groups.tsv

2) Get a list of ARO numbers with their corresponding gene ID numbers and taxonomic associations from the FASTA file

3) The FASTA headers are annotated as a hierarchy, so all ARO numbers should be taken

grep '>' AR-polypeptides.fa | sed 's/>//' | sed 's/ARO:1000001//g' |sed 's/\s.*ARO/\tARO/' | sed 's/\ .*\[/\t[/' | sed 's/ /_/g' > ./gene_IDs_and_ARO_numbers_and_AR_groups.tsv

4) Next, merge the files (using awk) into a single reference database

awk 'FNR==NR { a[$1]=$2; next } $2 in a { print a[$2]"\t"$1"\t"$2"\t"$3 }' ./ARO_numbers_and_AR_groups.tsv ./gene_IDs_and_ARO_numbers_and_AR_groups.tsv > ./CARD_annotation_reference.tsv

I can produce the two outputs from the first and second commands, but the awk step does not give any output. Here are some lines from the first and the second output.

./ARO_numbers_and_AR_groups.tsv
ARO:0000000 antibiotic_molecule
ARO:0000001 antibiotic_molecule@synonym:_"quinolone"_EXACT_[]
ARO:0000002 tetracycline_resistance_gene
ARO:0000003 aminoglycoside@synonym:_"Astromicina"_EXACT_[]@synonym:_"Astromicine

./gene_IDs_and_ARO_numbers_and_AR_groups.tsv
gi|AAA76822.1|ARO:3002654|APH(3')-VIIa  [Campylobacter_jejuni]
gi|ABC26006.1|ARO:3001624|OXA-84    [Acinetobacter_baumannii]
gi|AAF86691.1|ARO:3001816|ACC-2 [Hafnia_alvei]
gi|AFU35065.1|ARO:3003206|lsaE  [Staphylococcus_aureus]
gi|AFM38048.1|ARO:3003206|lsaE  [Staphylococcus_aureus]

Could it be due to the different structure of the files, i.e. one being tab-separated and the other |-separated?

I would appreciate it if someone could help me get this working.

Regards, Mahdi
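
A possible explanation, judging only from the sample lines above: awk splits fields on whitespace by default, so in the second file `$2` is the bracketed taxon, while the ARO number sits inside the |-separated first field. The `$2 in a` test would then never match, which would explain the empty output. Below is a minimal sketch of one workaround, assuming the real files are tab-separated where the samples show whitespace. The scratch file names and the group labels `hypothetical_group_A` and `hypothetical_group_B` are placeholders for illustration, not real CARD categories.

```shell
# Toy inputs in a scratch directory (file names are placeholders).
workdir=$(mktemp -d)

# Lookup table: ARO number <TAB> group. The group labels are made up.
printf '%s\t%s\n' \
    'ARO:3002654' 'hypothetical_group_A' \
    'ARO:3001624' 'hypothetical_group_B' > "$workdir/aro_groups.tsv"

# Gene table: |-separated header <TAB> taxon, copied from the sample lines.
printf '%s\t%s\n' \
    "gi|AAA76822.1|ARO:3002654|APH(3')-VIIa" '[Campylobacter_jejuni]' \
    'gi|ABC26006.1|ARO:3001624|OXA-84' '[Acinetobacter_baumannii]' > "$workdir/genes.tsv"

merged=$(awk -F'\t' '
    FNR == NR { group[$1] = $2; next }      # first file: remember ARO -> group
    {
        n = split($1, part, "|")            # break the header on "|"
        for (i = 1; i <= n; i++)            # try every piece as a lookup key
            if (part[i] in group)
                print group[part[i]] "\t" $1 "\t" part[i] "\t" $2
    }' "$workdir/aro_groups.tsv" "$workdir/genes.tsv")

printf '%s\n' "$merged"
rm -r "$workdir"
```

Because the loop tries every |-separated piece as a key, a header that carries several ARO numbers would yield one output row per matching number, which seems to fit the point that all ARO numbers in the hierarchy should be taken. The columns are group, full header, matched ARO number, and taxon.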

sequencing next-gen • 856 views