Question: How can I add an incremented value just after the accession number?
11 weeks ago, genious.kalia.141198 wrote:

I have a large CSV file (1.7 GB) containing sequences, and I have to give each sequence a header, so I did something like this in bash:

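#!/bin/bash

# Work on a copy so the original CSV stays untouched
cat con_test.csv > out.out

for file in out.out; do
    # Insert the same header line before every sequence line (edits the copy in place)
    sed -e 's/^/>NZ_CP00000.1 volvox complete genome\n/' -i "$file"
done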

My input file:

AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC

GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA

CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG

TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA

AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG

My output file after running this script (every header gets the same accession, NZ_CP00000.1):

NZ_CP00000.1 volvox complete genome AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC

NZ_CP00000.1 volvox complete genome GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA

NZ_CP00000.1 volvox complete genome CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG

NZ_CP00000.1 volvox complete genome TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA

NZ_CP00000.1 volvox complete genome AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG

Now I want to append a unique value to the accession number for each of my sequences, incremented for every sequence, so that the description line looks something like this (NZ_CP00000.1_000000001):

>NZ_CP00000.1_000000001 volvox complete genome AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC

>NZ_CP00000.1_000000002 volvox complete genome GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA

>NZ_CP00000.1_000000003 volvox complete genome CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG

>NZ_CP00000.1_000000004 volvox complete genome TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA

>NZ_CP00000.1_000000005 volvox complete genome AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG

How can I achieve this?

11 weeks ago, Pierre Lindenbaum wrote:
awk '/^>/ {$1=sprintf("%s_%010d",$1,++N);} {print;}' input.fa
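For the raw input shown in the question (one sequence per line, no headers yet), header creation and numbering could probably be done in a single pass instead of running sed first. This is only a sketch: it assumes blank lines should be skipped, it reuses the accession and description from the question, and the function name add_numbered_headers is made up for illustration. It pads with %09d, which gives the nine digits shown in the question, whereas %010d pads to ten.

```shell
#!/bin/bash
# Hypothetical helper (name not from the thread): reads raw sequences,
# one per line, and writes FASTA with a numbered header per sequence.
add_numbered_headers() {
    awk -v acc="NZ_CP00000.1" -v desc="volvox complete genome" '
        # NF is non-zero only for non-empty lines, so blank lines are skipped
        NF { printf(">%s_%09d %s\n%s\n", acc, ++n, desc, $0) }
    ' "$1"
}

# Usage (file names taken from the question):
#   add_numbered_headers con_test.csv > out.out
```

Because awk streams the file line by line, this should cope with a 1.7 GB input without loading it into memory.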

11 weeks ago, genious.kalia.141198 replied:

Thank you so much, it works.

11 weeks ago, Pierre Lindenbaum replied:

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
11 weeks ago, shenwei356 wrote:
$ seqkit replace -p '^(.+?) (.+)' -r '${1}_{nr} $2' --nr-width 9 -w 0 seq.fa
>NZ_CP00000.1_000000001 volvox complete genome
AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC
>NZ_CP00000.1_000000002 volvox complete genome
GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA
>NZ_CP00000.1_000000003 volvox complete genome
CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG
>NZ_CP00000.1_000000004 volvox complete genome
TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA
>NZ_CP00000.1_000000005 volvox complete genome
AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG

11 weeks ago, genious.kalia.141198 replied:

I think seqkit is not installed, which is why I get the error "command not found".

11 weeks ago, shenwei356 replied:

Oh, you can Google it.
11 weeks ago, cpad0112 wrote:
nl -nrz -bp">" test.fa | sed "/>/ s/^\([0-9]\+\).*\(>\w\+\.[0-9]\)\(.*\)/\2_\1\3/g;s/^\s\+//g"

>NZ_CP00000.1_000001 volvox complete genome
AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC
>NZ_CP00000.1_000002 volvox complete genome
GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA
>NZ_CP00000.1_000003 volvox complete genome
CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG
>NZ_CP00000.1_000004 volvox complete genome
TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA
>NZ_CP00000.1_000005 volvox complete genome
AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG
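nl pads to six digits by default, which is why the counter here is shorter than the nine digits in the question. A possible variant with -w 9 is sketched below. It assumes GNU nl and GNU sed (the \+, \w and \s escapes are GNU extensions), and the function name number_headers_nl is made up for illustration. It also uses [0-9]\+ after the dot so that a multi-digit accession version would still match.

```shell
#!/bin/bash
# Hypothetical helper (name not from the thread): numbers FASTA header lines
# with a 9-digit, zero-padded counter appended to the accession.
number_headers_nl() {
    # nl: number only lines starting with ">", right-justified with leading
    # zeros, width 9; other lines are passed through with leading padding.
    # sed: move the counter behind the accession, then strip that padding.
    nl -n rz -w 9 -b 'p^>' "$1" |
        sed -e '/>/ s/^\([0-9]\+\).*\(>\w\+\.[0-9]\+\)\(.*\)/\2_\1\3/' \
            -e 's/^\s\+//'
}

# Usage:
#   number_headers_nl test.fa > numbered.fa
```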