Seqkit Replace
6 weeks ago
pablo ▴ 300

Hi,

I would like to replace my FASTA headers according to a matching file.

I have:

head my.fasta
>CM020909.1:14117
aTTTTTGTCCCCAatattaggccctatgttctcacatttcacaatttttt[C/T]ccccaaaattaggccctatgttcccacatttcaagatttattttttccaa
>CM020909.1:148127
TTACTTTGTAGTAGCTTTTACTTTGTAGCTAGAGGCTTGGCTGTGCTGTT[T/A]GGGCTTTTACTGTAGTGGCTTGGGTTGGTAGTGACTTTGAAGGAGGTTAA
>CM020909.1:254785
CGTCCATCTTCTCCAGAGCTCTTTAAAGCCAAAGCGTTTTGGGGGACAGC[A/T]GCACAAAGAGCCAATCAAACGCCACAAAAGGCAGAGAACCGGACACCTGG
>CM020909.1:362180
ccaaaaatgatcGCCGCTGGtcgggagggggaggggacgggggAGGTGGG[T/G]AATTTGGCTTAAACACACAATTCAAAGAGGGGAACGTTGTTAAACAAACG
>CM020909.1:469928
agggtaactaatcattctttacccttctgaaaaagtgtaactacccttct[G/C]aaaaacagtaactaatcattaactacctatttttttgtgtaccttattat

And the corresponding mapping file:

head correspondances.txt
CM020909.1      CHR1
CM020910.1      CHR2
CM020911.1      CHR3
CM020912.1      CHR4
CM020913.1      CHR5
CM020914.1      CHR6
CM020915.1      CHR7
CM020916.1      CHR8
CM020917.1      CHR9
CM020918.1      CHR10

I tried this, but it does not work:

seqkit replace -k correspondances.txt -p "^(.+?) (.+)$" -r "{kv}:\$2" my.fasta

I need to keep the position after the ":" so that, for the first sequence, I get something like: >CHR1:14117

Any help?

6 weeks ago

The pattern "^(.+?) (.+)$" should be "^(.+?):(.+)$". The separator in the middle of CM020909.1:14117 is a colon, not a space. :)
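
To check the two patterns against a header outside of seqkit, here is a minimal Go sketch. It assumes the pattern is evaluated with Go's standard `regexp` package; `splitHeader` is a hypothetical helper written for this illustration, not part of seqkit.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitHeader is a hypothetical helper (not part of seqkit): it applies a
// pattern with two capture groups to a header and returns both groups.
func splitHeader(pattern, header string) (key, rest string, ok bool) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(header)
	if m == nil {
		return "", "", false // the pattern did not match the header
	}
	return m[1], m[2], true
}

func main() {
	header := "CM020909.1:14117"

	// Original pattern: expects a space, which the header does not contain.
	_, _, ok := splitHeader(`^(.+?) (.+)$`, header)
	fmt.Println("space pattern matches:", ok)

	// Corrected pattern: splits the header on the colon.
	key, pos, ok := splitHeader(`^(.+?):(.+)$`, header)
	fmt.Println("colon pattern matches:", ok, key, pos)
}
```

With the colon pattern, the first group is the accession used as the lookup key and the second group is the position that `$2` puts back after `{kv}`.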


Thanks a lot, that was pretty easy actually. While I'm here, I have another question: my FASTA file contains some headers that do not match any line of my correspondances.txt. For these headers, the seqkit replace command produces something like ">:643" (rather than >CM020933.1:643). Is there an option to leave those headers unedited?


Add -K.

  -K, --keep-key                 keep the key as value when no value found for the key (only for
                                 sequence name)
  -U, --keep-untouch             do not change anything when no value found for the key (only for
                                 sequence name)

The logic is (code):

if v, ok = kvs[k]; ok {                             // replace with value
    r = reKV.ReplaceAll(r, []byte(v))
} else if keepUntouch {                             // -U, --keep-untouch
    doNotChange = true
} else if keepKey {                                 // -K, --keep-key
    r = reKV.ReplaceAll(r, found[keyCaptIdx])
} else {                                            // -m, --key-miss-repl
    r = reKV.ReplaceAll(r, []byte(keyMissRepl))
}
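
To make the order of those branches concrete, here is a small self-contained Go sketch. It is a simplified stand-in written for this thread, not seqkit's actual implementation: `renameHeader`, `headerRe`, and the hard-coded "value:position" output are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// headerRe splits a header such as "CM020909.1:14117" into key and position.
var headerRe = regexp.MustCompile(`^(.+?):(.+)$`)

// renameHeader is a hypothetical, simplified stand-in for the branch logic
// quoted above; it is not seqkit's actual implementation.
func renameHeader(header string, kvs map[string]string, keepUntouch, keepKey bool, keyMissRepl string) string {
	m := headerRe.FindStringSubmatch(header)
	if m == nil {
		return header // pattern did not match: nothing to replace
	}
	key, pos := m[1], m[2]

	var value string
	if v, ok := kvs[key]; ok {
		value = v // key found: use the mapped value
	} else if keepUntouch {
		return header // like -U: leave the whole header unchanged
	} else if keepKey {
		value = key // like -K: reuse the key itself as the value
	} else {
		value = keyMissRepl // like -m: use the fallback replacement
	}
	return value + ":" + pos
}

func main() {
	kvs := map[string]string{"CM020909.1": "CHR1"}

	fmt.Println(renameHeader("CM020909.1:14117", kvs, false, false, "")) // CHR1:14117
	fmt.Println(renameHeader("CM020933.1:643", kvs, false, false, ""))   // :643
	fmt.Println(renameHeader("CM020933.1:643", kvs, false, true, ""))    // CM020933.1:643
	fmt.Println(renameHeader("CM020933.1:643", kvs, true, false, ""))    // CM020933.1:643
}
```

In this sketch, the keep-key and keep-untouch branches give the same result for an unmatched header, because the replacement rebuilds "key:position" exactly as it was. They would differ for a replacement that also changes other parts of the header.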

Perfect, thanks a lot!
