Adding a sequence to a genome
3.7 years ago
Assa Yeroslaviz ★ 1.8k

I would like to know if there are tools or methods for inserting a specific sequence into a FASTA file at a specific position.

My goal is to modify a genome by adding a given sequence at a defined location.

thanks

Assa

fasta insert • 1.6k views

I have not tested it myself, but seqkit mutate (https://bioinf.shenwei.me/seqkit/usage/#mutate) seems to do that.


Yes, it supports this, and you can choose which chromosomes/sequences to insert into.

3.7 years ago

I wrote this script for one of my projects some years ago. It lets you look at the bases at specific positions, and also modify your genome using a second FASTA file that describes the changes.

python lookmod_genome.py -m modify -f file.fasta -c changes_file.fasta

The changes_file.fasta should contain all the changes you want to make, using four keywords (insertion, deletion, add and remove), and should look like this:

>insertion:chr1:25:26
GCTAGCTAGC
>deletion:chr4:40:50
>add:your_chromosome_name
GTCGATCGTCATGGTT
>remove:your_chromosome_name

I have used it quite a lot, but it is self-made, so check the result with the look mode.

Use Python 2, or modify the script to make it Python 3 compatible.
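
For a single insertion, the same idea can also be sketched in plain Python without any external tool. The following is a minimal sketch and not part of lookmod_genome.py: the function names (read_fasta, insert_sequence, write_fasta, insert_into_fasta) are hypothetical, the position is assumed to mean "insert after this 1-based base", and the sequence ID is assumed to be the first whitespace-delimited word of the header.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples in file order."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def insert_sequence(seq, insert, position):
    """Insert `insert` after 1-based base `position` (0 = before the first base)."""
    if not 0 <= position <= len(seq):
        raise ValueError("position %d is outside the sequence" % position)
    return seq[:position] + insert + seq[position:]


def write_fasta(records, path, width=60):
    """Write records back, wrapping sequence lines at `width` characters."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">" + header + "\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")


def insert_into_fasta(in_path, out_path, seq_id, position, insert):
    """Insert `insert` into the record whose ID equals `seq_id`."""
    modified = []
    found = False
    for header, seq in read_fasta(in_path):
        # Assumption: the ID is the first whitespace-delimited word of the header.
        name = header.split()[0] if header.strip() else ""
        if name == seq_id:
            seq = insert_sequence(seq, insert, position)
            found = True
        modified.append((header, seq))
    if not found:
        raise KeyError("no sequence named %r in %s" % (seq_id, in_path))
    write_fasta(modified, out_path)
```

A call such as insert_into_fasta("file.fasta", "modified.fasta", "chr1", 25, "GCTAGCTAGC") would write a new file and leave the input untouched. This reads the whole genome into memory, which is fine for small genomes but may be slow for very large ones.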


Thanks for the script. It does not seem to work for me, though.

I have tried all three options for lookmod_genome.py (see below). Only the look option works. I used a test FASTA file as input.

$ python lookmod_genome.py -m look -p GL988041:10:20 -g test -r test/test.fa

-----------------------------------------
Mode : look
Fasta file : test/test.fa
Position : GL988041:10:20
Surroundings length : 10
-----------------------------------------

----------------------
------   LOOK   ------
----------------------
-> GAAAAAAAAA**GCCGTGCCGT


$ python lookmod_genome.py -m modify -g test -r test/test.fa -c test/insertion.fa 

-----------------------------------------
Mode : modify
Fasta file : test/test.fa
Construction file : test/insertion.fa
Output file rename: test.fa
-----------------------------------------

Traceback (most recent call last):
  File "lookmod_genome.py", line 499, in <module>
    main(sys.argv[1:])
  File "lookmod_genome.py", line 474, in main
    for key, value in added_dict.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'


$ python lookmod_genome.py -m modify -g test -r test/test.fa -c test/insertion.fa -i test/output.txt

-----------------------------------------
Mode : modify
Fasta file : test/test.fa
Construction file : test/insertion.fa
Output file rename: test.fa
Output information File : test/output.txt
-----------------------------------------

Traceback (most recent call last):
  File "lookmod_genome.py", line 499, in <module>
    main(sys.argv[1:])
  File "lookmod_genome.py", line 474, in main
    for key, value in added_dict.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'

$ head test/test.fa 
>GL988041 dna:supercontig supercontig:CTHT_3.0:GL988041:1:6909506:1 REF
GAAAAAAAAAAAAAAAAGAGCCGTGCCGTAGCCCAGTTTTGAACTCTGAAGCCAGATCAG
ACGCGGGATGCAGGAGACCTGGGTGCGGGAGGTGCGGCAGCTGGCCCAAACGGTGCTGCA
GACCTTTTTGCATTGAAGCCCATTTTCACATCCTCTTTTCGTTCTTCCTCCGTCTCCTTC

Do the chromosome names need to have a certain structure? Are spaces or symbols allowed in the names? I can't figure out which dict it is looking for.
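
On the naming question: many FASTA tools treat only the first whitespace-delimited word of the header line as the sequence name and the rest as a description. Whether lookmod_genome.py follows this convention is not confirmed here, although the look command above did accept GL988041. A minimal sketch of that convention, using a hypothetical helper fasta_record_id:

```python
def fasta_record_id(header_line):
    """Return the first whitespace-delimited word of a FASTA header line."""
    text = header_line.lstrip(">").strip()
    return text.split()[0] if text else ""


header = ">GL988041 dna:supercontig supercontig:CTHT_3.0:GL988041:1:6909506:1 REF"
print(fasta_record_id(header))  # prints GL988041
```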


Can you post the output of head test/insertion.fa, please?


I think I have solved it. On my system, python points to Python 3; when I call the script explicitly with python2, the other two options work as well.

In Python 3, dict.iteritems() was removed and dict.items() is used instead.

If possible, could you update the script to work with Python 3?

thanks
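
The loop from the traceback can be written so that it runs under both Python 2 and Python 3. This is a minimal sketch: the name added_dict comes from the traceback, but its contents and the helper summarize are made up for illustration, and the script may contain other Python 2 constructs that are not covered here.

```python
# Hypothetical contents; in the script, added_dict is built from the changes file.
added_dict = {"chr_new": "GTCGATCGTCATGGTT"}


def summarize(mapping):
    """Return sorted (name, length) pairs using dict.items()."""
    # dict.items() exists in both Python 2 and Python 3;
    # dict.iteritems() exists only in Python 2.
    return sorted((key, len(value)) for key, value in mapping.items())


for name, length in summarize(added_dict):
    # %-formatting with a single string argument prints the same on both versions.
    print("%s\t%d" % (name, length))
```

In Python 2, items() builds a full list instead of an iterator, which only matters for very large dictionaries.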
