Removing a specified range of bases from the middle of contigs and creating new sequences
7 weeks ago

Hi, I am trying to exclude a span of nucleotides from the middle of my contigs. I have the contig IDs and the start..end positions, as given below, and I would like to remove all the bases between start and end.

Example:

>ID length start..end

>Contig1 100 20..35

>Contig2 30 3..12


If Contig1 looks like the sequence below, I want to exclude the bases in bold (between the *** markers) by splitting the sequence, and then create a new sequence from the bases on either side of the excluded region.

Input:

>Contig1
TTGTTCAACGGATCCACCT***GTTGCCAAGAGTGCTTCAGTACATTGCTCACGGCTGAA***TCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA


Output:

>Newcontig
TTGTTCAACGGATCCACCTTCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA


Most of the tools that I have encountered only remove bases from the ends of a sequence. Please let me know if you have any suggestions for specifying a start..end range and creating a new sequence that excludes that region.

I have about 100 contigs to clean this way. Any help would be highly appreciated. Thank you so much!

Contigs Assembly fastafile • 584 views

The post is confusing to me without the input sequences, format, input files, and expected output. Could you please elaborate with example input and expected output? Thanks.


Hi, I have modified the post. I hope it makes sense now. Thanks.


So the coordinates to be excluded are given in FASTA format. Is that correct?


Yes, that's right.
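
One possible approach, as a minimal Python sketch rather than a tested pipeline: read the coordinate lines into a dictionary, then slice each matching contig around the excluded span. It assumes the coordinates are 1-based and inclusive (so `20..35` removes bases 20 through 35), that each contig has at most one span to remove, and that the first word after `>` is the contig ID in both files. The function names and the three file paths taken on the command line are hypothetical, chosen only for illustration.

```python
import re
import sys


def parse_coords(lines):
    """Parse lines such as '>Contig1 100    20..35' into {id: (start, end)}."""
    coords = {}
    for line in lines:
        line = line.strip()
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if len(fields) < 3:
            continue
        match = re.fullmatch(r"(\d+)\.\.(\d+)", fields[-1])
        if match is None:
            continue  # skips a column-title line like '> ID length start..end'
        coords[fields[0]] = (int(match.group(1)), int(match.group(2)))
    return coords


def read_fasta(lines):
    """Yield (id, sequence) pairs; multi-line sequences are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def remove_span(seq, start, end):
    """Drop bases start..end (assumed 1-based, inclusive) and join the flanks."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"span {start}..{end} is outside a sequence of length {len(seq)}")
    return seq[:start - 1] + seq[end:]


def clean_contigs(fasta_lines, coord_lines):
    """Yield (id, sequence) with the listed span removed where one is given."""
    coords = parse_coords(coord_lines)
    for name, seq in read_fasta(fasta_lines):
        if name in coords:
            seq = remove_span(seq, *coords[name])
        yield name, seq


# Hypothetical usage: python remove_span.py contigs.fasta coords.txt cleaned.fasta
if __name__ == "__main__" and len(sys.argv) == 4:
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as coord_file, \
            open(sys.argv[3], "w") as out:
        for name, seq in clean_contigs(fasta, coord_file):
            out.write(f">{name}\n{seq}\n")
```

Contigs that are not listed in the coordinates file are written out unchanged, and each sequence is written on a single line. If the coordinates turn out to be 0-based or end-exclusive, only the slice in `remove_span` would need to change.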