Split every sequence in a multi-FASTA file into two halves
5.6 years ago

Hi, I have a FASTA file with many sequences and I want to split each sequence into two halves at the middle. I am new to bioinformatics; kindly suggest a tool or a Perl command.

sequence • 1.6k views

Please first use the Biostars search utility to look for similar posts, or run a Google search restricted to biostars.org, for example:

on google.com: split fasta site:www.biostars.org

We recommend that you first look for similar posts, try one of the answers given by other users, and then post any issues you run into, adding specific details of the commands you used.

We do not encourage providing direct answers. Show us your efforts and we will love to help.


I know how to split a big FASTA file into multiple FASTA files, but my question is how to bisect every nucleotide sequence in a FASTA file into two halves, e.g.

>TC93917
GGCACGAGGCAGAAACCAATTTCAAAACATTATATAAATAGCTAGTTTCAGTACTAGCTG
TGCAACTCAATTATAGAACAATGGCTTCCTCTATGATCTCCTCTTCAGCTATCACTACAG
TTAACCGTGCCTCTCCGGTACAATCCGGTGCGGTGGCTCCATTCGTCGGACTCAAGTCCA
TGGCTGGCTTCCCAATTACAAAGGTCAACAAAGACATTACCTCCATTACAAGCAATGGTG
GAAGAGTAAACTGCATGCAGGTGTGGCCTCCTATTGGCAAGAAGAAGTTTGAGACTCTTT
CATATCTTCCACCATTGACCAGAGAACAATTGGCGAAAGAAGTTGAATACCTTATAAGGA
AGGGATGGGTTGCTTGCTTGGAATTCGAGACCGAGAAAGGATTTGTGTACCGTGAGAACC
ACAGTTCACCAGGATACTATGACGGACGTTACTGGACAATGTGGAAGTTGCCTTTGTTTG
GAGCAACTGATGCTTCTCAAGTGTTGAAGGAGCTTGATGAAGTTGTTGCTGCTTACCCTA
CTGCCTTTGTCCGTATCATCGGATTCGACAATGTTCGTCAAGTTCAATGCATCAGTTTCA
TTGCACACACACCTGATGTTTACTAAGTTCATTGCACTGGAATTTGGAAGAACTTTTTTC
TTCTTCCCATTTATGTTTTGCTTTTAATTTCCATTTCTTTTTCAGGGAAATGTTTTCCTT
CTGTGTTTTTATATTTCTGTTTTTGGATTTGAAAAATGGGATGTATAAGATTAAGAGTTA
ATGAATGAAATGGTTACTTAATTCCCAAAGTACTTAAAAGAATCCATTATCTATGTAGTT
TTCCTTGTTCTGC

into

>TC93917_1
GGCACGAGGCAGAAACCAATTTCAAAACATTATATAAATAGCTAGTTTCAGTACTAGCTG
TGCAACTCAATTATAGAACAATGGCTTCCTCTATGATCTCCTCTTCAGCTATCACTACAG
TTAACCGTGCCTCTCCGGTACAATCCGGTGCGGTGGCTCCATTCGTCGGACTCAAGTCCA
TGGCTGGCTTCCCAATTACAAAGGTCAACAAAGACATTACCTCCATTACAAGCAATGGTG
GAAGAGTAAACTGCATGCAGGTGTGGCCTCCTATTGGCAAGAAGAAGTTTGAGACTCTTT
CATATCTTCCACCATTGACCAGAGAACAATTGGCGAAAGAAGTTGAATACCTTATAAGGA
AGGGATGGGTTGCTTGCTTGGAATTCGAGACCGAGAAAGGATTTGTGTACCGTGAGAACC
ACAGTTC

>TC93917_2
ACCAGGATACTATGACGGACGTTACTGGACAATGTGGAAGTTGCCTTTGTTTG
GAGCAACTGATGCTTCTCAAGTGTTGAAGGAGCTTGATGAAGTTGTTGCTGCTTACCCTA
CTGCCTTTGTCCGTATCATCGGATTCGACAATGTTCGTCAAGTTCAATGCATCAGTTTCA
TTGCACACACACCTGATGTTTACTAAGTTCATTGCACTGGAATTTGGAAGAACTTTTTTC
TTCTTCCCATTTATGTTTTGCTTTTAATTTCCATTTCTTTTTCAGGGAAATGTTTTCCTT
CTGTGTTTTTATATTTCTGTTTTTGGATTTGAAAAATGGGATGTATAAGATTAAGAGTTA
ATGAATGAAATGGTTACTTAATTCCCAAAGTACTTAAAAGAATCCATTATCTATGTAGTT
TTCCTTGTTCTGC

Sorry if this is a simple question, but as I said, I am new to bioinformatics.


Hello manishbiotechie,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

I know how to split a big FASTA file into multiple FASTA files

You should explain this by adding your code here. Maybe people can suggest how you can edit or expand your code.


Why do you need to [literally] split the sequence in half? It should be possible to do with awk.

3.9 years ago
5heikki 11k

Assuming no linebreaks in sequences:

awk '{if(/^>/){H1=$0"_1";H2=$0"_2"}else{h=int((length($0)+1)/2);print H1"\n"substr($0,1,h)"\n"H2"\n"substr($0,h+1)}}' seq.fa
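
If the sequences are wrapped over several lines, as in the example in the question, the one-liner above would halve each line separately. One possible way around that is to join the lines of each record first and only then cut at the midpoint. The following is a minimal sketch, not a tested tool: the function name split_halves is made up for illustration, and it assumes that an odd-length sequence should give its extra base to the first half, which is what the example in the question appears to do.

```shell
# split_halves [FILE...]: hypothetical helper; reads FASTA from files or stdin
# and prints every record as two records with _1 and _2 appended to the header.
split_halves() {
    awk '
        # Print the record collected so far as two half-length records.
        function emit(    n, h) {
            if (hdr == "") return          # nothing collected yet
            n = length(seq)
            h = int((n + 1) / 2)           # odd length: extra base goes to the first half
            print hdr "_1"; print substr(seq, 1, h)
            print hdr "_2"; print substr(seq, h + 1)
        }
        /^>/ { emit(); hdr = $0; seq = ""; next }   # new header: flush the previous record
             { seq = seq $0 }                        # join wrapped sequence lines
        END  { emit() }                              # flush the last record
    ' "$@"
}
```

It could be used as `split_halves seq.fa > seq_halves.fa`, where seq_halves.fa is just a placeholder output name. Each half is written on a single unwrapped line, and the suffix is appended to the end of the whole header line, so headers that carry a description after the ID would need extra handling.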