Fix Fasta Headers for BUSCO
18 months ago
hpalk42 • 0

I aligned my paired-end genome reads to a reference genome using BWA, which produced a .sam file. I then used samtools to convert the aligned reads from SAM to FASTA. I want to look at assembly statistics and evaluate completeness with BUSCO, but BUSCO gave the following error:

The character "/" is present in the fasta header >A00600:204:HFMJ3DSX3:3:1101:3640:1125/1, which will crash Reader. Please clean the header of your input file.

When I run head -8 aligned.fasta, this is what I get:

>A00600:204:HFMJ3DSX3:3:1101:3640:1125/1
TTTTATTTGAAATCACAAACCACTAACAACGATACAAAACGTCAATATATTCCCAAATTCGATGATTTTTTCTTCAAATCATGATGCGAGATTTTCTGGTATTTAGATCAATAAAATGCATATGAAAAGGTTGTTTCCAATACAAATGATC

>A00600:204:HFMJ3DSX3:3:1101:3640:1125/2
GGAATAGAGCGGTTTTACGTGTACATCTTCTTTCGGTAAAATAAACGGAATATCAAGTGAATTCTAATGAAAACACATTGATTTTGATCATTTGTATTGGAAACAACCTTTTCATATGCATTTTAATGATCTAAATACCAGGAAATCTCGC

>A00600:204:HFMJ3DSX3:3:1101:3658:1125/1
CCGAGTATTTTTGCCAGAATTGTTTTTCATATGTTGCCCAAACCCCCCTGCAGTTTTCCCAATATGGGAAATATCGCAATTCTGACTCGTGAGCGTTGTTTTGGAGCCGAATTGTGACTCCCTGCACATATTTGGAGTTTGAAATTCCGAG

>A00600:204:HFMJ3DSX3:3:1101:3658:1125/2
GGTTTGGGCAACATATGAAAAAAAATTCTGGCAAAATTCTCGGAATTTCAAACTCCAAATATGTGCAGGGAGTCACAATTCGGCTCCAAAACAACGCTCACGAGTCAGAATTGCGATATTTCCCATATTGGGAAAACTGCAGGGGGGTTTG

How can I clean up the fasta header so that BUSCO can run its analysis? Thank you for your help!

Fasta BWA BUSCO • 590 views
18 months ago
GenoMax 142k

You could try replacing the "/" in the headers with "_", for example with sed:

$ sed 's/\//_/g' test.fa
>A00600:204:HFMJ3DSX3:3:1101:3640:1125_1
TTTTATTTGAAATCACAAACCACTAACAACGATACAAAACGTCAATATATTCCCAAATTCGATGATTTTTTCTTCAAATCATGATGCGAGATTTTCTGGTATTTAGATCAATAAAATGCATATGAAAAGGTTGTTTCCAATACAAATGATC
>A00600:204:HFMJ3DSX3:3:1101:3640:1125_2
GGAATAGAGCGGTTTTACGTGTACATCTTCTTTCGGTAAAATAAACGGAATATCAAGTGAATTCTAATGAAAACACATTGATTTTGATCATTTGTATTGGAAACAACCTTTTCATATGCATTTTAATGATCTAAATACCAGGAAATCTCGC
>A00600:204:HFMJ3DSX3:3:1101:3658:1125_1
CCGAGTATTTTTGCCAGAATTGTTTTTCATATGTTGCCCAAACCCCCCTGCAGTTTTCCCAATATGGGAAATATCGCAATTCTGACTCGTGAGCGTTGTTTTGGAGCCGAATTGTGACTCCCTGCACATATTTGGAGTTTGAAATTCCGAG
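
A slightly more defensive variant of the same idea, as a sketch only: it limits the substitution to header lines (those starting with ">") and writes the result to a new file instead of printing to the terminal. The file names demo.fasta and demo.clean.fasta are hypothetical placeholders, and the two-record input is a shortened stand-in built from the headers in the question so that the snippet runs on its own.

```shell
# Hypothetical file names; replace with your own input and output paths.
in=demo.fasta
out=demo.clean.fasta

# Tiny stand-in input so the sketch is self-contained (skip this for real data).
printf '%s\n' \
  '>A00600:204:HFMJ3DSX3:3:1101:3640:1125/1' \
  'TTTTATTTGAAATCACAAACC' \
  '>A00600:204:HFMJ3DSX3:3:1101:3640:1125/2' \
  'GGAATAGAGCGGTTTTACGTG' > "$in"

# On header lines only, replace every "/" with "_".
# Using "|" as the sed delimiter avoids having to escape the slash.
sed '/^>/ s|/|_|g' "$in" > "$out"

# Report how many header lines still contain "/".
# grep exits non-zero when the count is 0, hence the "|| true".
grep -c '^>.*/' "$out" || true
```

Writing to a separate output file leaves the original FASTA untouched, so the cleaned headers can be checked before the file is passed to BUSCO.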