Error in reference fasta file when indexing with samtools
21 months ago
Saran

Hello,

I made the following reference fasta file (amplicons.fa) with only two sequences:

>BALV3loop
GTACAAGACCCAACAACAATACAAGAAAAAGTATAAATATAGGACCAGGCAGAGCATTTTATACAACAGGAGAAATAATAGGAGATATAAGACAAGCACATTGTAACCTTAGTAGAGCAAAATGGAATGACACTTTAAATAAGATAGTTATAAAATTAAGAGAACAATTTGGGAATAAAACAATAGTCTTTAAGCACTCCTCAGGAGGGGACCCAGAAATTG
>NL4-3V3loop
GTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTG

I am trying to index the reference so that I can run "alfred qc" on my aligned reads. This is the command I used:

samtools faidx amplicons.fa

I get the following error:

[E::fai_build_core] Format error, unexpected character at line 1
[faidx] Could not build fai index amplicons.fai
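To see what the "unexpected character at line 1" actually is, one option is to dump the first few bytes of the file in hex. This is a minimal sketch that assumes a Linux shell with `head` and `od`. `first_bytes` is a hypothetical helper name, not part of samtools. A clean FASTA file should start with `3e`, the hex code for `>`.

```shell
# Hypothetical helper: print the first N bytes of a file as space-separated hex,
# so invisible characters (BOMs, carriage returns, etc.) become visible.
# Usage: first_bytes FILE [N]   (N defaults to 8; od prints 16 bytes per line)
first_bytes() {
  head -c "${2:-8}" "$1" |
    od -An -tx1 |            # one hex byte per field, no address column
    tr -s ' ' |              # squeeze runs of spaces
    sed 's/^ //; s/ $//'     # trim the leading/trailing space
}
```

For example, `first_bytes amplicons.fa` on a file saved with a UTF-8 byte order mark would show `ef bb bf` before the `3e`.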

I am not sure what is wrong, since as far as I know the reference headers are named correctly. Any help would be appreciated.

Sara


You're not the first person to somehow end up with UTF-8 byte order marks (BOMs) in their input files. The Unicode standard actually does not recommend using a BOM with UTF-8, but clearly some Windows tool is "helpfully" creating text files with these invisible headers.

In case we see this again, it would be useful to know which tool you used to create these files, so we can recommend that people avoid it in the future!
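A UTF-8 BOM is always the same three bytes (EF BB BF) at the very start of a file, so it can be detected and removed with standard tools. This is a minimal sketch that assumes `head`, `od` and `tail` are available. `has_utf8_bom` and `strip_utf8_bom` are hypothetical helper names.

```shell
# Hypothetical helper: succeed (exit 0) if the file starts with the UTF-8 BOM EF BB BF.
has_utf8_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: copy $1 to $2, dropping a leading UTF-8 BOM if present.
strip_utf8_bom() {
  if has_utf8_bom "$1"; then
    tail -c +4 "$1" > "$2"   # output starts at byte 4, skipping the 3 BOM bytes
  else
    cat "$1" > "$2"          # no BOM: plain copy
  fi
}
```

For example, `strip_utf8_bom amplicons.fa amplicons_clean.fa` followed by `samtools faidx amplicons_clean.fa`. Writing to a new file rather than editing in place keeps the original around if something goes wrong.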


I used LibreOffice Writer (I work on a Linux OS), saved the file as a .txt file, and then converted it to FASTA by renaming it with "mv". I will probably use vim next time.


I work on Linux too (Pop!_OS), and yeah, I recommend not using a word processor like LibreOffice Writer to write code or scripts. If you want a GUI editor or IDE, I recommend VS Code, which is what I'm currently using.

21 months ago
ATpoint

If I copy and paste this, it works for me. My guess is that the file contains some odd or hidden characters. Is this a file from Windows? If so, dos2unix might be an option.
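dos2unix converts Windows CRLF line endings to Unix LF endings (and, as far as I know, also drops a UTF-8 BOM by default). If it isn't installed, carriage returns can be counted and removed with `grep` and `tr`. This is a sketch only, and the helper names are hypothetical. Note that `tr -d '\r'` removes every carriage return, not just those at line ends. That should be harmless for FASTA, but unlike dos2unix it leaves any BOM in place.

```shell
# Hypothetical helper: print how many lines end in a carriage return
# (i.e. use Windows CRLF line endings).
count_crlf_lines() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the
  # function's exit status at 0 in that case.
  grep -c "$(printf '\r')\$" "$1" || true
}

# Hypothetical helper: write a copy of $1 to $2 with all carriage returns removed.
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}
```

For example, `count_crlf_lines amplicons.fa` prints `0` for a file with Unix line endings.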


I work on a Linux OS. I created a .txt file containing the sequences pasted above and then made "amplicons.fa" with the following command:

mv amplicons.txt amplicons.fa

I just tried again and I am still getting the error above. Is there something wrong with the way I created the FASTA file?
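Renaming a file with `mv` only changes its name. Any bytes the editor added stay in the file. One quick way to find such bytes, whatever their origin, is to list every line that contains something other than printable ASCII. This is a sketch assuming GNU or BSD `grep` (for `-a`), and `find_hidden_chars` is a hypothetical name. Tabs are reported too, since they are not in `[:print:]`.

```shell
# Hypothetical helper: print "LINE:content" for every line that contains a byte
# outside printable ASCII (e.g. a UTF-8 BOM, a carriage return, a tab, or a
# non-breaking space pasted from a word processor).
find_hidden_chars() {
  # LC_ALL=C makes every byte a single character, so multi-byte UTF-8
  # sequences are matched too; -a stops grep treating the file as binary;
  # -n prefixes each match with its line number.
  LC_ALL=C grep -na '[^[:print:]]' "$1"
}
```

No output (and exit status 1) means no hidden characters were found. On a file with a BOM, line 1 is reported.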


I agree with ATpoint that there are likely hidden characters introduced by (e.g.) your choice of text editor.

Try copying and pasting the following into your terminal and pressing Enter:

cat << EOT > amplicons_test.fa
>BALV3loop
GTACAAGACCCAACAACAATACAAGAAAAAGTATAAATATAGGACCAGGCAGAGCATTTTATACAACAGGAGAAATAATAGGAGATATAAGACAAGCACATTGTAACCTTAGTAGAGCAAAATGGAATGACACTTTAAATAAGATAGTTATAAAATTAAGAGAACAATTTGGGAATAAAACAATAGTCTTTAAGCACTCCTCAGGAGGGGACCCAGAAATTG
>NL4-3V3loop
GTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTG
EOT

Then try:

samtools faidx amplicons_test.fa

If it works, there must be something wrong with your original file.


You're right, there were hidden characters. I used the dos2unix command to fix the issue, thank you! I had never dealt with hidden characters before.
