velveth (fasta does not seem to be in FastA format)
Hey, I am trying to run velveth on my Mac, but I keep getting the error message "F1.fasta does not seem to be in FastA format", and I have no idea why, since F1.fasta is a FASTA file. Both F1.fasta and F2.fasta are FASTA files, and both are in the terminal's current directory. Here is my command. Does anyone have an idea what is going on?

velveth Assem 31 -fasta -shortPaired F1.fasta F2.fasta

Actually, I think I pasted something wrong. Here are the first lines of each file again.

F1.fasta

>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/1
NGCGAGGCTTCCATCAGTGAAATGTTTCCTTTCTGTTGTTGAAGTTTCATCTCAGCCAGAAGGCGCTCCAACGAAGTTATTTCTTTTTCATAAACAGCCA
>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/2
ACAAGGAAATGGCCGGATATCAGTTTCAGGAAATCATGCGCACCTTGCATAGTGAGCTGAACGAACGATTTGTCGAGACTTATTTTCTGACTAGAAATAT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/1
NGAAGCGTGACAAAATCACGTACAATACTCAGACTACCTCCGCCACCTGAGAAGCTCATATCCGGATAATCCACTTGATATAAATGTCCGAAAATGCGTT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/2
TCGGGAAATGCTGGAAATCAGAGTGGCTGATACAGGGATCGGAATTAAAAAAGAAGACAGAGAACGCATTTTTGGACATTTTTATCAAGTGGTTTATCCT
>SRR041654.3 HWI-EAS284_61BKE:5:1:2:1671/1
NGGTGTGTCTGTATTGCTGTCTGCCGTAACGGTAATTTTCCTGATTTCGGCAACTATCATTGTTTTTACTCCTTTACGTAATTATTTGCCGGGATATATG


F2.fasta

>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/1
NAAATCAGACAAATCTCCGTTATTGGTATATACTTTGGGAGTGTTATGGAATTGCACACCCATTTCGAACATGAAGCCAATTCGTTTCTTAGGAATCGCT
>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/2
GAAATCGGAAACTATCGTATACCTGTAGATCAGAACGGAAATATATCTGGTGGTTTGAAGGTTTCTTCATTCCGTCCTTATCTTGGACTAGGCTTCGGAA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/1
NATGTAGCATTAAAAATTACATCCTAAACTTATCGATAAATGAGTACGCCCATCATAATCATAGTCAGAGGTATTTACACGATCGAATACAACTTTTGCA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/2
TTACAACAAGGGCTGACTAATTATTATACCTGTGATTACTATCGTATTGGCGGGGCGATAAAGGATTTGCAAAAAAAAAAAGAAAAAAGAAGAAAGAGAA
>SRR041655.3 HWI-EAS284_61BKE:6:1:2:1293/1
NACAAGCTGATTAAGCCTATAAATAAGACCTTTATTTTCCCCATCTGAAATAACTCGAATCCTCCTATCAGTTGCATAACTTAAAGCAATTTCTAAGGAA

Try the following:

grep -c '^>' F1.fasta


Do the same with the other file and post both numbers. This counts the reads in each file, so we can see whether both files have the same number.
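The same check can be wrapped in two small shell functions so both files are compared in one step. This is a sketch: `count_reads` and `same_read_count` are hypothetical helper names, and it assumes every read has exactly one header line starting with ">".

```shell
# Hypothetical helpers (not part of velvet): compare the read counts of two FASTA files.
# Assumes each read has exactly one header line beginning with ">".

# count_reads FILE: print the number of header lines in FILE
count_reads() {
  grep -c '^>' "$1"
}

# same_read_count FILE1 FILE2: print both counts, succeed only if they are equal
same_read_count() {
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  echo "$1: $n1 reads, $2: $n2 reads"
  [ "$n1" -eq "$n2" ]
}
```

Usage: `same_read_count F1.fasta F2.fasta && echo "counts match"`.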

Hey, do you think it has something to do with the type of text file? I have a Mac, and my computer shows the kind of file as "TextEdit Document". I suspect that could be the problem, since this command worked fine for me before. Between the last time I used it and now, I think I installed some text-editing apps, and they may have changed the file type...

It's possible the line endings (end-of-line characters) have somehow been changed.

If you do:

cat -v F1.fasta


Do you see ^M anywhere (at the end of the lines)?
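A scripted version of the same check, plus a fix for Windows-style (CRLF) line endings, might look like this. The function names are hypothetical. `tr -d '\r'` deletes every carriage return, which is right for CRLF files but would merge lines in a file that uses a lone CR (old Mac style) as its only line ending; for that case `tr '\r' '\n'` is the usual substitute.

```shell
# Hypothetical helpers: detect and strip carriage returns (\r, shown as ^M by cat -v).

# has_carriage_returns FILE: exit status 0 if FILE contains at least one \r byte
has_carriage_returns() {
  LC_ALL=C grep -q "$(printf '\r')" "$1"
}

# strip_carriage_returns IN OUT: write IN to OUT with every \r removed,
# turning CRLF line endings into plain LF line endings
strip_carriage_returns() {
  tr -d '\r' < "$1" > "$2"
}
```

Usage: `has_carriage_returns F1.fasta && strip_carriage_returns F1.fasta F1_unix.fasta`. Running `file F1.fasta` is another quick check; it usually mentions CRLF or CR line terminators when they are present.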

I tried but did not see ^M.

If you try to run something like

velveth myOutputdirectory/ 31 -fasta -shortPaired F1.fasta F2.fasta


Do you still get the same problem?

Do you think the N is messing it up? If so, that's weird, since these are the tutorial sequences.

 > head F1.fasta
> >SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/1 NGCGAGGCTTCCATCAGTGAAATGTTTCCTTTCTGTTGTTGAAGTTTCATCTCAGCCAGAAGGCGCTCCAACGAAGTTATTTCTTTTTCATAAACAGCCA
> >SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/2 ACAAGGAAATGGCCGGATATCAGTTTCAGGAAATCATGCGCACCTTGCATAGTGAGCTGAACGAACGATTTGTCGAGACTTATTTTCTGACTAGAAATAT
> >SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/1 NGAAGCGTGACAAAATCACGTACAATACTCAGACTACCTCCGCCACCTGAGAAGCTCATATCCGGATAATCCACTTGATATAAATGTCCGAAAATGCGTT
> >SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/2 TCGGGAAATGCTGGAAATCAGAGTGGCTGATACAGGGATCGGAATTAAAAAAGAAGACAGAGAACGCATTTTTGGACATTTTTATCAAGTGGTTTATCCT
> >SRR041654.3 HWI-EAS284_61BKE:5:1:2:1671/1 NGGTGTGTCTGTATTGCTGTCTGCCGTAACGGTAATTTTCCTGATTTCGGCAACTATCATTGTTTTTACTCCTTTACGTAATTATTTGCCGGGATATATG
>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/1
NAAATCAGACAAATCTCCGTTATTGGTATATACTTTGGGAGTGTTATGGAATTGCACACCCATTTCGAACATGAAGCCAATTCGTTTCTTAGGAATCGCT
>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/2
GAAATCGGAAACTATCGTATACCTGTAGATCAGAACGGAAATATATCTGGTGGTTTGAAGGTTTCTTCATTCCGTCCTTATCTTGGACTAGGCTTCGGAA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/1
NATGTAGCATTAAAAATTACATCCTAAACTTATCGATAAATGAGTACGCCCATCATAATCATAGTCAGAGGTATTTACACGATCGAATACAACTTTTGCA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/2
TTACAACAAGGGCTGACTAATTATTATACCTGTGATTACTATCGTATTGGCGGGGCGATAAAGGATTTGCAAAAAAAAAAAGAAAAAAGAAGAAAGAGAA
>SRR041655.3 HWI-EAS284_61BKE:6:1:2:1293/1
NACAAGCTGATTAAGCCTATAAATAAGACCTTTATTTTCCCCATCTGAAATAACTCGAATCCTCCTATCAGTTGCATAACTTAAAGCAATTTCTAAGGAA
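To answer the N question directly, one can list every character that occurs in the sequence lines of a file. This is a sketch with a hypothetical helper name, assuming headers start with ">" and every other line is sequence. N is a standard FASTA symbol for an unknown base, so it should simply show up as one more entry; an unexpected entry (for example a stray carriage return, or letters other than A, C, G, T and N) would be more suspicious.

```shell
# Hypothetical helper: count how often each character appears in the sequence lines of FILE.
# Header lines (starting with ">") are skipped; every other line is treated as sequence.
base_composition() {
  grep -v '^>' "$1" | fold -w 1 | LC_ALL=C sort | uniq -c
}
```

Usage: `base_composition F1.fasta` prints one line per character, with its count first.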

F1.fasta looks very strange, while F2.fasta looks OK. Is this just formatting from this site, or does F1.fasta really have each header and its sequence on one line?

The first file is wrong. It has two ">" at the beginning of each header, and there should be only one. In addition, after the >name and the comment (the word starting with HWI...), there must be a line break so that the sequence is on the second line.

In other words, F1.fasta must look exactly like F2.fasta.

Run this to fix it:

cat F1.fasta | awk '{print $2 " " $3 "\n" $4}' > F1_fixed.fasta


Note the " " between $2 and $3, which puts the space back between the name and the comment. After checking the new file, delete the broken one and rename F1_fixed.fasta to F1.fasta, or pass F1_fixed.fasta to velveth directly.

You can also run it this way, without cat:

awk '{print $2 " " $3 "\n" $4}' F1.fasta > F1_fixed.fasta


Either way, you get a correctly formatted FASTA file.

If the double ">" is not actually in your original file, keep in mind that in awk $2 means the second whitespace-separated word, $3 the third, and so on, and change the code accordingly. "\n" means a newline (line break).
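A slightly more defensive variant of the same awk idea, sketched under two assumptions: a broken record is a single line that starts with ">" and ends with the sequence as its last whitespace-separated field, and a real sequence contains only A, C, G, T or N. Lines that do not have that shape (including already-correct headers and sequence lines) are copied unchanged, so it should be safe to run on either file. The function name `fix_oneline_fasta` is hypothetical.

```shell
# Hypothetical helper: split one-line "header + sequence" records back into two lines.
# fix_oneline_fasta IN OUT
fix_oneline_fasta() {
  awk '
    # Only touch header lines whose last field looks like a DNA sequence.
    /^>/ && NF >= 2 && $NF ~ /^[ACGTNacgtn]+$/ {
      sub(/^>[ \t]+>/, ">")   # collapse a doubled "> >" prefix into one ">"
      seq = $NF               # remember the sequence (last field)
      $NF = ""                # drop it; awk rebuilds the line with single spaces
      sub(/[ \t]+$/, "")      # trim the trailing space left behind
      print                   # header line: ">name comment"
      print seq               # sequence on its own line
      next
    }
    { print }                 # everything else is copied unchanged
  ' "$1" > "$2"
}
```

Usage: `fix_oneline_fasta F1.fasta F1_fixed.fasta`. Checking the sequence against a fixed alphabet is what keeps a correct header such as ">SRR041654.1 HWI-..." from being split by mistake.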

So I think both files are OK, but I wonder whether the error is due to the "N" in the FASTA file. I made a mistake in my previous post. F1.fasta looks like this:

>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/1
NGCGAGGCTTCCATCAGTGAAATGTTTCCTTTCTGTTGTTGAAGTTTCATCTCAGCCAGAAGGCGCTCCAACGAAGTTATTTCTTTTTCATAAACAGCCA
>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/2
ACAAGGAAATGGCCGGATATCAGTTTCAGGAAATCATGCGCACCTTGCATAGTGAGCTGAACGAACGATTTGTCGAGACTTATTTTCTGACTAGAAATAT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/1
NGAAGCGTGACAAAATCACGTACAATACTCAGACTACCTCCGCCACCTGAGAAGCTCATATCCGGATAATCCACTTGATATAAATGTCCGAAAATGCGTT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/2
TCGGGAAATGCTGGAAATCAGAGTGGCTGATACAGGGATCGGAATTAAAAAAGAAGACAGAGAACGCATTTTTGGACATTTTTATCAAGTGGTTTATCCT
>SRR041654.3 HWI-EAS284_61BKE:5:1:2:1671/1
NGGTGTGTCTGTATTGCTGTCTGCCGTAACGGTAATTTTCCTGATTTCGGCAACTATCATTGTTTTTACTCCTTTACGTAATTATTTGCCGGGATATATG


and F2.fasta looks like this:

>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/1
NAAATCAGACAAATCTCCGTTATTGGTATATACTTTGGGAGTGTTATGGAATTGCACACCCATTTCGAACATGAAGCCAATTCGTTTCTTAGGAATCGCT
>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/2
GAAATCGGAAACTATCGTATACCTGTAGATCAGAACGGAAATATATCTGGTGGTTTGAAGGTTTCTTCATTCCGTCCTTATCTTGGACTAGGCTTCGGAA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/1
NATGTAGCATTAAAAATTACATCCTAAACTTATCGATAAATGAGTACGCCCATCATAATCATAGTCAGAGGTATTTACACGATCGAATACAACTTTTGCA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/2
TTACAACAAGGGCTGACTAATTATTATACCTGTGATTACTATCGTATTGGCGGGGCGATAAAGGATTTGCAAAAAAAAAAAGAAAAAAGAAGAAAGAGAA
>SRR041655.3 HWI-EAS284_61BKE:6:1:2:1293/1
NACAAGCTGATTAAGCCTATAAATAAGACCTTTATTTTCCCCATCTGAAATAACTCGAATCCTCCTATCAGTTGCATAACTTAAAGCAATTTCTAAGGAA

The presence of N is not an issue.

velveth Assem 31 -fasta -shortPaired test1.fasta test2.fasta


works perfectly fine for me with those sequences. Check the end of your files with tail; maybe there's an error hiding there.
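Along the lines of the tail suggestion, a rough structural check can scan the whole file and print the line number of anything that looks off, instead of eyeballing head and tail. This is a hypothetical helper, and its rules only approximate what a FASTA reader expects; they are not velveth's actual parser rules: the first line must be a header, a header must not start with a doubled ">", every header must be followed by sequence, and sequence lines may contain only letters (plus "-" and "*").

```shell
# Hypothetical helper: print suspicious lines in a FASTA file; exit status 1 if any are found.
# The rules below approximate common FASTA expectations; they are not velveth's exact rules.
check_fasta() {
  LC_ALL=C awk '
    NR == 1 && !/^>/   { print "line 1: file does not start with a header (>)"; bad = 1 }
    /\r/               { printf "line %d: contains a carriage return (^M)\n", NR; bad = 1 }
    /^>[ \t]*>/        { printf "line %d: header starts with more than one >\n", NR; bad = 1 }
    /^>/ {
      if (in_header) { printf "line %d: header directly follows another header\n", NR; bad = 1 }
      in_header = 1
      next
    }
    /^$/               { next }   # blank lines are skipped, not reported
    !/^[A-Za-z*-]+$/   { printf "line %d: unexpected characters in a sequence line\n", NR; bad = 1 }
                       { in_header = 0 }
    END {
      if (in_header) { print "end of file: last header has no sequence"; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

Usage: `check_fasta F1.fasta`; to look only at the end of a file, `tail -n 4 F1.fasta | cat -v` shows the last lines with any ^M made visible.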