Hello!
I'm trying to get a FASTA file from a BAM file. Essentially, I want the organism's entire genome as a single FASTA sequence that reads like:
>one_line
ACCGCGG.... (only nucleotides; no more >)
When I asked a similar question, I was told that I had to make a consensus sequence. Basically, I just need the organism's genome in the format shown above.
I used a command that I found on this website. It was:
samtools bam2fq number1.bam | seqtk seq -A - > pop1.fa
It did successfully convert the file to FASTA, but the output had many description (header) lines, like:
>NB551191:275:HMT7LBGX7:1:11101:1614:1054 1:N:0:ATCACG
TAAATNAGATCATTTTTGTAGAGAAAAANGANGGCTTNCGAATGGTATGAAAATCTCTGTGATCCGTCAAAAACTGACTGAGTTCTGATAAAAAATGTATTGGCAGAAAATACCACTTGGACCAAATCTCAAAAATTGACGGAAATGTCAC
>NB551191:275:HMT7LBGX7:1:11101:18472:1054 1:N:0:ATCACG
TTTCCNGAAAACGCATCCAGCATTGTTTNACNTCATTNGAGAGCTGAAAATTTTCAAACCTGTATTTTCCAATCGCATAATAACTCGTGTCTCCTTCTCCATAATCCGTGGGAAGCTTTCAACTCAATAAATTTTAGGAAAAAAGTTTATT
(and so on)
I want only one description line, with the rest of the file being the nucleotide sequence. Alternatively, it can be organized by chromosome, but the most important part is that there should not be a new description line every few lines.
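For context, the command above writes one FASTA record per sequencing read (samtools bam2fq extracts the reads and seqtk seq -A converts them to FASTA), which is why a new header appears every two lines. A quick way to confirm this is to count the header lines. This is only a small sketch; the function name count_fasta_records is made up for illustration.

```shell
# Count the records in a FASTA file by counting lines that start with '>'.
# count_fasta_records is a hypothetical helper name, not part of samtools or seqtk.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example usage (pop1.fa is the file produced by the command above):
#   count_fasta_records pop1.fa
```

If the count is in the thousands or millions, the file holds individual reads rather than an assembled genome or consensus.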
If anybody has any advice on how I should do this, please let me know! Thank you in advance for your help!
I also think this link can help: Generating consensus sequence from bam file
But I don't know how to call the variants. I do have GATK on the computer that I am using, but I do not know how to use it.
Use this tutorial by @Finswimmer. It walks you through the entire process step by step:
Generating consensus sequence from bam file (this is a different post on Biostars, even though it has the same title as the one you linked above)
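In case it helps, here is a rough sketch of what a consensus workflow can look like with bcftools instead of GATK. It may not match the linked tutorial exactly. It assumes that bcftools is installed, that the BAM is coordinate-sorted, and that you have the reference FASTA the reads were aligned to. The function name make_consensus and the file names in the usage comment are placeholders.

```shell
# Sketch: call variants from a BAM and apply them to the reference.
# make_consensus is a hypothetical wrapper; arguments are
#   $1 = reference FASTA, $2 = sorted BAM, $3 = output FASTA
# The body runs in a subshell, so 'set -eu' does not affect your session.
make_consensus() (
    set -eu
    ref="$1"
    bam="$2"
    out="$3"
    vcf="${out%.fa}.calls.vcf.gz"

    # 1. Pile up the reads against the reference and call variants.
    bcftools mpileup -Ou -f "$ref" "$bam" | bcftools call -mv -Oz -o "$vcf"

    # 2. Index the compressed VCF so that bcftools consensus can read it.
    bcftools index "$vcf"

    # 3. Write the reference with the called variants applied.
    bcftools consensus -f "$ref" "$vcf" > "$out"
)

# Example usage (reference.fa is an assumed file name):
#   make_consensus reference.fa number1.bam pop1_consensus.fa
```

The output has one header per reference sequence (for example, one per chromosome) rather than one per read. Positions without called variants keep the reference base, so check your coverage before treating the result as the sample's genome.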
Awesome, I will try it out. Thanks for the help!