I would like to create a new FASTA file from the original genome FASTA and a VCF file. The new FASTA file should contain only full gene sequences.
I can use GATK's FastaAlternateReferenceMaker to accomplish this:
java -jar -Xmx16g ~/bin/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R ref_genome.fasta -o sample_SNV.fasta -V sample_SNV_selected.vcf -L ref_gene.bed
However, I would like the output FASTA to use the gene names as headers. For instance, the current FASTA output from GATK is:
>1 chr01:2350
AGAAAGGACAGAAAAAAAGATGGTGAAGTAGAAAGAGGGCGAAATGAAAAAAGGGAAAGC
AAAAGAGATGATGAAAGTCATAGAGAGAGAGATGAAAAAAGGGAAAGCAAAAGAGATGAT
I would like the output headers to 1) not start with a sequential number and 2) contain the gene name from column 4 of the .bed file.
Is there a way to do this by either 1) modifying the input .bed file or 2) giving some tool the output FASTA and the .bed file so that it rewrites the headers?
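To make option 2 concrete, here is a rough Python sketch of the kind of post-processing I have in mind. It is untested against real GATK output and rests on assumptions: that each header looks like ">N contig:start" (optionally ">N contig:start-end"), that the start in the header is the 1-based equivalent of the 0-based BED start, and that GATK has not merged adjacent or overlapping intervals. The function names (load_bed_names, parse_location, rename_headers) are made up for illustration.

```python
import sys


def load_bed_names(bed_lines):
    """Map (contig, 1-based start) to the gene name in BED column 4."""
    names = {}
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column on this line
        contig, start, name = fields[0], int(fields[1]), fields[3]
        # Assumption: BED starts are 0-based, header starts are 1-based.
        names[(contig, start + 1)] = name
    return names


def parse_location(header):
    """Return (contig, start) from a '>N contig:start[-end]' header, or None."""
    tokens = header[1:].split()
    if len(tokens) < 2 or ":" not in tokens[1]:
        return None
    contig, _, pos = tokens[1].rpartition(":")
    start = pos.split("-")[0]
    return (contig, int(start)) if start.isdigit() else None


def rename_headers(fasta_lines, names):
    """Yield FASTA lines, replacing headers whose location matches a BED entry."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            key = parse_location(line)
            if key in names:
                line = ">" + names[key]
            # Unmatched headers are left unchanged so nothing is lost silently.
        yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rename_fasta.py ref_gene.bed sample_SNV.fasta > renamed.fasta
    with open(sys.argv[1]) as bed, open(sys.argv[2]) as fasta:
        for out_line in rename_headers(fasta, load_bed_names(bed)):
            print(out_line)
```

Matching on position rather than on the sequential number avoids relying on GATK emitting records in the same order as the .bed file, but it will miss any intervals that GATK merged; those headers are simply passed through unchanged.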