Dear Biostar Community
I'm currently trying to generate a protein FASTA containing all known HeLa variants (from the COSMIC Cell Lines Project) for variant detection in proteomics measurements.
For this, I've downloaded the variants file (VCF) as well as the human genome FASTA and the annotation (GFF or GTF) from NCBI. My plan was to extract the CDS of all genes from the genome using the GTF and then apply the VCF to generate the corresponding variant sequences. The resulting nucleotide FASTA, with entries for each wild-type gene and its variants, would then be translated into a protein FASTA, which could ultimately be used for a proteomics experiment.
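To make the plan concrete, here is a minimal Python sketch of what I have in mind for a single gene. It is only an illustration under several assumptions: the CDS is already spliced and in coding orientation, only single-nucleotide substitutions are handled, and variant positions have already been converted from genomic to 0-based CDS coordinates. The function names (`apply_snv`, `translate`, `protein_fasta_entries`) and the entry naming scheme are hypothetical and not taken from any existing tool.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snv(cds, pos, ref, alt):
    """Return the CDS with one substitution applied at 0-based position pos."""
    if cds[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {cds[pos]} != {ref}")
    return cds[:pos] + alt + cds[pos + 1:]


def translate(cds):
    """Translate a CDS up to the first stop codon; unknown codons become X."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_fasta_entries(gene, cds, snvs):
    """Build (header, sequence) pairs: the wild type plus one entry per
    substitution that changes the protein (hypothetical naming scheme)."""
    wild_type = translate(cds)
    entries = [(f"{gene}_WT", wild_type)]
    for pos, ref, alt in snvs:
        mutant = translate(apply_snv(cds, pos, ref, alt))
        if mutant != wild_type:  # skip synonymous changes
            entries.append((f"{gene}_{ref}{pos + 1}{alt}", mutant))
    return entries


# Toy example: one missense and one synonymous substitution.
for header, seq in protein_fasta_entries(
    "GENE1", "ATGGCCAAGTAA", [(4, "C", "A"), (5, "C", "T")]
):
    print(f">{header}\n{seq}")
```

The part I cannot get right is the step before this: going from the genome-wide VCF and GTF to per-CDS sequences and coordinates.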
Unfortunately, I am struggling to generate a nucleotide FASTA containing both the variant and the wild-type versions of all CDS. So far I've tried GATK FastaAlternateReferenceMaker and BSgenome injectSNPs, but neither worked as expected, since both tools rely only on the VCF.
Can someone point me to the correct workflow to successfully create such a protein FASTA?
Thank you very much! Best, chscho