Creating a variant-containing protein FASTA for proteomics searches from a VCF and a genomic FASTA
12 months ago
chscho • 0

Dear Biostar Community

I'm currently trying to generate a protein FASTA containing all known HeLa variants (from the COSMIC Cell Lines Project) for variant detection in proteomics measurements.

For this, I've downloaded the variant file (VCF) as well as the human genome FASTA and the annotation (GFF or GTF) from NCBI. My plan was to extract the CDS of all genes from the genome using the GTF and then apply the VCF to generate the corresponding variant sequences. The resulting nucleotide FASTA, with entries for each wild-type gene and its variants, would then be translated into a protein FASTA, which could ultimately be used for a proteomics experiment.
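To make the first step of that plan concrete, here is a minimal sketch of collecting the CDS segments per transcript from GTF lines. The function name `cds_by_transcript` is made up for illustration, and the sketch assumes standard tab-separated GTF with a `transcript_id "..."` attribute. It ignores the phase column, so it only makes sense for transcripts whose CDS starts in frame.

```python
import re
from collections import defaultdict


def cds_by_transcript(gtf_lines):
    """Group CDS segments by transcript.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])} with
    1-based inclusive coordinates sorted by genomic position.
    """
    segments = defaultdict(list)
    location = {}
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed records whose feature type is CDS.
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        location[tid] = (fields[0], fields[6])  # chromosome, strand
        segments[tid].append((int(fields[3]), int(fields[4])))
    return {
        tid: (location[tid][0], location[tid][1], sorted(segs))
        for tid, segs in segments.items()
    }
```

Called on an open GTF file handle, this yields one entry per coding transcript, which is the unit each FASTA record would be built from.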

Unfortunately, I am struggling to generate a nucleotide FASTA containing both the variant and the wild-type versions of all CDS. So far I've tried GATK FastaAlternateReferenceMaker and BSgenome injectSNPs, but neither worked as expected, since both tools rely only on the VCF.

Can someone point me to the correct workflow to successfully create such a protein FASTA?

Thank you very much! Best, chscho

translation Variant proteomics FASTA VCF • 569 views

But both solutions did not work as expected, since both tools only rely on the VCF

I'm not sure I understand. Please explain what your output should be: a FASTA? A list of mRNAs? Something else?


Hi Pierre

The VCF file from COSMIC refers to the full human genome (GRCh38), so applying it to the genome FASTA yields one FASTA entry per chromosome. The extraction of the CDS (from the GFF) and the introduction of the variants (from the VCF) therefore have to happen together, so that the variant positions are mapped correctly onto each CDS. The result should be a nucleotide FASTA with an entry for each CDS and for its variants, which can then be translated into a protein FASTA containing separate entries for each protein and its variants. I hope this clarifies the intended workflow.
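To illustrate what "at the same time" means here, below is a minimal sketch that applies variants in genome coordinates while splicing the CDS, and then translates the result. It is only a sketch under strong assumptions: it handles single-nucleotide variants only (indels would shift coordinates and need extra bookkeeping), it expects the CDS to start in frame, and the names `spliced_cds` and `translate` as well as the toy sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def spliced_cds(chrom_seq, segments, strand, snvs=None):
    """Splice CDS segments from a chromosome, optionally applying SNVs.

    segments: sorted 1-based inclusive (start, end) pairs on the chromosome.
    snvs: {position: (ref, alt)} in 1-based forward-strand coordinates,
          i.e. the POS/REF/ALT columns of a VCF record.
    """
    snvs = snvs or {}
    parts = []
    for start, end in segments:
        piece = list(chrom_seq[start - 1:end])
        for pos, (ref, alt) in snvs.items():
            if start <= pos <= end:
                # Guard against a VCF/FASTA assembly mismatch.
                if piece[pos - start].upper() != ref.upper():
                    raise ValueError(f"REF mismatch at position {pos}")
                piece[pos - start] = alt
        parts.append("".join(piece))
    seq = "".join(parts)
    if strand == "-":
        # Variants were applied on the forward strand, so reverse-complement last.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq.upper()


def translate(cds):
    """Translate a CDS up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons containing N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy chromosome: two CDS segments (11-16 and 23-28) separated by an intron.
chrom = "N" * 10 + "ATGGCC" + "GTAAAG" + "AAATGA"
segments = [(11, 16), (23, 28)]
wildtype = translate(spliced_cds(chrom, segments, "+"))
variant = translate(spliced_cds(chrom, segments, "+", {15: ("C", "A")}))
print(wildtype, variant)  # MAK MDK
```

Because the variant is applied before splicing and strand handling, the VCF position never has to be converted into CDS coordinates, and the wild-type and variant proteins can be written as separate FASTA entries.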

Best, chscho
