I am working with a triple-negative breast cancer cell line and am interested in using the associated SNV data. This is my first time working with such data, so I would appreciate any guidance.
First, are the variants in COSMIC mostly distinct from those in dbSNP? Some SNVs have COSMIC IDs but no corresponding dbSNP ID.
Second, I want to make sure that I am processing VCF files with VEP correctly. It appears that annotating against the same genome assembly the variants were called on is crucial for accurate results. For example, a specific TP53 mutation with the same transcript ID is located at 17:7577099..7577099 in GRCh37 but at 17:7673781..7673781 in GRCh38.
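One way to check which assembly a VCF was called against is to look for hints in its header before running VEP. The following is only a minimal sketch: `guess_assembly` is a hypothetical helper, the token list is an assumption about how references are commonly named, and many VCFs carry no usable hint at all, in which case it returns `None`.

```python
def guess_assembly(header_lines):
    """Guess 'GRCh37' or 'GRCh38' from VCF header lines, or return None.

    Only ##reference, ##contig and ##assembly lines are inspected.
    The name tokens below are assumptions, not an exhaustive list.
    """
    hints = {
        "GRCh38": ("grch38", "hg38"),
        "GRCh37": ("grch37", "hg19", "b37", "hs37"),
    }
    for line in header_lines:
        if not line.startswith("##"):
            break  # end of the meta-information section
        if not line.startswith(("##reference", "##contig", "##assembly")):
            continue
        lowered = line.lower()
        for assembly, tokens in hints.items():
            if any(token in lowered for token in tokens):
                return assembly
    return None


# Toy header lines for illustration only
example = [
    "##fileformat=VCFv4.2",
    "##reference=file:///refs/GRCh38.fa",
    "#CHROM\tPOS\tID\tREF\tALT",
]
print(guess_assembly(example))
```

If a hint is found, the matching value can be given to VEP through its `--assembly` option, together with a cache built for that assembly.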
Lastly, I have used the ProteinSeqs plugin with VEP to obtain the reference and mutated protein sequences. However, for many entries, the "mutated" protein sequence is identical to the reference.
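To quantify how many entries are affected, the two FASTA outputs can be compared directly. This is a rough sketch under stated assumptions: `read_fasta` and `unchanged_entries` are hypothetical helpers, and it assumes each mutated header starts with the protein ID followed by `:` and an HGVS-style change, with the reference header being the bare protein ID. If the actual headers differ, the ID extraction needs adjusting. Repeated headers would overwrite each other in this simple parser.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}; the header is the first word after '>'."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def unchanged_entries(reference, mutated):
    """Return mutated headers whose sequence equals the matching reference sequence.

    Assumes the protein ID is the part of the mutated header before the first ':'.
    """
    same = []
    for header, seq in mutated.items():
        protein_id = header.split(":", 1)[0]
        ref_seq = reference.get(protein_id)
        if ref_seq is not None and ref_seq == seq:
            same.append(header)
    return same


# Toy sequences and IDs, made up for illustration
reference = read_fasta(">P1\nMKTA\nYI\n>P2\nMSSL\n")
mutated = read_fasta(">P1:p.Thr3Ala\nMKAAYI\n>P2:p.Ser2=\nMSSL\n")
print(unchanged_entries(reference, mutated))
```

Looking up the consequence VEP reports for the headers this returns would show whether they share a variant type; synonymous variants, for instance, leave the protein sequence unchanged.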