How to create a database of genetic variants from multiple annotated VCFs and variants found in the literature?
3.5 years ago
aleksmllr • 0

Hello everyone! I have come to Biostars because I am feeling very overwhelmed by a task I have been assigned in my lab.

I have been tasked with building a database of genetic variants called by several variant callers (freebayes, LoFreq, Samtools, and Strelka) for 20 patients (4 VCFs per patient, 80 VCFs in total). The database should also include the variants that our literature review team has found, which are stored in a large Excel table.
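With four callers per patient, a common way to combine them is to count how many callers report each variant in each patient. You can then keep, for example, only the variants reported by at least two callers. The sketch below is a minimal, hypothetical illustration that works on plain `(patient, caller, chrom, pos, ref, alt)` tuples.

It assumes the VCFs have already been normalized, so that every caller writes the same variant identically. `bcftools norm` can do this: it left-aligns indels against the reference and can split multi-allelic sites. The threshold `min_callers=2` is only a placeholder.

```python
from collections import defaultdict


def caller_support(calls):
    """Group calls by (patient, variant) and collect the distinct callers.

    `calls` is an iterable of (patient, caller, chrom, pos, ref, alt) tuples.
    Returns {(patient, chrom, pos, ref, alt): set_of_callers}.
    """
    support = defaultdict(set)
    for patient, caller, chrom, pos, ref, alt in calls:
        support[(patient, chrom, pos, ref, alt)].add(caller)
    return dict(support)


def consensus(calls, min_callers=2):
    """Keep variants reported by at least `min_callers` distinct callers (placeholder threshold)."""
    return {
        key: callers
        for key, callers in caller_support(calls).items()
        if len(callers) >= min_callers
    }


if __name__ == "__main__":
    # Made-up example calls for one hypothetical patient.
    demo = [
        ("P01", "freebayes", "chr1", 1000, "A", "G"),
        ("P01", "LoFreq", "chr1", 1000, "A", "G"),
        ("P01", "Strelka", "chr1", 1000, "A", "G"),
        ("P01", "Samtools", "chr2", 500, "C", "T"),
    ]
    for key, callers in sorted(consensus(demo).items()):
        print(key, sorted(callers))
```

Counting distinct callers (a set) rather than rows means a caller that reports the same site twice is not double-counted.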

My initial plan was to:

- annotate all of the VCFs with ANNOVAR, to get gene names and general information about the variants;
- combine them into one large CSV file, adding a column that records which variant caller each variant came from;
- import this table into a MySQL database, along with the table the literature review team produced.
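The "one long table with a caller column" idea can be sketched with the standard library alone. The sketch below reads the eight fixed VCF columns and emits one row per ALT allele, tagged with the patient and the caller. It then writes all rows to a single CSV.

This is an illustrative sketch, not ANNOVAR's output format. In practice you might run ANNOVAR first, or use a VCF library such as pysam or cyvcf2. Splitting multi-allelic ALTs here copies the whole INFO string to each allele, which is only safe if per-allele fields do not matter. Running `bcftools norm -m -any` beforehand avoids that. The field names in `FIELDS` are hypothetical choices.

```python
import csv
import io

# Hypothetical column layout for the combined "long" table.
FIELDS = ["patient", "caller", "chrom", "pos", "ref", "alt", "qual", "filter", "info"]


def parse_vcf(lines, patient, caller):
    """Yield one dict per ALT allele from VCF text lines; header lines are skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        chrom, pos, _id, ref, alts, qual, flt, info = cols[:8]
        # One row per ALT allele; INFO is copied unchanged (see caveat above).
        for alt in alts.split(","):
            yield {
                "patient": patient,
                "caller": caller,
                "chrom": chrom,
                "pos": int(pos),
                "ref": ref,
                "alt": alt,
                "qual": None if qual == "." else float(qual),
                "filter": flt,
                "info": info,
            }


def write_long_csv(records, handle):
    """Write all records to an open text handle as one CSV table with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)  # None (missing QUAL) is written as an empty cell


if __name__ == "__main__":
    demo_vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t1000\t.\tA\tG,T\t50\tPASS\tDP=30\n",
    ]
    buf = io.StringIO()
    write_long_csv(parse_vcf(demo_vcf, "P01", "freebayes"), buf)
    print(buf.getvalue())
```

For real files, you would loop over every (patient, caller) pair and pass each open file to `parse_vcf`. How those pairs map to file names depends on your own layout. Open the output CSV with `newline=""`, and use `gzip.open(path, "rt")` for `.vcf.gz` inputs.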

I am not sure how to link these tables within the database, and I don't know whether this is the best approach at all. My supervisor wants the database to be usable by other, non-technical researchers in our lab, so that they can query it for specific variants of interest and related information.
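One way to link the tables is to store each unique variant once, keyed by (chrom, pos, ref, alt). Caller calls and literature entries then point to that variant by its id. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, so it runs without a server. The same normalized layout should carry over to MySQL, but the syntax differs: MySQL uses `INSERT IGNORE`, and most MySQL drivers use `%s` placeholders.

Table and column names, the `gene` column, and the `source` column are all hypothetical. The sketch also assumes the literature variants can be expressed in the same chrom/pos/ref/alt coordinates, on the same reference build, as the VCFs.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS variant (
    variant_id INTEGER PRIMARY KEY,
    chrom TEXT NOT NULL, pos INTEGER NOT NULL,
    ref TEXT NOT NULL, alt TEXT NOT NULL,
    gene TEXT,                          -- e.g. taken from the ANNOVAR annotation
    UNIQUE (chrom, pos, ref, alt)
);
CREATE TABLE IF NOT EXISTS variant_call (
    variant_id INTEGER NOT NULL REFERENCES variant(variant_id),
    patient TEXT NOT NULL, caller TEXT NOT NULL,
    qual REAL, filter TEXT,
    PRIMARY KEY (variant_id, patient, caller)
);
CREATE TABLE IF NOT EXISTS literature (
    variant_id INTEGER NOT NULL REFERENCES variant(variant_id),
    source TEXT NOT NULL                -- hypothetical: citation/row id from the Excel sheet
);
"""


def get_variant_id(con, chrom, pos, ref, alt, gene=None):
    """Insert the variant if new and return its id (gene is kept from the first insert)."""
    con.execute(
        "INSERT OR IGNORE INTO variant (chrom, pos, ref, alt, gene) VALUES (?, ?, ?, ?, ?)",
        (chrom, pos, ref, alt, gene),
    )
    row = con.execute(
        "SELECT variant_id FROM variant WHERE chrom=? AND pos=? AND ref=? AND alt=?",
        (chrom, pos, ref, alt),
    ).fetchone()
    return row[0]


def add_call(con, patient, caller, chrom, pos, ref, alt, gene=None, qual=None, flt=None):
    vid = get_variant_id(con, chrom, pos, ref, alt, gene)
    con.execute("INSERT OR IGNORE INTO variant_call VALUES (?, ?, ?, ?, ?)",
                (vid, patient, caller, qual, flt))


def add_literature(con, chrom, pos, ref, alt, source, gene=None):
    vid = get_variant_id(con, chrom, pos, ref, alt, gene)
    con.execute("INSERT INTO literature VALUES (?, ?)", (vid, source))


def variants_in_gene(con, gene):
    """Per variant in `gene`: number of patients, callers and literature sources."""
    return con.execute(
        """
        SELECT v.chrom, v.pos, v.ref, v.alt,
               COUNT(DISTINCT c.patient), COUNT(DISTINCT c.caller),
               COUNT(DISTINCT l.source)
        FROM variant v
        LEFT JOIN variant_call c ON c.variant_id = v.variant_id
        LEFT JOIN literature l ON l.variant_id = v.variant_id
        WHERE v.gene = ?
        GROUP BY v.variant_id, v.chrom, v.pos, v.ref, v.alt
        ORDER BY v.chrom, v.pos
        """,
        (gene,),
    ).fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")  # use a file path (and con.commit()) to persist
    con.executescript(SCHEMA)
    add_call(con, "P01", "freebayes", "chr1", 1000, "A", "G", gene="GENE1")
    add_call(con, "P01", "Strelka", "chr1", 1000, "A", "G", gene="GENE1")
    add_literature(con, "chr1", 1000, "A", "G", "lit-row-1", gene="GENE1")
    print(variants_in_gene(con, "GENE1"))
```

The `LEFT JOIN`s keep variants that appear only in the VCFs or only in the literature table. Wrapping common queries in small functions like `variants_in_gene`, or in database views, gives non-technical colleagues something simple to call.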

Frankly, I am worried about my ability to complete this task.

I would also appreciate recommendations on how to filter the variants in the VCF files, so that I keep genuine variants (rather than calling artifacts) that are likely to be important.
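A common first pass is to keep records whose FILTER column is PASS and then apply minimum quality and read-depth cutoffs. On the command line, `bcftools view -f PASS` and `bcftools filter -i 'QUAL>=20 && INFO/DP>=10'` do this. The Python sketch below applies the same idea to one dict per record with keys `qual` (float or `None`), `filter`, and `info` (the raw INFO string).

The thresholds are hypothetical placeholders. Callers fill QUAL and INFO differently, and some leave QUAL as `.`, so cutoffs should be chosen per caller. The sketch also assumes each caller reports total depth as `DP` in INFO, which may not hold for every caller.

```python
def parse_info(info):
    """Turn 'DP=30;AF=0.5;SOMATIC' into {'DP': '30', 'AF': '0.5', 'SOMATIC': True}."""
    out = {}
    if info in ("", "."):
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags without '=' become True
    return out


def passes(record, min_qual=20.0, min_depth=10):
    """Return True if a record passes basic filters (placeholder thresholds)."""
    # FILTER must be PASS; '.' means no filters were applied, which we accept here.
    if record["filter"] not in ("PASS", "."):
        return False
    # A missing QUAL ('.' parsed as None) is not penalized, since some callers omit it.
    qual = record["qual"]
    if qual is not None and qual < min_qual:
        return False
    # Assumption: depth is in INFO/DP; records without it are conservatively rejected.
    dp = parse_info(record["info"]).get("DP")
    if dp is None or dp is True:
        return False
    return int(dp) >= min_depth


if __name__ == "__main__":
    demo = [
        {"qual": 50.0, "filter": "PASS", "info": "DP=30"},
        {"qual": 50.0, "filter": "LowQual", "info": "DP=30"},
        {"qual": 50.0, "filter": "PASS", "info": "DP=3"},
    ]
    print([passes(r) for r in demo])
```

Per-record filters like these are usually combined with agreement between callers and with population-frequency annotations such as the ones ANNOVAR can add.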

I have a lot of Python and command-line experience but very little SQL experience. If anyone has done this kind of thing before, I would love to pick your brain and ask a few general questions.

Just looking for some guidance as I am completely lost.

Cheers,

Aleks

sequencing next-gen