Question: Some questions about writing human mitochondrial variants into a VCF file
MatthewP (China) wrote, 4 months ago:

Hello, I have variant results from mtDNA sequencing. Here is an example:

SampleID        Pos     Ref     Variant Major/Minor     Variant-Level   Coverage-FWD    Coverage-Rev    Coverage-Total
R07058.bam      9090    T       C       C/A     0.9974  2100    2136    4236

These results come from mtDNA-Server. Major means the major (most frequent) nucleotide at a site, and Minor means the less frequent one. Variant-Level seems to be the fraction of reads supporting the variant, but I am not sure about that.
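A minimal Python sketch for reading rows like the one above into records. It assumes the mtDNA-Server output is tab-separated with exactly this header, and it treats Variant-Level as the fraction of reads supporting the Variant base, which is the guess made above rather than something confirmed here. The names MtVariant and parse_mtdna_server are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class MtVariant:
    sample: str
    pos: int
    ref: str
    variant: str
    major: str
    minor: str
    level: float  # assumed: fraction of reads supporting `variant`
    cov_fwd: int
    cov_rev: int
    cov_total: int


def parse_mtdna_server(text):
    """Parse tab-separated mtDNA-Server-style rows into MtVariant records."""
    records = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        major, minor = row["Major/Minor"].split("/")
        records.append(MtVariant(
            sample=row["SampleID"],
            pos=int(row["Pos"]),
            ref=row["Ref"],
            variant=row["Variant"],
            major=major,
            minor=minor,
            level=float(row["Variant-Level"]),
            cov_fwd=int(row["Coverage-FWD"]),
            cov_rev=int(row["Coverage-Rev"]),
            cov_total=int(row["Coverage-Total"]),
        ))
    return records


example = (
    "SampleID\tPos\tRef\tVariant\tMajor/Minor\tVariant-Level\t"
    "Coverage-FWD\tCoverage-Rev\tCoverage-Total\n"
    "R07058.bam\t9090\tT\tC\tC/A\t0.9974\t2100\t2136\t4236\n"
)
rec = parse_mtdna_server(example)[0]
print(rec.pos, rec.major, rec.minor, rec.level)  # 9090 C A 0.9974
```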

I want to annotate these variants with snpEff, which requires a VCF file as input, so I am trying to write a Python script to convert these results to VCF. I read the VCF format specification before I started.

Since the mitochondrial genome is haploid, I write each allele at the same site as a separate variant in the VCF. In this example there would be 2 lines in the VCF:

#CHROM  POS     ID      REF     ALT ...
MT      9090    .       T       A ...
MT      9090    .       T       C ...

I hope this solution is right.

My question is about the INFO column of the VCF. mtDNA is haploid, but a cell may contain many (an unknown number of) copies, so I don't know how to fill these INFO tags:

  1. AC : allele count in genotypes, for each ALT allele, in the same order as listed.
  2. AN : total number of alleles in called genotypes.
  3. DP : combined depth across samples, e.g. DP=154. I understand there will be several depth values, because more than one sample will be put on one line of the VCF, but I don't know what the combined depth across them is or how to calculate it.
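For DP, the usual reading of "combined depth across samples" is the sum of the per-sample read depths on that line, while each sample's own depth goes in the FORMAT DP field. A small Python sketch of that sum; the function name and the samples sample_B and sample_C are hypothetical. AC and AN, by contrast, count alleles in called genotypes, so they have no natural value for pooled mtDNA reads of unknown copy number; INFO fields are optional, so they can be left out.

```python
def combined_depth(per_sample_depth):
    """INFO/DP for one VCF line: the sum of every sample's read depth.

    Samples whose depth is unknown (None, written '.' in FORMAT/DP) are skipped.
    """
    return sum(d for d in per_sample_depth.values() if d is not None)


# Depth of R07058.bam is from the post; sample_B and sample_C are hypothetical.
depths = {"R07058.bam": 4236, "sample_B": 3900, "sample_C": None}
print(f"DP={combined_depth(depths)}")  # DP=8136
```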

Any help is appreciated.

Tags: mtdna, vcf

Hello MatthewP,

Could you please describe what the Major/Minor and Variant-Level columns mean? And why do you need a vcf file?

Also it is better to use the code button in the formatting bar to show file contents. I've done it for you this time.


fin swimmer

(reply written 4 months ago by finswimmer)

Thanks for your advice. I have edited the question to explain what Major/Minor means.

(reply written 4 months ago by MatthewP)

Hello MatthewP,

Thank you for adding information to your question. But I still don't understand what is meant by Major/Minor, because the Variant column contains only a C.

It is also important to understand why you need a vcf file. In the simplest case, your vcf file just needs values in the CHROM, POS, REF and ALT columns. All other mandatory fields can be filled with . if this information isn't needed for downstream analyses.
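A minimal Python sketch of such a file: only CHROM, POS, REF and ALT are filled, and ID, QUAL, FILTER and INFO are '.'. It assumes the contig is called MT; that name has to match the chromosome name used by the snpEff database or reference (some use chrM instead). The helper name minimal_vcf is hypothetical.

```python
def minimal_vcf(variants, chrom="MT"):
    """Return minimal VCFv4.2 text from (pos, ref, alt) tuples.

    Only CHROM, POS, REF and ALT carry data; ID, QUAL, FILTER and INFO
    hold the missing value '.'. Columns are tab-separated, rows sorted by POS.
    """
    lines = [
        "##fileformat=VCFv4.2",
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    ]
    for pos, ref, alt in sorted(variants):
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


print(minimal_vcf([(9090, "T", "C")]), end="")
```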

fin swimmer

(reply written 4 months ago by finswimmer)

Major/Minor is a column included in the results generated by mtDNA-Server. The tool builds two profiles, major and minor, based on variant allele frequencies, and this information is used to perform haplogroup checks for each heteroplasmic site.
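To illustrate the idea only (this is not mtDNA-Server's actual code), the major and minor alleles at a site can be derived from per-base read counts: major is the most frequent base, minor the second most frequent. The function name and the counts in the example are hypothetical.

```python
def major_minor(base_counts):
    """Return (major, minor, minor_fraction) from per-base read counts.

    Major is the most frequent base and minor the second most frequent;
    ties are broken alphabetically so the result is deterministic.
    """
    total = sum(base_counts.values())
    ranked = sorted(base_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    (major, _), (minor, minor_reads) = ranked[0], ranked[1]
    return major, minor, minor_reads / total


# Hypothetical read counts at one heteroplasmic site.
counts = {"A": 11, "C": 4225, "G": 0, "T": 0}
major, minor, frac = major_minor(counts)
print(major, minor, round(frac, 4))  # C A 0.0026
```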

(reply written 4 months ago by Nandini)

Thank you, Nandini! Can I ask where you got all this information about mtDNA-Server? There is no detailed documentation in the GitHub project, so I have had to guess what all those tags mean.

(reply written 4 months ago by MatthewP)

Hi Matthew, I used mtDNA-Server before setting up my own pipeline for our lab. Have you read the paper for the tool? The details should be given there.

(reply written 4 months ago by Nandini)

Yes, I read the paper before I downloaded the tool. I also want to set up my own pipeline, but I don't know how, because I am just a beginner in bioinformatics. How do you do the variant calling? Do you have any guidance for building an mtDNA pipeline?

(reply written 4 months ago by MatthewP)

Sure, I can help you with that, but it would be useful to know the aim of your project. What samples are you analysing? Why do you need to convert the results into vcf format? Do you only need to call variants, or do you need to perform further downstream analysis?

(reply written 4 months ago by Nandini)

Thank you! Could I have your e-mail address? I would like to discuss this with you by e-mail.

(reply written 4 months ago by MatthewP)

Please don't ask for email addresses. We like to keep the discussion open and on the forum so it benefits everyone.

(reply written 4 months ago by genomax)

Well, I work at a company offering sequencing services, and this is the first mtDNA order our company has received. Our client wants us to analyse mtDNA heteroplasmy (variants) and copy number variation (CNV). They are using multiplex PCR to build the mtDNA library, so I think we can't get CNV from such data, because there is no nuclear genome to normalize against between samples. I want to offer them a very good variant report. Here are my pipeline requirements:

  1. QC and mapping. I am currently using bwa for mapping, but I am unsure which reference to use. I am currently using rCRS, as recommended in "rCRS vs. RSRS vs. HG19 (Yoruba)". Is rCRS the same as the chrM of GRCh38 (hg38)? (If I use the whole human genome as the reference, some reads map to other chromosomes, especially chr2.)
  2. Variant calling. I have no idea how to do this.
  3. Annotation. snpEff seems good to me. Any other suggestions?
  4. If possible, I want to provide some biological or medical interpretation of these variants; for example, some SNPs may cause disease. I am looking for useful databases on MITOMAP. I have never done this kind of work before; maybe I need some tools besides those databases?

Details of the sequencing method: the library is constructed with the MultipSeq™ AImumiCap Panel, which uses 129 primer pairs to amplify the whole mtDNA by PCR.

(reply written 4 months ago by MatthewP)

There are several publications and automated pipelines that do this for you, but as you work for a company, you need to check whether this software is freely available for you to use.

So my pipeline for mtDNA analysis is as follows:

  1. Mapping: BWA with rCRS (hg19)
  2. Mark duplicates with Picard
  3. Variant calling: samtools and VarScan
  4. Variant annotation: ANNOVAR
  5. Additional annotation: MITOMAP

Hope this helps. Good luck
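The first steps of that list can be written down as shell commands. The Python sketch below only builds the command strings (it runs nothing); file names, jar locations and the 1% minimum variant frequency are placeholders, not recommendations, and the ANNOVAR and MITOMAP steps are left out. One caveat for the multiplex-PCR library described above: amplicon reads start at the same positions by design, so duplicate marking can flag most reads as duplicates, and that step is commonly skipped for amplicon data.

```python
import shlex


def mt_pipeline_commands(sample, r1, r2, ref="rCRS.fa",
                         picard_jar="picard.jar", varscan_jar="VarScan.jar",
                         min_var_freq=0.01):
    """Return shell command strings (not executed) for steps 1 to 3 above.

    File names, jar locations and min_var_freq are placeholders to adapt.
    """
    q = shlex.quote
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    dedup, vcf = f"{sample}.dedup.bam", f"{sample}.varscan.vcf"
    return [
        # Index the mtDNA reference once (used by bwa and by mpileup -f).
        f"bwa index {q(ref)}",
        f"samtools faidx {q(ref)}",
        # 1. Map paired reads; bwa mem writes SAM to stdout.
        f"bwa mem {q(ref)} {q(r1)} {q(r2)} > {q(sam)}",
        f"samtools sort -o {q(bam)} {q(sam)}",
        # 2. Mark duplicates with Picard (questionable for amplicon data).
        f"java -jar {q(picard_jar)} MarkDuplicates I={q(bam)} O={q(dedup)} "
        f"M={q(sample + '.dup_metrics.txt')}",
        f"samtools index {q(dedup)}",
        # 3. Pileup with samtools, SNV calling with VarScan, VCF output.
        f"samtools mpileup -f {q(ref)} {q(dedup)} | java -jar {q(varscan_jar)} "
        f"mpileup2snp --min-var-freq {min_var_freq} --output-vcf 1 > {q(vcf)}",
    ]


for cmd in mt_pipeline_commands("R07058", "R07058_R1.fq.gz", "R07058_R2.fq.gz"):
    print(cmd)
```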

(reply written 4 months ago by Nandini)

We prefer to set up our own pipelines so they are easy to maintain and upgrade. Thank you very much, I will try your pipeline.

(reply written 4 months ago by MatthewP)

Okay. But do some research before implementing the pipeline, as some of the tools may not suit your requirements.

(reply written 4 months ago by Nandini)

OK. I need to annotate those variants using snpEff, which takes a VCF file as input.

(reply written 4 months ago by MatthewP)
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 4 months ago:

I hope this solution is right.

No. In a valid VCF you should find only one record per CHROM/POS/REF. See the VCF spec: for example, for an attribute associated with the ALT alleles (e.g. AF, Number=A), you should find as many values as there are ALT alleles. Example:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
#CHROM  POS     ID      REF     ALT ... INFO
MT      9090    .       T       A,C ... AN=100;AC=1,50;AF=0.01,0.5
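A Python sketch of producing such one-record-per-site lines from per-allele calls, writing one AF value per ALT allele in the same order (Number=A). The input tuples, the 0.0021 level for A and the function name are hypothetical, AC/AN are not written, and the matching ##INFO header lines (as above) still need to be added. On existing VCF files, `bcftools norm -m +any` can join biallelic records at the same position into multiallelic ones.

```python
from collections import defaultdict


def merge_multiallelic(calls, chrom="MT"):
    """Turn per-allele calls into one VCF data line per (POS, REF).

    calls: iterable of (pos, ref, alt, af, dp) tuples. ALT alleles are sorted
    and AF is written in the same order (Number=A). DP is taken from the
    first call seen at each site, assuming all calls there share it.
    """
    alleles = defaultdict(list)
    depth = {}
    for pos, ref, alt, af, dp in calls:
        alleles[(pos, ref)].append((alt, af))
        depth.setdefault((pos, ref), dp)
    lines = []
    for (pos, ref), site in sorted(alleles.items()):
        site.sort()
        alt_col = ",".join(alt for alt, _ in site)
        af_col = ",".join(f"{af:g}" for _, af in site)
        info = f"DP={depth[(pos, ref)]};AF={af_col}"
        lines.append("\t".join([chrom, str(pos), ".", ref, alt_col, ".", ".", info]))
    return lines


# C at 0.9974 is from the post; A at 0.0021 is a hypothetical minor-allele level.
calls = [(9090, "T", "C", 0.9974, 4236), (9090, "T", "A", 0.0021, 4236)]
for line in merge_multiallelic(calls):
    print(line)
```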

Thanks, I will check the VCF specification again! However, I still don't know how to choose the AC and AN values, because I don't know the mtDNA copy number. If one of the variants is a deletion, should it also go on the same line as the SNP? Like:

#CHROM  POS     ID      REF     ALT ... 
MT    9090    .    AT    A,AC ...

Am I understanding this correctly?
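If you follow the one-record-per-site convention, overlapping alleles do share one REF: with REF=AT at 9090, deleting the T gives ALT=A, and a T>C change at 9091 becomes ALT=AC (with no space after the comma in ALT). A Python sketch of that padding, using a hypothetical four-base toy reference instead of the real rCRS; merge_overlapping is a made-up name.

```python
def merge_overlapping(refseq, alleles):
    """Combine overlapping alleles into one VCF-style record.

    refseq: reference sequence as a string (POS 1 is refseq[0]).
    alleles: list of (pos, ref, alt) with 1-based POS as in VCF.
    Returns (pos, ref, alts) where each ALT is extended with flanking
    reference bases so that all alleles share the same REF span.
    """
    start = min(pos for pos, _, _ in alleles)
    end = max(pos + len(ref) - 1 for pos, ref, _ in alleles)  # last REF base
    merged_ref = refseq[start - 1:end]
    alts = []
    for pos, ref, alt in alleles:
        if refseq[pos - 1:pos - 1 + len(ref)] != ref:
            raise ValueError(f"REF {ref} does not match the reference at {pos}")
        left = refseq[start - 1:pos - 1]       # reference bases before this allele
        right = refseq[pos - 1 + len(ref):end]  # reference bases after it
        alts.append(left + alt + right)
    return start, merged_ref, alts


# Toy reference "CATG": deletion of T after the A at 2, and T>C at 3.
pos, ref, alts = merge_overlapping("CATG", [(2, "AT", "A"), (3, "T", "C")])
print(pos, ref, ",".join(alts))  # 2 AT A,AC
```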

(reply written 4 months ago by MatthewP)

There are vcf validators. Try one of them.
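Real validators (for example EBI's vcf-validator or GATK ValidateVariants) check far more, but a few of the problems discussed in this thread can be caught with a toy Python check like the one below. It only looks at the column count, POS, REF/ALT bases and Number=A INFO fields (assumed here to be AF and AC); symbolic ALT alleles such as <DEL> are not handled, and check_vcf_line is a hypothetical name.

```python
import re

BASES = re.compile(r"^[ACGTNacgtn]+$")


def check_vcf_line(line, number_a_keys=("AF", "AC")):
    """Return a list of problems found in one VCF data line (toy checks only)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return [f"expected >= 8 tab-separated columns, got {len(cols)}"]
    _chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
    problems = []
    if not pos.isdigit() or int(pos) < 1:
        problems.append(f"POS must be a positive integer: {pos!r}")
    if not BASES.match(ref):
        problems.append(f"REF has invalid bases: {ref!r}")
    alts = alt.split(",")
    for a in alts:
        if a != "." and not BASES.match(a):
            problems.append(f"ALT allele invalid (spaces or symbols?): {a!r}")
    if info != ".":
        for field in info.split(";"):
            key, _, value = field.partition("=")
            if key in number_a_keys and len(value.split(",")) != len(alts):
                problems.append(f"{key} needs one value per ALT allele ({len(alts)})")
    return problems


print(check_vcf_line("MT\t9090\t.\tAT\tA, AC\t.\t.\t."))
```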

(reply written 4 months ago by cpad0112)