Question: difference in genotypes in a multisample vcf file
0
gravatar for prasundutta87
4.0 years ago by
prasundutta87390
prasundutta87390 wrote:

Hi,

I have two individuals of the same animal (one male and one female water buffalo). I have RNAseq data from both the animals and I used mpileup and bcftools 'call' programs for variant calling. My aim is to find out allelic imbalance in a set of genes of interest.

Let me take an example of one variant in the resuted VCF file. For a particular variant ,one individual is heterozygous and the other is homozygous.

Since, allelic imbalance can only be found out through a heterozygous SNP, should I go ahead with the variants that are heterozygous in both the individuals? Or should I go ahead with the variants that are atleast heterozygous in at least one individual.

Thanks. PS- I am new to NGS and doing variant analysis for the first time

rna-seq snp next-gen • 1.3k views
ADD COMMENTlink modified 3.9 years ago • written 4.0 years ago by prasundutta87390
2
gravatar for Petr Ponomarenko
4.0 years ago by
United States / Los Angeles / ALAPY.com
Petr Ponomarenko2.7k wrote:

At least one heterozygous. It will be easier if you merge vcf files into a singlemultisample file with vcftools.

vcf-merge A.vcf.gz B.vcf.gz | bgzip -c > out.vcf.gz

With such single vcf you are interested in all lines that are different in genotype GT field and maybe you want to filter by quality of the call criteria.

Be careful with variant calling and comparison with sex chromosomes. Keep in mind that vcf usually has only mutation data unless you prepared data to go otherwise. So you most likely do not have homozygous reference calls there, but that does not mean a call is homozygous reference, because you may have very low coverage. If you have alignment files than you can use

samtools bedcov regions.bed aln.sorted.bam

Where regions.bed is regions to calculate coverage and aln.sorted.bam is your sorted alignment file.

In case it is not sorted you can sort and index it like this

samtools sort -T tmp.aln -o aln.sorted.bam aln.bam
samtoold index aln.sorted.bam

Another option if you have alignment files is to do variant calling on multiple files simultaneously. That way you will generate multisample vcf file and will have coverage data for potentially homozygous reference and low coverage positions in it already.

ADD COMMENTlink written 4.0 years ago by Petr Ponomarenko2.7k
0
gravatar for prasundutta87
3.9 years ago by
prasundutta87390
prasundutta87390 wrote:

Thanks for your reply Petr. Just to add, I did call the SNPs using both individual's data (alignment file) simultaneously. Also, both of my alignment files are sorted and indexed before I used them for calling variants. The resulted VCF file is a multi-sample VCF file. Hence, I have genotype data side by side for both the individuals and that's where my question arose. I am aware that one individual can be homozygous and another heterozygous for one variant, which is due to diversity. I used this command for calling variants:

samtools mpileup -f reference.fa -g individual_1.bam .indivudual_2.bam|bcftools call -mv > Final_result.vcf

For filtering, I used default varFilter default parameters first, followed by variant annotation using SnpEff and then further filtering of variants using SnpSift using the following parameters: QUAL > 20, MQ >= 30, DP >= 10 (parameters range selected from various published literatures). I also filtered out the variants that was null in either of the individuals.

If you could kindly comment on the procedure, I can go ahead with selecting one variant that is heterozygous in at least one individual.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by prasundutta87390

This is a good starting point. You might want to change tools and filter parameters later to increase sensitivity

ADD REPLYlink written 3.9 years ago by Petr Ponomarenko2.7k

Thanks a lot..am trying out GATK as well although the tools take too much time to run.

ADD REPLYlink written 3.9 years ago by prasundutta87390

Please use ADD COMMENT to reply to earlier answers, as such this thread remains logically structured and easy to follow.

ADD REPLYlink written 3.9 years ago by WouterDeCoster45k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1680 users visited in the last hour
_