Comparing VCF files of disease and normal samples
10.1 years ago

Hi,

I have two VCF files from disease samples and four VCF files generated from normal samples. I would like to filter the SNPs and keep only those that are present exclusively in the disease group, so that I can go on to validate them by other methods. Is there a tool that does this kind of analysis?

VCF SNP • 4.5k views
10.1 years ago
Katie D'Aco ★ 1.1k

I would start with the VCFtools merge command (vcf-merge) and then write a script to remove variants that are present in the controls. There may be an existing tool that filters by genotype call in specific samples, but I am not aware of one.
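A minimal Python sketch of such a script, assuming the merged file is a standard multi-sample VCF with GT as the first FORMAT key. The function names and the rule used here (every disease sample carries a non-reference allele, every control is called homozygous reference, and uncalled controls disqualify the site) are illustrative assumptions, not an existing tool:

```python
def _alleles(sample_field):
    """Return the allele indices from the GT subfield, e.g. '0/1:35' -> ['0', '1']."""
    gt = sample_field.split(":")[0]
    return gt.replace("|", "/").split("/")


def carries_alt(sample_field):
    """True if the genotype contains at least one non-reference allele."""
    return any(a not in ("0", ".") for a in _alleles(sample_field))


def is_missing(sample_field):
    """True if the genotype is entirely uncalled ('./.' or '.')."""
    return all(a == "." for a in _alleles(sample_field))


def disease_only(vcf_lines, disease, controls):
    """Yield data lines where all disease samples carry an alt allele
    and every control is called and carries none."""
    idx = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map sample names to their column positions.
            idx = {name: i for i, name in enumerate(fields)}
            continue
        d = [fields[idx[s]] for s in disease]
        c = [fields[idx[s]] for s in controls]
        if all(carries_alt(x) for x in d) and not any(
            carries_alt(x) or is_missing(x) for x in c
        ):
            yield line
```

To use it, pass an open file handle (or a list of lines) together with the sample names exactly as they appear in the #CHROM header line. Treating uncalled controls as disqualifying is a conservative choice; relax it if coverage in the controls is known to be good.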

If the disease you're looking at is rare, I would also consider filtering by 1000 Genomes or NHLBI allele frequencies.
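As a rough illustration of the frequency filter, the sketch below keeps a variant when its population allele frequency is at or below a cutoff. It assumes the VCF has already been annotated with a frequency in the INFO column; the key name `POP_AF` and the 1% cutoff are placeholders, since the actual key depends on the annotation tool used:

```python
def info_value(info, key):
    """Return the value of `key` from a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def is_rare(vcf_line, key="POP_AF", max_af=0.01):
    """True if every reported population frequency is <= max_af.

    Variants with no frequency annotation are treated as rare, on the
    assumption that they were not observed in the reference panel.
    """
    info = vcf_line.rstrip("\n").split("\t")[7]
    value = info_value(info, key)
    if value is None or value == ".":
        return True
    # Multi-allelic sites may list one frequency per ALT allele.
    return all(float(v) <= max_af for v in value.split(",") if v != ".")
```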


Thank you for the reply. I used multi-sample SNP calling (UnifiedGenotyper), filtered the SNPs, and then selected the SNPs of interest with AWK.


Hello geek_y, I realise this is a very old post, but could you please explain how you used AWK to select the SNPs?

10.1 years ago
Vivek ★ 2.7k

You can start with GATK CombineVariants to merge all the VCFs at once.

Then use GATK SelectVariants to select the variants that occur only in the two disease samples.
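If merging is not convenient, the same idea can be approximated with plain set operations on the separate files. This is a different technique from the GATK route above, sketched here in Python under the assumption that a variant is identified by (chromosome, position, REF, ALT); the function names are made up for illustration:

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the data lines of one VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele is compared separately.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def private_to_disease(disease_sets, normal_sets):
    """Variants found in every disease file and in none of the normal files."""
    shared = set.intersection(*disease_sets)
    seen_in_normals = set().union(*normal_sets)
    return sorted(shared - seen_in_normals)
```

One caveat: a variant missing from a normal sample's VCF may simply have had no coverage there, which single-sample files cannot distinguish from a homozygous reference call. Joint calling or a merged file with explicit genotypes avoids that ambiguity.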



8.9 years ago

Hi Goutham,

My requirement is somewhat similar to yours. I have to find SNPs that are present in affected but not in unaffected samples, and then prepare a list. For example:

Samples         4:1243-SNV   5:1277-SNV   15:4070-SNV   ...   16:5335-SNV
A (affected)    C_T          A_G          A_T           ...   A_C
B (unaffected)  C_C          A_A          A_A           ...   A_A
C (affected)    C_T          A_G          A_T           ...   A_C
D (affected)    C_T          A_G          A_T           ...   A_C

Any help :)

Cheers,
Ram
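One possible way to produce a sample-by-site list like the one above is sketched below in Python. It assumes a multi-sample VCF with GT as the first FORMAT key and labels each site as "chrom:pos"; the function names are hypothetical and the output is a nested dictionary rather than a formatted file:

```python
def format_genotype(ref, alt, sample_field):
    """Turn a GT call into bases joined by '_', e.g. REF=C, ALT=T, '0/1' -> 'C_T'."""
    alleles = [ref] + alt.split(",")
    gt = sample_field.split(":")[0].replace("|", "/")
    # Uncalled alleles stay as '.'; called ones are looked up by index.
    return "_".join("." if a == "." else alleles[int(a)] for a in gt.split("/"))


def genotype_table(vcf_lines, samples):
    """Return {sample: {"chrom:pos": "X_Y"}} for the requested samples."""
    table = {s: {} for s in samples}
    idx = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map sample names to their column positions.
            idx = {name: i for i, name in enumerate(fields)}
            continue
        site = f"{fields[0]}:{fields[1]}"
        for s in samples:
            table[s][site] = format_genotype(fields[3], fields[4], fields[idx[s]])
    return table
```

Sites where the affected and unaffected samples show different entries in this table are the candidates; the rows can then be written out in whatever layout is needed.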
