Question: Comparing VCF files between two groups (15 VCF files vs. 15 VCF files)
Pin.Bioinf (Malaga) wrote, 4 months ago:

Hello,

I have 15 VCF files from one population and 15 VCF files from another. I want to find both the differences and the similarities between the two groups: what changes from one group to the other, what stays the same, and, if possible, a significance score.

I have read about PLINK, but I am not sure what the pipeline should look like. Which steps should I follow? I read the documentation, but it is not clear to me.

I also read about bcftools isec, which is useful for intersecting multiple VCF files. Could I combine the 15 VCF files of one group, do the same for the other 15, end up with two files (population1_variants.vcf and population2_variants.vcf), and then compare those two against each other to find the differences and similarities?
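A minimal Python sketch of the set-based comparison described above, assuming uncompressed single-sample VCFs that list only the variants called in that sample, normalized so each record has a single ALT allele. Sites are keyed on (CHROM, POS, REF, ALT). The function names are hypothetical and this is not how bcftools isec itself is implemented; note also that presence/absence sets by themselves give no significance estimate.

```python
from typing import Dict, Iterable, List, Set, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)

def read_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect variant sites from VCF text lines; header lines are skipped.
    Assumes one ALT allele per record (e.g. after splitting multiallelics)."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites

def group_sites(samples: List[Set[Site]], mode: str = "all") -> Set[Site]:
    """'all': sites present in every sample; 'any': sites present in at least one."""
    if mode == "all":
        return set.intersection(*samples)
    if mode == "any":
        return set.union(*samples)
    raise ValueError("mode must be 'all' or 'any'")

def compare_groups(group1: List[Set[Site]], group2: List[Set[Site]],
                   mode: str = "all") -> Dict[str, Set[Site]]:
    """Shared and group-specific sites between two groups of samples."""
    a, b = group_sites(group1, mode), group_sites(group2, mode)
    return {"shared": a & b, "only_group1": a - b, "only_group2": b - a}
```

Each group would be built with something like [read_sites(open(path)) for path in group1_paths], where group1_paths is a hypothetical list of file names. Requiring a site in every sample (mode "all") drops variants that were simply missed in one sample, while mode "any" keeps them; which one fits depends on the question being asked.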

Which approach is better? Is this how people usually analyze variants across populations? How can I assess the significance of the results? Are there any other approaches?

Thank you

Tags: variants, snp, plink, vcf
Raony Guimarães (Dublin / Ireland) wrote, 4 months ago:

It really depends on what you want to achieve with this comparison. You could merge all the VCFs into one file and run an association analysis between the two populations with PLINK to find the differences between the groups, or you could run a PCA on all samples to see whether the two populations separate clearly.
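The PCA suggestion can be sketched in plain Python, assuming the genotypes have already been extracted into a samples × variants matrix coded 0/1/2 (ALT allele count) with no missing calls. Each variant is centered and scaled by sqrt(2p(1-p)), a common choice for genotype PCA, and the top components are found by power iteration. The function names are hypothetical; on real data a dedicated tool such as PLINK 1.9's --pca would normally be used instead.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

def standardize(genotypes: List[List[int]]) -> Matrix:
    """Rows are samples, columns are variants coded 0/1/2 (ALT allele count).
    Each variant is centered on 2p and scaled by sqrt(2p(1-p)),
    where p is the ALT allele frequency across all samples."""
    n, m = len(genotypes), len(genotypes[0])
    x = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)
        sd = math.sqrt(2 * p * (1 - p))
        if sd == 0:
            continue  # monomorphic variant: leave its column at zero
        for i in range(n):
            x[i][j] = (genotypes[i][j] - 2 * p) / sd
    return x

def top_pcs(x: Matrix, k: int = 2, iters: int = 200, seed: int = 0) -> Matrix:
    """Sample scores for the top-k principal components.
    Power iteration on the n x n sample matrix G = X X^T with deflation;
    the score vector of a component is sqrt(eigenvalue) * eigenvector."""
    n = len(x)
    g = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    scores = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        eigenvalue = 0.0
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:
                break  # no variance left to explain
            v = [t / norm for t in w]
            eigenvalue = norm  # ||G v|| converges to the eigenvalue
        # remove this component before searching for the next one
        g = [[g[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
        scores.append([math.sqrt(eigenvalue) * t for t in v])
    return scores
```

Plotting the first score vector against the second, with samples colored by group, shows whether the two populations form distinct clusters.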

Try doing an association analysis:

plink --file mydata --assoc
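For case/control data, --assoc reports a 1-df allelic chi-square test per SNP, comparing allele counts between the two groups. A minimal sketch of such a test for a single SNP, assuming genotypes coded 0/1/2 (ALT allele count, None for missing) and no continuity correction. The helper names are hypothetical, the odds ratio here is for the ALT allele (which may not match the allele PLINK reports as A1), and PLINK's own output should be treated as authoritative.

```python
import math
from typing import List, Optional, Tuple

def allele_counts(genotypes: List[Optional[int]]) -> Tuple[int, int]:
    """(ALT, REF) allele counts from diploid genotypes coded 0/1/2; None = missing."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    return alt, 2 * len(called) - alt

def allelic_chi2(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float, float]:
    """Pearson chi-square on the 2x2 allele-count table (cases vs controls),
    without continuity correction. Returns (chi2, p_value, odds_ratio)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(r) for r in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds ratio of the ALT allele in cases vs controls; nan if undefined
    denom = case_ref * ctrl_alt
    odds_ratio = (case_alt * ctrl_ref) / denom if denom else float("nan")
    return chi2, p_value, odds_ratio
```

For one SNP, allelic_chi2(*allele_counts(case_genotypes), *allele_counts(control_genotypes)) gives the statistic, where case_genotypes and control_genotypes are hypothetical per-group genotype lists.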

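Then look for SNPs that differ significantly between the two groups. Note that --file reads PLINK text files (mydata.ped and mydata.map); PLINK 1.9 can also read a VCF directly with --vcf, and the case/control labels must be provided as the phenotype, for example with --pheno.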

http://zzz.bwh.harvard.edu/plink/anal.shtml
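With thousands of SNPs tested, raw p-values need a multiple-testing correction before anything is called significant. PLINK's --adjust option reports several corrected values, including Bonferroni and Benjamini-Hochberg FDR; the sketch below reimplements those two corrections in plain Python for illustration only, and the function names are hypothetical.

```python
from typing import List

def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted p-values (FDR q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni controls the chance of any false positive and is very strict with many SNPs; Benjamini-Hochberg controls the expected fraction of false positives among the SNPs called significant.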


Thank you! This seems like a nice approach, and it is what I was looking for. Would the mydata input be the merge of 15samplescase.vcf and 15samplescontrol.vcf? And should those merged VCFs contain only the variants common to all 15 samples?

Thank you

Reply written 4 months ago by Pin.Bioinf
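One possible setup for the input question, assuming the 30 single-sample VCFs are first combined into one multi-sample VCF (for example with bcftools merge) and PLINK 1.9 is used: the group labels go in a separate phenotype file with one line per sample (family ID, individual ID, phenotype), where 2 means case and 1 means control. This sketch builds such a file; the sample names, file names, and function names are hypothetical, the IDs must match the sample names in the VCF header, and setting the family ID equal to the sample ID assumes PLINK is run with --double-id.

```python
from typing import Iterable, List

def pheno_lines(cases: Iterable[str], controls: Iterable[str]) -> List[str]:
    """Build 'FID IID PHENO' lines for a PLINK phenotype file.
    FID is set equal to IID, which assumes PLINK is run with --double-id.
    PLINK case/control coding: 2 = case (affected), 1 = control (unaffected)."""
    cases, controls = list(cases), list(controls)
    overlap = set(cases) & set(controls)
    if overlap:
        raise ValueError(f"samples in both groups: {sorted(overlap)}")
    lines = [f"{s} {s} 2" for s in cases]
    lines += [f"{s} {s} 1" for s in controls]
    return lines

def write_pheno(path: str, cases: Iterable[str], controls: Iterable[str]) -> None:
    """Write the phenotype file, one whitespace-delimited line per sample."""
    with open(path, "w") as handle:
        handle.write("\n".join(pheno_lines(cases, controls)) + "\n")
```

The association run would then look roughly like plink --vcf merged.vcf.gz --double-id --pheno groups.txt --allow-no-sex --assoc --out case_vs_control, with merged.vcf.gz and groups.txt as hypothetical names. --allow-no-sex is needed because a VCF carries no sex information, and PLINK 1.9 otherwise sets the phenotypes of samples with unknown sex to missing.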