Question: Pipeline for SNP analysis
12 months ago, rebecca08238 wrote:

Hello! I am working with nephrotic syndrome patients and need to identify SNPs that are associated with nephrotic syndrome. The study involves 2 groups: control and nephrotic syndrome.

Work done till now:

We used an Illumina AmpliSeq DNA library kit.

Source: genomic DNA

Briefly, here is what I did after getting the raw data from the Illumina MiSeq. We used a samtools pipeline:

  • Reference genome used: hg19
  • 1st step: Mapping with bwa mem
  • 2nd step: Conversion of SAM to BAM (using a fixmate command)
  • 3rd step: BAM sorting and BAM indexing
  • 4th step: Variant calling using bcftools (v1.9)
  • We created 2 group files: group 1 with 30 control individuals and group 2 with 60 syndromic patients. We used the bcftools isec command. Is this a correct way to generate the VCF files, or do we have to make a VCF file for each sample in a group and then combine all samples groupwise?
  • So now we have two VCF files, group1.vcf and group2.vcf. I used SnpEff to annotate these two group files.
  • I observed that some SNPs are present in both the control group and the syndromic group. Should I remove the SNPs common to both groups using the bedtools subtract command? Please let me know whether I am going about this the proper way.
  • I also tried creating individual VCF files for all 90 samples (30 control and 60 syndromic), but after this I am stuck. I used the bcftools isec command to combine all SNPs of one group into a single VCF file, but it keeps only a few SNPs. Why is that? Which command should I use to combine all SNPs of one group?
  • Can you please let me know if you have any idea how we can say that a particular SNP is responsible for a particular disease?
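The steps listed above can be sketched as command lines. This is only an illustration under stated assumptions: the actual commands were not posted, the file names (`hg19.fa`, `<sample>_R1.fastq.gz`, `all_samples.vcf.gz`) and the helper functions are hypothetical, and the reads are assumed to be paired-end. The sketch only builds the command strings; it does not run anything. The last helper shows one common alternative to per-group files: a single multi-sample call over all BAMs.

```python
import shlex


def build_pipeline(sample, ref="hg19.fa", threads=4):
    """Return the per-sample shell commands as strings (nothing is executed).

    All file names are hypothetical placeholders.
    """
    q = shlex.quote
    r1 = q(sample + "_R1.fastq.gz")
    r2 = q(sample + "_R2.fastq.gz")
    fixmate_bam = q(sample + ".fixmate.bam")
    sorted_bam = q(sample + ".sorted.bam")
    return [
        # Steps 1 and 2: map with bwa mem, then fixmate writes a BAM file
        f"bwa mem -t {threads} {q(ref)} {r1} {r2} | samtools fixmate -m -O bam - {fixmate_bam}",
        # Step 3: coordinate-sort, then index
        f"samtools sort -o {sorted_bam} {fixmate_bam}",
        f"samtools index {sorted_bam}",
    ]


def build_joint_call(bams, ref="hg19.fa", out="all_samples.vcf.gz"):
    """Step 4 as one multi-sample call: every BAM goes into the same mpileup."""
    bam_args = " ".join(shlex.quote(b) for b in bams)
    return (
        f"bcftools mpileup -f {shlex.quote(ref)} {bam_args} "
        f"| bcftools call -mv -Oz -o {shlex.quote(out)}"
    )


if __name__ == "__main__":
    for command in build_pipeline("control_01"):
        print(command)
    print(build_joint_call(["control_01.sorted.bam", "patient_01.sorted.bam"]))
```

Calling all samples together yields one VCF with a genotype column per sample, which avoids having to recombine per-sample or per-group files afterwards.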

Thank you!

Tags: snp, bcftools

Hi rebecca08238, thanks for the details; it is good to provide background to understand the question. I suggest you add the problematic isec command and the command used for the initial variant calling. The experienced bcftools folks will then probably see right away where the issue is.

written 12 months ago by ATpoint

Why did you use hg19? It's an old, poor genome that has been superseded by both GRCh37 (added baits, better SNV calling) and GRCh38 (better baits/alts, updated chromosomes). Many groups have seen improved SNV results with more up-to-date genomes. It might be worth remapping in the future.

More details:

written 12 months ago by colindaven

Yes, I am not sure that bcftools isec is the correct choice - each time that I use bcftools isec, I am reminded of why I stopped using it previously.
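One plausible reason for getting only a few SNPs (an assumption, since the exact command was not posted) is that isec is built for set operations such as intersections and complements, whereas bcftools merge is the subcommand meant for combining files from different samples into one multi-sample VCF. The toy sketch below uses made-up variant keys and hypothetical sample names to show how quickly an intersection shrinks compared with a union.

```python
def variant_key(chrom, pos, ref, alt):
    """Build a CHR:POS:REF:ALT key for one variant."""
    return f"{chrom}:{pos}:{ref}:{alt}"


# Made-up call sets for three hypothetical patients (not real data)
samples = {
    "patient_01": {
        variant_key("chr1", 100, "A", "G"),
        variant_key("chr2", 200, "C", "T"),
    },
    "patient_02": {
        variant_key("chr1", 100, "A", "G"),
        variant_key("chr3", 300, "G", "A"),
    },
    "patient_03": {
        variant_key("chr1", 100, "A", "G"),
        variant_key("chr4", 400, "T", "C"),
    },
}

# Intersection: only sites carried by every sample survive
shared_by_all = set.intersection(*samples.values())

# Union: every site seen in at least one sample is kept
seen_in_any = set.union(*samples.values())

print(len(shared_by_all), "site(s) shared by all samples")
print(len(seen_in_any), "site(s) seen in at least one sample")
```

With 60 patients, a site has to be called in every single one to survive a full intersection, so a short output list is expected in that case.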

Whether or not to filter out variants also present in controls is your choice. If your hypothesis is that there is a single variant of high penetrance that causes Nephrotic Syndrome, then it may help to filter out any variant found in the controls. If your hypothesis is that the syndrome has complex pathophysiology and complex patterns of inheritance, then variants in controls are to be expected.

Is Nephrotic Syndrome not caused by variants in NPHS1 or NPHS2?

Previously, when I did this, I merged all samples into the same VCF and then imported the VCF to PLINK, where I generated association test results.
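To give a feel for what a basic case/control association test computes, here is a minimal sketch of an allelic chi-square test on a 2x2 table of allele counts. It is an illustration only, not PLINK's implementation; the function name and the example counts are made up, and with small counts an exact test would usually be preferred.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). No continuity correction is applied.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the table carries no information
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: 60 patients (120 alleles) and 30 controls (60 alleles)
chi2, p = allelic_chi_square(case_alt=30, case_ref=90, ctrl_alt=5, ctrl_ref=55)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Because such a test is run once per variant, the resulting p-values need correction for multiple testing before any single SNP is singled out.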

Technically, you can also have 2 VCFs for controls and patients, and then compare these by using a key, like CHR:POS:REF:ALT, and using awk or even grep (or R / Python).
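A minimal Python sketch of that key-based comparison follows. The helper names and the toy records are hypothetical; in practice the two handles would come from opening group1.vcf and group2.vcf. Multi-allelic ALT fields are split so that each allele gets its own key.

```python
import io


def vcf_keys(handle):
    """Collect CHR:POS:REF:ALT keys from an uncompressed VCF text stream."""
    keys = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # one key per ALT allele
            keys.add(f"{chrom}:{pos}:{ref}:{alt}")
    return keys


def toy_vcf(records):
    """Build a tiny in-memory VCF from (chrom, pos, ref, alt) tuples."""
    header = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    )
    body = "".join(
        "\t".join([c, str(p), ".", r, a, ".", "PASS", "."]) + "\n"
        for c, p, r, a in records
    )
    return io.StringIO(header + body)


# Made-up records standing in for group1.vcf (controls) and group2.vcf (patients)
controls = vcf_keys(toy_vcf([("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")]))
patients = vcf_keys(toy_vcf([("chr1", 100, "A", "G"), ("chr3", 300, "G", "A,C")]))

shared = patients & controls         # present in both groups
patients_only = patients - controls  # candidates absent from controls

print(sorted(shared))
print(sorted(patients_only))
```

This compares presence or absence only; it says nothing about how many individuals in each group carry a variant.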

written 12 months ago by Kevin Blighe

Thank you so much for your reply! Yes, NPHS1 and NPHS2 are responsible for nephrotic syndrome, but some other genes are also involved in nephrotic syndrome.

Which package should we use for SNP analysis in R?

written 12 months ago by rebecca08238

"Which package should we use for SNP analysis in R?"

What do you mean? You have not really explained what you want to do. So far, it seems that you just want to filter variants based on their frequencies in controls and patients. You should use other programs, like ANNOVAR and Ensembl VEP, to annotate your variants for other things, like functionality predictors (CADD, FATHMM, PolyPhen-2, etc.).

written 12 months ago by Kevin Blighe

Yes, previously I asked about filtering variants in controls and patients. I used a samtools pipeline (on Linux). Now I want to ask: can we do SNP analysis in RStudio, and if so, which package should we download?

written 11 months ago by rebecca08238