Hello,
I'm Shamsur. I have an SNP file for a goat breed. The file shows the columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. See link 1 for a screenshot taken from my terminal, where the file was easiest to open. I want to make a Manhattan plot from this SNP file to visualize significant SNPs for a GWAS. But a Manhattan plot requires p-values for plotting.
How am I supposed to get these p-values?
Thanks in advance.
Based on the field names, your file fits the VCF (Variant Call Format) specification. Can you explain how this file was produced?
Yes, you need p-values to generate a Manhattan plot, typically one per locus, with the loci scattered across the genome. We would usually generate such a plot using p-values from a comparison between groups, e.g., Asthmatic Adults versus Healthy Control Adults. However, as you have not explained anything about your experiment, we have no way of knowing where to look for these p-values. They may be encoded in the VCF. Please paste some lines from the VCF.
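To illustrate where such p-values come from, here is a minimal sketch of a per-SNP allelic test between two groups, followed by the -log10 transform used for the y-axis of a Manhattan plot. The allele counts, the function names, and the choice of a simple 2x2 chi-square test are assumptions made for illustration; a real GWAS would normally use a dedicated association tool with phenotype data and covariates.

```python
import math

def allelic_chi2_pvalue(case_ref, case_alt, ctrl_ref, ctrl_alt):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_ref, case_alt], [ctrl_ref, ctrl_alt]]
    row_sums = [case_ref + case_alt, ctrl_ref + ctrl_alt]
    col_sums = [case_ref + ctrl_ref, case_alt + ctrl_alt]
    total = sum(row_sums)
    if total == 0 or 0 in row_sums or 0 in col_sums:
        return 1.0  # no information, e.g. a monomorphic site
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def manhattan_y(p_value):
    """-log10(p), the usual Manhattan plot y-axis; tiny p-values are clamped."""
    return -math.log10(max(p_value, 1e-300))

# Hypothetical allele counts: (case_ref, case_alt, ctrl_ref, ctrl_alt)
snps = [
    ("chr1", 1000, (60, 40, 40, 60)),
    ("chr1", 2000, (50, 50, 50, 50)),
]
for chrom, pos, counts in snps:
    p = allelic_chi2_pvalue(*counts)
    print(chrom, pos, p, manhattan_y(p))
```

The point of the sketch is that each p-value needs two groups (or a measured trait) to compare; a VCF of variant calls alone does not contain that information.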
Hello Kevin Blighe,
Some very common tools were used to produce this VCF file: bwa, samtools, GATK, and vcftools. The ARS1 goat assembly was used as the reference genome. The commands are below; please ignore the prefixes in the commands.
bwa index ref.fa
bwa aln ref.fa read1.fq > aln1.sai
bwa aln ref.fa read2.fq > aln2.sai
bwa sampe ref.fa aln1.sai aln2.sai read1.fq read2.fq > aln.sam

samtools view -bS -o aln.raw.bam aln.sam
samtools sort aln.raw.bam aln.sort

java -jar MarkDuplicates.jar \
  ASSUME_SORTED=TRUE \
  REMOVE_DUPLICATES=TRUE \
  VALIDATION_STRINGENCY=LENIENT \
  INPUT=aln.sort.bam \
  OUTPUT=aln.bam \
  METRICS_FILE=aln.dupli

java -jar AddOrReplaceReadGroups.jar \
  INPUT=aln.bam \
  OUTPUT=aln.rg.bam \
  SORT_ORDER=coordinate \
  CREATE_INDEX=true \
  RGID=Rice01 \
  RGLB=Rice3k \
  RGPL=Illumina \
  RGPU=ATGGGC \
  RGSM=Rice \
  VALIDATION_STRINGENCY=SILENT

java -Xmx1g -jar GenomeAnalysisTK.jar \
  -T HaplotypeCaller \
  -R $genome \
  -I $BAM \
  -o $prefix.gatk.raw.vcf \
  -nct $cpu \
  --genotyping_mode DISCOVERY \
  -stand_call_conf 30 \
  -stand_emit_conf 10

java -Xmx1g -jar GenomeAnalysisTK.jar \
  -T SelectVariants \
  -R $genome \
  -V $prefix.gatk.raw.vcf \
  -selectType SNP \
  -o $prefix.gatk.snp.raw.vcf
I searched for how to get p-values for SNPs and came across the "vcftools --hardy" command, which gives p-values while generating the VCF file. Are you familiar with the "vcftools --hardy" command?
Thank you,
Shamsur
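For context on the question above: "vcftools --hardy" reports a per-site p-value for a test of Hardy-Weinberg equilibrium, which is a quality-control statistic about genotype frequencies rather than a p-value for association with a trait. The sketch below shows the idea with a simple chi-square version of the test. It is a simplified stand-in, not the exact procedure vcftools runs, and the genotype counts and function name are hypothetical.

```python
import math

def hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test (1 df) of observed genotype counts against
    Hardy-Weinberg expectations. Simplified illustration only."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site, nothing to test
    observed = [n_hom_ref, n_het, n_hom_alt]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical genotype counts: (hom_ref, het, hom_alt)
print(hwe_chi2_pvalue(25, 50, 25))  # counts match HWE expectations
print(hwe_chi2_pvalue(50, 0, 50))   # no heterozygotes, strong departure
```

A small p-value here flags a site whose genotype frequencies look unusual (often a genotyping problem), so these values are typically used to filter SNPs before an association test, not as the association result itself.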