Need help: gene with the highest number of high-quality SNPs in a reference genome
7.2 years ago
Varshney

Hello everyone, this is my problem:

I am very new to this kind of analysis. I ran bowtie2 on my genome sequencing data; it ran successfully and produced a SAM file, which I converted to BAM and then to VCF. My reference genome file has ~5600 scaffolds. I want to know which scaffold contains the maximum number of SNPs, and also how I can map these SNPs to the gene sequences.

Put simply, I want to find the gene with the highest number of high-quality SNPs in the reference genome.

Please help me out.

Thanks in advance!


Thank you for your reply, but could you please tell me how I can get the exons.bed file? I am totally confused. :(


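The dexseq_prepare_annotation.py script from DEXSeq takes a GTF file and outputs the exon coordinates (as a GFF file, which you then convert to BED). Be careful not to have duplicate intervals in your BED file: a GTF lists an exon once for every transcript that contains it, so duplicates would make the same SNP count several times.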
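If you would rather not install DEXSeq, a similar exons.bed can be built straight from the GTF. The sketch below is a hypothetical shell function, `gtf_to_exon_bed`, that assumes a tab-separated GTF whose ninth column contains a `gene_id "..."` attribute. It shifts the 1-based GTF start to BED's 0-based start and removes exact duplicates (the same exon listed under several transcripts). Unlike DEXSeq's flattened output, it does not merge overlapping but non-identical exons, so a SNP lying in two overlapping exons of one gene would still be counted twice.

```shell
# gtf_to_exon_bed: read a GTF on stdin and print one BED6 line per distinct exon
# (chrom, start, end, gene, score, strand). Hypothetical helper; assumes the 9th
# column has a gene_id "..." attribute (swap in gene_name if you prefer).
gtf_to_exon_bed() {
  # GTF is 1-based and closed, BED is 0-based and half-open: only the start shifts.
  # sort + uniq drops exons repeated once per transcript in the GTF.
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }
       $3 == "exon" {
         gene = "."
         if (match($9, /gene_id "[^"]+"/))
           gene = substr($9, RSTART + 9, RLENGTH - 10)
         print $1, $4 - 1, $5, gene, ".", $7
       }' |
    sort -k1,1 -k2,2n |
    uniq
}
```

Usage: `gtf_to_exon_bed < annotation.gtf > exons.bed`, where annotation.gtf stands for your own annotation file.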

7.2 years ago

If you have a BED or VCF file with the SNP positions:

cat SNP.bed

chr1    150 151
chr1    301 302
chr1    501 502
chr1    900 901
chr2    176 177
chr2    187 188

and a BED file of exons, with the gene name in the fourth column:

cat exons.bed

chr1    100 200 gene1   .   .
chr1    300 800 gene1   .   .
chr2    100 200 gene2   .   .
chr2    800 1000    gene2   .   .

To count the SNPs overlapping each gene:

intersectBed -a SNP.bed -b exons.bed -wb | groupBy -g 1,7 -c 7 -o count | cut -f2,3

gene1   3
gene2   2
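This prints one count per gene, while the question asks for the gene with the most SNPs. For a small table, piping the output through `sort -k2,2nr | head -n 1` is enough. The hypothetical helper below, `top_genes`, is a slightly more careful sketch: it first sums the counts per gene name (groupBy only collapses consecutive rows, so a gene can appear on more than one line if its rows are interleaved with those of an overlapping gene, or if the same name occurs on two chromosomes) and then prints every gene tied for the maximum.

```shell
# top_genes: read "gene<TAB>count" lines on stdin and print every gene tied for
# the highest total count. Hypothetical helper.
top_genes() {
  awk 'BEGIN { FS = OFS = "\t" }
       { total[$1] += $2 }
       END {
         max = -1
         for (g in total) if (total[g] > max) max = total[g]
         for (g in total) if (total[g] == max) print g, total[g]
       }' |
    sort
}
```

Usage: append `| top_genes` to the intersectBed pipeline.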

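This also works with a VCF file as -a, but the column numbers change: -wb appends the exons.bed columns after all of the VCF's columns, so for a VCF with N columns the gene name is in column N + 4 (column 14 for a single-sample VCF with 10 columns). Adjust the groupBy arguments to match, e.g. -g 1,14 -c 14.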
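The question also asks about *high-quality* SNPs and about the scaffold with the most SNPs. What counts as high quality is your choice; the sketch below assumes QUAL >= 30 (a common but arbitrary cut-off), a FILTER of PASS or ".", and keeps only biallelic SNPs (one-base REF and ALT). The two functions, `hq_snps` and `snps_per_scaffold`, are hypothetical names and use only awk and sort. The filtered VCF can then be passed as -a to intersectBed, and the per-scaffold counts answer the scaffold part of the question.

```shell
# hq_snps [MINQUAL]: filter a VCF on stdin, keeping header lines plus records that
# are biallelic SNPs (one-base REF and ALT) with QUAL >= MINQUAL (default 30, an
# assumed cut-off) and FILTER equal to PASS or ".". Hypothetical helper.
hq_snps() {
  awk -v minq="${1:-30}" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }
    {
      is_snp  = (length($4) == 1 && $5 ~ /^[ACGT]$/)
      qual_ok = ($6 != "." && $6 + 0 >= minq)
      filt_ok = ($7 == "PASS" || $7 == ".")
      if (is_snp && qual_ok && filt_ok) print
    }'
}

# snps_per_scaffold: count non-header VCF records per CHROM (scaffold),
# most SNPs first. Hypothetical helper.
snps_per_scaffold() {
  awk 'BEGIN { FS = OFS = "\t" }
       !/^#/ { n[$1]++ }
       END { for (s in n) print s, n[s] }' |
    sort -k2,2nr -k1,1
}
```

Usage: `hq_snps 30 < calls.vcf > calls.hq.vcf`, then `snps_per_scaffold < calls.hq.vcf | head -n 1` for the top scaffold (calls.vcf stands for your own VCF).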


