Fair warning: I am fairly new to dealing with nuclear NGS data.
Background: I am a molecular anthropology grad student. A little over a year ago, I got back Illumina HiSeq 2000 data from 90 mitochondrial-enriched libraries. Another grad student in the lab got back the same kind of data from 92 NRY-enriched libraries, with some sample overlap. I now have 171 BAM files that I have aligned to hg19.
I am now trying to see what I can do with the "junk" data (i.e., the off-target autosomal + X reads). I want to see if there is enough good data to do some variant calling. I also have SNP array data from 64 samples at ~330,000 rsIDs (the data is from an old set of HumanCNV370-Quads from 2008, and I don't have the genomic coordinates). There is some overlap between the genotyped individuals and the sequenced individuals. I was wondering if anyone could give me some help/advice/suggestions.
I need to convert the BAM files to VCFs, get rsIDs into the VCF files, filter the VCF files based on quality and variant type (I am only interested in SNPs, not indels, microsatellite variation, etc.), and then see how much overlap there is between the called variants and the actual SNP array data.
I have an idea of how to convert the BAM files to VCF files, but beyond that I am lost.
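
To make the filtering and overlap steps concrete, here is a rough Python sketch of what I am picturing. It assumes the VCFs are plain text and already carry rsIDs in the ID column (e.g., after annotating against dbSNP). The function names, the QUAL cutoff of 30, and the toy records are all placeholders I made up for illustration, not anything from my real data.

```python
# Rough sketch: keep only SNP records above a quality cutoff, then see
# which rsIDs are shared with the array genotypes. All names, the cutoff,
# and the example records below are hypothetical.

BASES = {"A", "C", "G", "T"}


def is_snp(ref, alt):
    """True when REF and every ALT allele are single bases (no indels)."""
    return ref in BASES and all(a in BASES for a in alt.split(","))


def filter_snps(vcf_lines, min_qual=30.0):
    """Return (chrom, pos, id, ref, alt, qual) tuples for passing SNPs."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # malformed or truncated record
        chrom, pos, vid, ref, alt, qual = fields[:6]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low QUAL
        if not is_snp(ref.upper(), alt.upper()):
            continue  # indel, MNP, or symbolic allele
        kept.append((chrom, int(pos), vid, ref, alt, float(qual)))
    return kept


def rsid_overlap(records, array_rsids):
    """rsIDs present both in the filtered calls and on the array."""
    called = set()
    for rec in records:
        # The VCF ID column may hold several IDs separated by ';'
        for vid in rec[2].split(";"):
            if vid.startswith("rs"):
                called.add(vid)
    return called & set(array_rsids)


# Toy input standing in for one annotated VCF and the array rsID list.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\trs111\tA\tG\t50\t.\t.",
    "1\t2000\trs222\tAT\tA\t99\t.\t.",   # indel, dropped
    "2\t3000\trs333\tC\tT\t10\t.\t.",    # low QUAL, dropped
    "X\t4000\t.\tG\tA,C\t60\t.\t.",      # SNP without an rsID
    "3\t5000\trs555\tT\tC\t45\t.\t.",
]
array_rsids = {"rs111", "rs333", "rs555", "rs999"}

kept = filter_snps(example_vcf, min_qual=30)
shared = rsid_overlap(kept, array_rsids)
print(len(kept), sorted(shared))
```

This only compares which sites overlap, not whether the genotypes agree, and I realize a dedicated VCF tool would probably do the filtering part better than hand-parsing lines.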