Hey guys. I was able to successfully generate my VCF file, but I am confused about where to go from here in analyzing my data. I was asked to identify variants in the open reading frame region using certain quality criteria. I haven't used IGV before and I need help with all of this. Any other suggestions that would help are welcome.
BTW, thanks for all your help with my BAM to VCF conversion.
Thank you, Jared. I am basically trying to identify variations within the open reading frame. I don't know if that makes sense. This isn't something I have done before. I am working with dust mites and trying to see changes in the ORF only. Sorry for my late response.
If you're just looking for variations in the DNA, my first suggestion should work for you. Get the positions of your exons from your reference genome and intersect them with your VCF file to pull out the variants that lie within the exons.
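To make the intersection step concrete, here is a minimal sketch in plain Python. It assumes you already have exon coordinates as (chrom, start, end) tuples with 1-based inclusive positions; the function names and the `min_qual` threshold are made up for illustration, so swap in whatever quality criteria you were given.

```python
import bisect
from collections import defaultdict

def build_index(exons):
    """exons: iterable of (chrom, start, end), 1-based inclusive.
    Returns {chrom: (starts, ends)} with overlapping/adjacent exons merged."""
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def in_exon(index, chrom, pos):
    """True if the 1-based position falls inside any exon on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos <= ends[i]

def filter_vcf_lines(lines, index, min_qual=0.0):
    """Yield header lines plus records whose POS is in an exon.
    Records with QUAL below min_qual are dropped; a missing QUAL ('.')
    is kept. Only POS is tested, so a deletion that starts just before
    an exon and runs into it would be missed by this sketch."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, qual = fields[0], int(fields[1]), fields[5]
        if qual != "." and float(qual) < min_qual:
            continue
        if in_exon(index, chrom, pos):
            yield line
```

You would call it with something like `filter_vcf_lines(open("calls.vcf"), build_index(exons), min_qual=30)` and write the yielded lines to a new file. The file name and the cutoff of 30 are placeholders, not recommendations.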
If you're looking for variants that actually alter the protein sequence, you can create a new consensus sequence in which the variants identified above replace the reference allele, translate that sequence into amino acids, and then compare which amino acids change between each sample and the reference genome.
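As a rough sketch of the consensus-and-translate idea, the Python below swaps SNVs into a coding sequence and reports which amino acids differ. It makes several simplifying assumptions: the coding sequence is already spliced, in frame, and on the forward strand; variant positions have already been converted to 1-based offsets within that sequence; only single-base substitutions are handled (indels would shift the frame); and the standard genetic code applies. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def apply_snvs(cds, snvs):
    """Return cds with each (pos, ref, alt) SNV applied.
    pos is a 1-based offset within cds; ref is checked against the sequence."""
    seq = list(cds.upper())
    for pos, ref, alt in snvs:
        if seq[pos - 1] != ref.upper():
            raise ValueError(f"REF mismatch at CDS position {pos}")
        seq[pos - 1] = alt.upper()
    return "".join(seq)

def translate(cds):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

def protein_changes(cds, snvs):
    """List (residue_number, ref_aa, alt_aa) for every residue that differs."""
    ref_protein = translate(cds.upper())
    alt_protein = translate(apply_snvs(cds, snvs))
    return [
        (i + 1, r, a)
        for i, (r, a) in enumerate(zip(ref_protein, alt_protein))
        if r != a
    ]
```

An empty list from `protein_changes` means every variant you passed in was synonymous under these assumptions. Genes on the reverse strand would need to be reverse-complemented first, which this sketch does not do.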
OK, sounds good. Let me find a way to extract the exons from the .fa reference genome.
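One note on that step: a .fa file holds only sequence, so the exon coordinates normally come from an annotation file that accompanies the genome. A minimal sketch, assuming a GFF3 annotation is available for your reference (whether one exists for your dust mite assembly is something to check), could pull the coordinates out like this. The function names are hypothetical.

```python
def features_from_gff3(lines, feature="exon"):
    """Yield (chrom, start, end, strand) for rows whose type column
    equals `feature`. GFF3 coordinates are 1-based and inclusive."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        yield cols[0], int(cols[3]), int(cols[4]), cols[6]

def to_bed(records):
    """Convert records to BED lines (0-based start, half-open end)."""
    for chrom, start, end, strand in records:
        yield f"{chrom}\t{start - 1}\t{end}\t.\t0\t{strand}"
```

Since exons can include untranslated regions, passing `feature="CDS"` is probably closer to what you want if the goal is strictly the open reading frame.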