Question: SNP visualization with nucleotide %
4.4 years ago by
skbrimer620
United States
skbrimer620 wrote:

Good afternoon group, 

I'm looking for a way to graph the variability of the base calls across the genome, expressed as a percentage of the coverage at each position. I'm hoping to use this plot to see whether there are SNPs that occur across the genome at the same frequency, which might let me distinguish different haplotypes of my virus. I would also like to use it to find regions of higher variability, and to see whether I can run one of the current viral reconstruction algorithms on those regions to do some targeted haplotyping.

I know this should be possible. I have a good reference and a BAM file with all my sorted reads; I'm just not sure how to put them together in a graph. I have been reading the documentation for Rsamtools, and it looks like I can use it to read in the file. From there I figured I could write a loop to tally all of the observed nucleotides at each position in the reference to get the percentage of each nucleotide at that position. Then I would like to make a line graph of all the non-reference nucleotides against the percentage at which they appear.

I'm just not sure how to use the reference and the aln.bam file together to get started. Can someone point me in the right direction?

Sean
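
The tally described in the question can be sketched as follows. This is only an illustration in Python rather than R, and it assumes the reads have already been reduced to gapless `(start, sequence)` pairs; the function name `nonref_percentages` and the toy reads are invented for the example. With a real BAM file, the per-position counts would more likely come from something like Rsamtools `pileup()` or pysam's `count_coverage`, which also deal with indels and clipping.

```python
from collections import Counter

def nonref_percentages(reference, reads):
    """Return (position, ref_base, alt_base, percent_of_depth) for every
    non-reference base observed. `reads` holds hypothetical gapless
    alignments as (0-based start, sequence) pairs."""
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    rows = []
    for pos, (ref_base, tally) in enumerate(zip(reference, counts)):
        depth = sum(tally.values())
        for base in "ACGT":
            if base != ref_base and tally[base] > 0:
                rows.append((pos, ref_base, base, 100.0 * tally[base] / depth))
    return rows

# Toy data only: four reads covering a four-base reference.
reference = "ACAC"
reads = [(0, "ACAC"), (0, "GCAC"), (0, "GTAC"), (0, "GCAC")]
rows = nonref_percentages(reference, reads)
for pos, ref_base, alt_base, pct in rows:
    print(pos, ref_base, alt_base, pct)
```

The resulting rows (position against percentage, one series per alternate base) are the values a line graph would be drawn from.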

 

snp alignment R • 1.7k views
ADD COMMENTlink modified 4.4 years ago by Brian Bushnell17k • written 4.4 years ago by skbrimer620
4.4 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

You can visualize this nicely with IGV. It just requires a sorted, indexed bam file, and the reference fasta, as input, and it functions in a GUI.

ADD COMMENTlink modified 4 months ago by RamRS26k • written 4.4 years ago by Brian Bushnell17k

Hi Brian,

I have done this as well, and you are correct: it does make a nice picture, and I guess that is exactly what I described in my question. However, I do not know of a way to extract the information from IGV. My genome is small, viral, and haploid, so I use freebayes to get the variants, but I cannot make haplotype calls; I just get a single call across the genome. That is why I was trying to use R, to see if I could build different consensus sequences from SNPs at different frequency levels.
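
One rough way to explore the "SNPs at the same frequency" idea is to bin variant sites by their alternate-allele percentage. The sketch below is an assumption-laden illustration, not an established haplotyping method: `group_by_frequency`, the 5-point tolerance, and the site list are all made up. Similar frequency is only suggestive; confirming that two SNPs sit on the same haplotype needs reads (or read pairs) that span both sites.

```python
def group_by_frequency(sites, tolerance=5.0):
    """Greedily group (position, alt_base, percent) sites whose percent is
    within `tolerance` points of the first (highest) member of a group."""
    groups = []
    for site in sorted(sites, key=lambda s: s[2], reverse=True):
        for group in groups:
            if abs(group[0][2] - site[2]) <= tolerance:
                group.append(site)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([site])
    return groups

# Invented example sites: (position, alternate base, percent of coverage).
sites = [(120, "G", 74.0), (845, "T", 26.5), (300, "A", 76.5), (910, "C", 24.0)]
groups = group_by_frequency(sites)
for group in groups:
    print([pos for pos, _, _ in group])
```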

Is this a good idea or a bad idea?

ADD REPLYlink modified 4 months ago by RamRS26k • written 4.4 years ago by skbrimer620

Maybe tweak your freebayes parameters; try -C 2 -F 0.01.
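
For reference, `-C` is freebayes' `--min-alternate-count` and `-F` is `--min-alternate-fraction`. A small sketch of assembling such a call is shown here; the file name `ref.fasta` is hypothetical, `aln.bam` is taken from the question, and `-p 1` (ploidy) is an added assumption based on the genome being described as haploid. The command is only built and printed, not run.

```python
import shlex

def freebayes_command(reference, bam, min_alt_count=2,
                      min_alt_fraction=0.01, ploidy=1):
    """Build an argv list for freebayes (which writes VCF to stdout)."""
    return [
        "freebayes",
        "-f", reference,              # --fasta-reference
        "-p", str(ploidy),            # --ploidy (assumed haploid here)
        "-C", str(min_alt_count),     # --min-alternate-count
        "-F", str(min_alt_fraction),  # --min-alternate-fraction
        bam,
    ]

cmd = freebayes_command("ref.fasta", "aln.bam")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` with stdout redirected to a VCF file if the call should be executed from Python.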

ADD REPLYlink written 4.4 years ago by apelin20470

Thanks for the advice. I have been playing with the parameters, but I guess I don't understand how to extract the individual haplotypes.

https://goo.gl/photos/WbjkqfmiYrm4ofsH9

In the linked screenshot you can see it calls one of the SNPs but not the other. The reference is ACAC and the call by freebayes is GCAC, but it should also have GTAC, and I do not understand why it doesn't or how to extract that information... other than manually (please no).
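
Counting the strings that individual reads show over a short window is one way to avoid tallying ACAC / GCAC / GTAC by hand. The sketch below assumes gapless `(start, sequence)` alignments and uses invented reads against a hypothetical reference `TTACACGG`, where positions 2 to 5 spell ACAC; `window_haplotypes` is not part of any library. Real reads with indels or soft clipping would need CIGAR-aware coordinates, for example via pysam's `get_aligned_pairs`.

```python
from collections import Counter

def window_haplotypes(reads, start, end):
    """Count the sequences seen over reference positions [start, end),
    using only reads that span the whole window."""
    tally = Counter()
    for read_start, seq in reads:
        read_end = read_start + len(seq)
        if read_start <= start and read_end >= end:
            tally[seq[start - read_start:end - read_start]] += 1
    return tally

# Toy reads; the last one does not span the window and is skipped.
reads = [(0, "TTACACGG"), (1, "TGCACGG"), (2, "GTACG"), (2, "GCAC"), (4, "ACGG")]
haps = window_haplotypes(reads, 2, 6)
for hap, n in haps.most_common():
    print(hap, n)
```

Because every counted read covers the full window, each string is a directly observed local haplotype rather than one inferred from per-site frequencies.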

ADD REPLYlink modified 4 months ago by RamRS26k • written 4.4 years ago by skbrimer620

Maybe post the header of the freebayes-generated VCF file, and show the line that has your call. It is odd behaviour.

ADD REPLYlink written 4.4 years ago by apelin20470

Sure thing!

ADD REPLYlink modified 4 months ago by RamRS26k • written 4.4 years ago by skbrimer620

Here is your problem: you set a QUAL > 10 filter. Lower-frequency variants have a smaller QUAL value (because a variant seen at lower frequency is more likely to have arisen by chance).
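
The effect of such a filter can be illustrated with a small sketch. The two VCF data lines below are invented for the example (they are not from the poster's file), and the parser assumes a single alternate allele per record so that the freebayes `AO` and `DP` INFO fields are plain integers.

```python
# Invented records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
VCF_LINES = [
    "ref\t101\t.\tA\tG\t850.2\t.\tAO=150;DP=200;RO=50",
    "ref\t102\t.\tC\tT\t4.7\t.\tAO=6;DP=200;RO=194",
]

def parse_record(line):
    """Parse one VCF data line into the fields needed for QUAL filtering."""
    chrom, pos, _id, ref, alt, qual, _filter, info = line.split("\t")[:8]
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    ao, dp = int(fields["AO"]), int(fields["DP"])
    return {"pos": int(pos), "ref": ref, "alt": alt,
            "qual": float(qual), "alt_fraction": ao / dp}

records = [parse_record(line) for line in VCF_LINES]
kept = [r for r in records if r["qual"] > 10]
dropped = [r for r in records if r["qual"] <= 10]
for r in dropped:
    print("dropped by QUAL > 10:", r["pos"], r["alt"], r["alt_fraction"])
```

In this toy case the low-frequency variant is the one removed by the QUAL cutoff, which is the pattern described above.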

ADD REPLYlink written 4.4 years ago by apelin20470

Thank you for the help. When I look at the unfiltered data at the same spot, it still does not make the call. Both the filtered and unfiltered runs do make calls in other areas of the genome, so I know it's working. I will try some lower-frequency parameters to see if I can get it to show up.

What is the next step after this, though? How do I create a list of possible haplotypes from this VCF file?

ADD REPLYlink modified 4 months ago by RamRS26k • written 4.4 years ago by skbrimer620

You can't; you only have one call. ACAC and GCAC are your haplotypes.

ADD REPLYlink written 4.4 years ago by apelin20470