Heterogeneous SNP
9.3 years ago
polangxin ▴ 80

I'm using bwa to map NGS data to a reference, and I want to find positions with heterogeneous data, for example:

seq1 274 T 23  AAAAAAACCCCCCC    7<7;<;<<<<<<<<<=<;<;<<6

seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 AAAAAAACCCCCCC 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<


Is there any software that can do this?

Thanks!!
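To make the example concrete, here is a rough Python sketch of how the read-bases column of a samtools pileup line can be tallied, so that a position like 274 (half A, half C) stands out from a position like 273 (one stray A). The function names `count_pileup_bases` and `looks_mixed` and the 20% threshold are made up for illustration. This is not a variant caller: it ignores base qualities entirely.

```python
import re
from collections import Counter

def count_pileup_bases(ref_base, read_bases):
    """Tally the base each read shows at one pileup position."""
    ref_base = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            # Start of a read; the next character is its mapping quality.
            i += 2
        elif ch == "$":
            # End of a read.
            i += 1
        elif ch in "+-":
            # Indel written as +<len><seq> or -<len><seq>; skip it.
            m = re.match(r"\d+", read_bases[i + 1:])
            if m is None:
                i += 1
            else:
                i += 1 + len(m.group()) + int(m.group())
        elif ch in ".,":
            # Match to the reference on the forward or reverse strand.
            counts[ref_base] += 1
            i += 1
        elif ch.upper() in "ACGTN":
            # Mismatch; upper/lower case only encodes the strand.
            counts[ch.upper()] += 1
            i += 1
        else:
            # Anything else (e.g. '*' deletion placeholder) is ignored here.
            i += 1
    return counts

def looks_mixed(counts, min_fraction=0.2):
    """True if at least two different bases each reach min_fraction of reads.

    The 0.2 default is an arbitrary illustrative threshold.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    frequent = [b for b, n in counts.items() if n / total >= min_fraction]
    return len(frequent) >= 2

if __name__ == "__main__":
    # Read-base strings copied from the pileup lines in the question.
    for pos, ref, bases in [
        (273, "T", ",.....,,.,.,...,,,.,..A"),
        (274, "T", "AAAAAAACCCCCCC"),
    ]:
        counts = count_pileup_bases(ref, bases)
        print(pos, dict(counts), looks_mixed(counts))
```

A real SNP caller also weighs base and mapping qualities, so a tally like this is only useful as a quick sanity check.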

snp • 2.9k views

What are you asking? What is "heterogeneous data" in this context? Can you improve the formatting of your example, and maybe include the type of output you'd like to see (or some other explanation of the goal)?

9.3 years ago

I think you meant heterozygous SNPs. You will have to use samtools to call SNPs. Another option is the GATK UnifiedGenotyper. This link may help you: What is the best pipeline for human whole exome sequencing?


Thank you! I'll read the post you suggested. lol

9.3 years ago
bioinfo ▴ 810

Call the SNPs first with samtools or GATK to get a VCF file. Then you can extract the heterozygous SNPs using vcftools or a simple grep, e.g. grep '0/1' on the VCF file, which will extract the heterozygous SNPs.


You may need to be a bit careful about using 0/1 for heterozygotes. It is possible to have more than two alleles present in some situations (more than one sample, or one sample with no reference allele). In those cases you could get 0/2 or 1/2, or even 0/3, 1/3 or 2/3. These are unlikely but possible.
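A minimal Python sketch of a filter that covers those cases, assuming an uncompressed VCF with a GT field in the FORMAT column. Instead of matching the literal string 0/1, it treats any genotype with two or more different called alleles as heterozygous, phased (`|`) or unphased (`/`). The names `is_heterozygous` and `heterozygous_records` are made up for this example, and the records in the demo are invented rather than real caller output.

```python
import re

def is_heterozygous(gt):
    """True if a GT string such as '0/1' or '1|2' has two or more distinct alleles."""
    alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
    return len(alleles) >= 2 and len(set(alleles)) > 1

def heterozygous_records(lines, sample_index=0):
    """Yield (chrom, pos, ref, alt, gt) where the chosen sample is heterozygous."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no genotype column for this sample
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt.index("GT")]
        if is_heterozygous(gt):
            yield fields[0], int(fields[1]), fields[3], fields[4], gt

if __name__ == "__main__":
    # Invented records, for illustration only.
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
        "seq1\t274\t.\tT\tA,C\t50\tPASS\t.\tGT\t1/2",
        "seq1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0/1",
        "seq1\t310\t.\tG\tA\t50\tPASS\t.\tGT\t1/1",
    ]
    for record in heterozygous_records(demo):
        print(record)
```

For a real file, passing an open file handle as `lines` should work the same way; a plain grep '0/1' would miss the 1/2 record above.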


Thank you! Where can I read an explanation of 0/1, 0/2? Thanks again!


You'll want to read the VCF format specification.
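In short, the numbers in the GT field are allele indices: 0 is the REF allele, 1 is the first entry in the ALT column, 2 is the second, and so on, with `.` for a missing call. A small Python sketch of that lookup (the helper name `decode_genotype` is made up for illustration):

```python
import re

def decode_genotype(ref, alt, gt):
    """Translate a GT string into allele sequences using REF and ALT.

    Index 0 refers to REF; 1, 2, ... refer to the comma-separated ALT
    alleles in order. A missing call ('.') becomes None.
    """
    alleles = [ref] + alt.split(",")
    decoded = []
    for index in re.split(r"[/|]", gt):
        decoded.append(None if index == "." else alleles[int(index)])
    return decoded

if __name__ == "__main__":
    # With REF=T and ALT=A,C: 0/1 is T and A, 1/2 is A and C.
    print(decode_genotype("T", "A,C", "0/1"))
    print(decode_genotype("T", "A,C", "1/2"))
```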


Thank you! I've tried it, and it works great!