Question: Working with SNP data from NGS & exon sequencing
romsen wrote (7.1 years ago):


I have access to SNP and genotype information from NGS data, specifically Exon-Seq reads of 56 samples. For every sample a variant file exists in two formats: a .gff3 file and a kind of tab-delimited file. The data include, e.g., rsID, pos, CHR, refAllele, quality score, ...

This means 56 files, each with more than 400,000 SNPs. I know several tools for SNP data processing (PLINK, imputation tools) but have no idea how to use them with this kind of data. Perhaps you can help me and suggest some tools to create, e.g., ped/map files, or generally one genotype file covering all 56 samples for selected (or all) SNPs.

Are there standardized tools at all? Or does one have to use R, Unix, and Perl commands to cut, combine, and work with such data?


Tags: ngs, tools, snp, sequencing

Before getting into file conversion etc., what do you want to do with these data? Are these diseased individuals? Are you just trying to learn about NGS variant analysis? With a few more details you will get the answer you are looking for.

Reply by Zev.Kronenberg

Precisely. First comes the "what?", and then the "how?", not the other way round.

Reply by Jorge Amigo

Can you post a snippet of the tab-delimited file so we can see how it's structured? It could be VCF (though you'd likely have noticed the header). BTW, you can convert GFF3 to VCF (see this thread: Converting a SNP GFF3 file to VCF format) and then convert that into .ped and .map files with vcftools, if nothing else.
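A minimal sketch of that conversion path, assuming each per-sample GFF3 has already been converted to VCF via the thread linked above (all file names here are hypothetical):

```shell
# Compress and index each per-sample VCF so they can be merged
bgzip sample1.vcf && tabix -p vcf sample1.vcf.gz
bgzip sample2.vcf && tabix -p vcf sample2.vcf.gz

# Combine the per-sample VCFs into one multi-sample VCF
# (vcf-merge ships with vcftools' Perl utilities)
vcf-merge sample1.vcf.gz sample2.vcf.gz > merged.vcf

# Export PLINK-compatible .ped / .map files
vcftools --vcf merged.vcf --plink --out merged
```

The last step writes `merged.ped` and `merged.map`, which PLINK can read directly; repeat the compress/index step for all 56 samples before merging.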

Reply by Devon Ryan
Jorge Amigo (Santiago de Compostela, Spain) wrote (7.1 years ago):

I find Zev's comment very important, as it's not that rare to find people coming to you with NGS data, eyes wide open, even sweating from the challenge they're facing, and asking: "now what?". The high-throughput genotyping field has already defined some very interesting approaches for extracting association and linkage knowledge from this amount of data, but one of the most interesting strengths of NGS may be its variant discovery capability, which allows us to work with really rare variants in very large numbers. First you'll have to think about what question you want to ask of your data, and then you'll have to find out how to pose that question on a computer. In fact, the question should have been defined before deciding to go into NGS, but that's another story.

If you are asking how to deal with .gff3 variants (are we talking about SOLiD LifeScope's?), the best suggestion I can think of is to annotate them (with ANNOVAR, for instance), which will let you work with them later as tabulated files with enriched information. And if you want to extract knowledge from all those tabulated files (and the newly generated ones) at once, instead of going sample by sample, then yes, you will definitely need to create a tool to process them. If it's just for simple operations like combining, merging, and overlapping, scripting would do. If you want to go beyond that and draw conclusions from statistical inference, then you'll certainly have to think about dirtying your hands with R.
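For the simple combining/merging case, a short script really is enough. A minimal sketch in Python, assuming each per-sample file is tab-delimited with header columns named `rsID` and `genotype` (both names are assumptions; adjust to whatever your files actually use):

```python
import csv
import os

def merge_genotypes(paths, out_path):
    """Merge per-sample tab-delimited variant files into one
    rsID x sample genotype matrix (column names are assumptions)."""
    genotypes = {}  # rsID -> {sample name: genotype}
    samples = []
    for path in paths:
        # Use the file name (minus extension) as the sample name
        sample = os.path.splitext(os.path.basename(path))[0]
        samples.append(sample)
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                genotypes.setdefault(row["rsID"], {})[sample] = row["genotype"]
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["rsID"] + samples)
        for rsid in sorted(genotypes):
            # "NA" marks SNPs not called in a given sample
            writer.writerow([rsid] + [genotypes[rsid].get(s, "NA") for s in samples])
```

Called with the 56 per-sample file paths, this produces one genotype table (rows = SNPs, columns = samples), which is the "one genotype file for 56 samples" the question asks for; filtering to selected SNPs is one extra condition in the loop.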


