Selecting short regions with multiple SNPs to write to FASTA files, using SNP data generated from short-read data
8.4 years ago
meg.nat86 • 0

Hi,

I am trying to run an analysis (Extended Bayesian Skyline Plot) that requires sequence data rather than SNPs directly. The program takes multiple FASTA files as input, each containing an alignment (across multiple individuals) for a particular 'region'. However, each locus/region represented in a FASTA file should contain at least 3 SNPs to be informative.

My data are ddRAD-seq data. SNPs were called with samtools and freebayes after aligning the reads to the reference genome (scaffold level).

Is there a way to identify regions/reads that carry multiple SNPs, so that these regions can be selected for all individuals and a FASTA file (of 80 bp) written for each one?
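
To make the question concrete, here is a rough sketch of the kind of counting I have in mind. It assumes the SNP calls are in an uncompressed VCF and that the candidate regions are already available as (scaffold, start, end) intervals, for example derived from read coverage; that interval list, the function names, and the toy records are all hypothetical, not output from my actual data. It only uses the Python standard library.

```python
from collections import Counter, defaultdict

def parse_snps(vcf_lines):
    """Yield (chrom, pos) for SNP records; header lines and indels are skipped."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alts = alt.split(",")
        # Keep only records where REF and every ALT are single bases.
        if ref in "ACGT" and len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts):
            yield chrom, pos

def count_snps_per_region(snps, regions):
    """regions: (chrom, start, end) tuples, 1-based inclusive. Returns {region: SNP count}."""
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    return {
        (chrom, start, end): sum(start <= p <= end for p in by_chrom.get(chrom, []))
        for chrom, start, end in regions
    }

# Toy VCF records (hypothetical), tab-separated as in a real VCF body.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "scaffold_1\t105\t.\tA\tG\t50\t.\t.",
    "scaffold_1\t120\t.\tC\tT\t50\t.\t.",
    "scaffold_1\t150\t.\tG\tA,C\t50\t.\t.",
    "scaffold_1\t160\t.\tAT\tA\t50\t.\t.",  # indel, ignored
    "scaffold_1\t510\t.\tT\tC\t50\t.\t.",
    "scaffold_2\t30\t.\tG\tT\t50\t.\t.",
]

# Hypothetical 80 bp regions covered by stacked reads.
regions = [("scaffold_1", 101, 180), ("scaffold_1", 501, 580), ("scaffold_2", 1, 80)]

counts = count_snps_per_region(parse_snps(vcf_lines), regions)
histogram = Counter(counts.values())  # number of regions per SNP count
min_snps = 3
informative = [region for region, n in counts.items() if n >= min_snps]
```

With real data the inner loop over positions would be slow for many regions; sorting positions per scaffold and using `bisect` would be one way to speed it up.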

Ideally, could one find out how many SNPs each read carries (the 80 bp reads that have been mapped to the genome), classify the reads as having 0, 1, 2, 3, 4, etc. SNPs, then select the subset of reads/regions that have, say, 4 SNPs, and get consensus reads (2 alleles per individual per 'locus')?
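
For the "2 alleles per individual per locus" part, this is a minimal sketch of what I imagine, under several assumptions: the reference sequence of the region is already extracted, only SNPs are applied, and each individual has a diploid genotype string per SNP as in the VCF GT field. The function names and the toy values are hypothetical. Note that if the genotypes are unphased (written with "/"), the assignment of alleles to sequence 1 versus sequence 2 below is arbitrary, so the two sequences would not be true haplotypes without phasing.

```python
def apply_genotypes(ref_seq, region_start, variants, sample_gts):
    """Build two allele sequences for one individual over one region.

    ref_seq      -- reference sequence of the region
    region_start -- 1-based coordinate of ref_seq[0]
    variants     -- list of (pos, ref, [alts]) SNPs inside the region
    sample_gts   -- genotype strings parallel to variants, e.g. "0|1", "1/1", "./."
    """
    haps = [list(ref_seq), list(ref_seq)]
    for (pos, ref, alts), gt in zip(variants, sample_gts):
        idx = pos - region_start
        if ref_seq[idx] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        alleles = [ref] + alts
        calls = gt.replace("/", "|").split("|")
        if len(calls) != 2:
            calls = [".", "."]  # treat anything non-diploid as missing
        for h, call in enumerate(calls):
            # Missing calls become N; otherwise substitute the called allele.
            haps[h][idx] = "N" if call == "." else alleles[int(call)]
    return "".join(haps[0]), "".join(haps[1])

def to_fasta(sample, hap1, hap2):
    """Format the two sequences of one individual as FASTA records."""
    return f">{sample}_1\n{hap1}\n>{sample}_2\n{hap2}\n"

# Toy 10 bp region starting at coordinate 101 (hypothetical values).
ref_seq = "ACGTACGTAC"
variants = [(103, "G", ["A"]), (106, "C", ["T", "G"]), (110, "C", ["A"])]
sample_gts = ["0|1", "2|1", "./."]

hap1, hap2 = apply_genotypes(ref_seq, 101, variants, sample_gts)
record = to_fasta("ind01", hap1, hap2)
```

Positions with no read coverage in an individual would also need masking (for example with N) before alignment, which this sketch does not do.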

If anyone has any suggestions, I'd be very grateful!

Thanks,
Meghana

SNP • 2.0k views