I am working on a few projects right now using 454 cDNA data. For these projects, we built the sequencing libraries ourselves so that each sequence (in practice, about 97% of them) possesses a tag identifying the individual it comes from. We might have, say, 20 individuals, each tagged with 10 coded nucleotides placed at the beginning of the sequence as part of the primers.
I then scan through the .fasta file with a Python program I wrote, which renames each sequence after the tag found in it (e.g., Tag01) and removes any primers present. With these sequences, we do a de novo assembly, then re-align the sequences to the consensus contigs it produced. We then export the alignments in ACE format, which I parse with Biopython and convert to FASTA. Then, the fun begins :)
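To make the tagging step concrete, here is a minimal sketch of what such a renaming and trimming pass could look like. It is only an illustration, not the actual program: the tag table, the primer sequence, the `Tag01_readname` naming scheme, and the function names are all made up, and it assumes the tag sits at the very start of the read with the primer directly after it.

```python
# Hypothetical tag table: 10-nt tag -> individual label (made up for this sketch).
TAGS = {
    "ACGTACGTAC": "Tag01",
    "TGCATGCATG": "Tag02",
}
TAG_LENGTH = 10
PRIMER = "GGCCAATT"  # made-up primer sequence assumed to follow the tag


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def tag_and_trim(name, seq):
    """Return (new_name, trimmed_seq), or None when no known tag is found."""
    seq = seq.upper()
    tag = TAGS.get(seq[:TAG_LENGTH])
    if tag is None:
        return None  # reads without a recognizable tag are skipped
    rest = seq[TAG_LENGTH:]
    if rest.startswith(PRIMER):
        rest = rest[len(PRIMER):]  # drop the primer when it directly follows the tag
    return f"{tag}_{name}", rest


example = [">read1 length=24", "ACGTACGTACGGCCAATT", "AAACCC", ">read2", "NNNNNNNNNNAAACCC"]
for name, seq in read_fasta(example):
    result = tag_and_trim(name, seq)
    if result is not None:
        print(">%s\n%s" % result)
```

This sketch only accepts exact tag matches; tolerating sequencing errors in the tag would need an extra matching step.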
Using a list of SNP positions and contig numbers, I extract the genotypes from each sequence with another Python program I wrote. I then have to parse the results to do the rest of the job (stats and further analysis). I also count the sequences for each tag in order to estimate gene expression (remember, this is cDNA) from these data.
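As a rough sketch of the genotype extraction and the per-tag counts, assuming the reads of one contig are available as padded, equal-coordinate aligned strings whose names start with the tag (for example `Tag01_read1`): the function names, the 0-based column convention, and the `min_depth` and `min_fraction` thresholds below are arbitrary choices for illustration, not the actual program or a validated calling rule.

```python
from collections import Counter, defaultdict


def tag_of(read_name):
    """Assumed naming scheme: everything before the first underscore is the tag."""
    return read_name.split("_", 1)[0]


def bases_at(reads, position):
    """Count the bases seen per tag at one 0-based alignment column."""
    counts = defaultdict(Counter)
    for name, aligned_seq in reads:
        if position >= len(aligned_seq):
            continue  # read does not cover this column
        base = aligned_seq[position].upper()
        if base in ("A", "C", "G", "T"):  # ignore pads, gaps, and Ns
            counts[tag_of(name)][base] += 1
    return counts


def call_genotype(base_counts, min_depth=2, min_fraction=0.2):
    """Naive call: keep alleles seen in at least min_fraction of the reads."""
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # too few reads to say anything
    alleles = sorted(b for b, n in base_counts.items() if n / depth >= min_fraction)
    if len(alleles) == 1:
        alleles = alleles * 2  # report homozygotes as e.g. "G/G"
    return "/".join(alleles)


def reads_per_tag(reads):
    """Number of reads per tag, as a crude expression proxy for one contig."""
    return Counter(tag_of(name) for name, _ in reads)


reads = [
    ("Tag01_r1", "ACGT"),
    ("Tag01_r2", "ACTT"),
    ("Tag02_r3", "AC-T"),
    ("Tag02_r4", "ACGT"),
    ("Tag02_r5", "ACGT"),
]
for tag, base_counts in sorted(bases_at(reads, 2).items()):
    print(tag, call_genotype(base_counts))
```

With low coverage per individual, a fixed-fraction rule like this will miss heterozygotes and miscall sequencing errors, so the thresholds would need tuning against the real data.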
My question is: how would you do the individual SNP genotyping? I really like the Python coding part, but I wonder whether a ready-made solution already exists for just that.