7.2 years ago
vassialk
Is there any sample code, with a tutorial, showing how to align a 1000 bp (gyrase) gene sequence against a database of such sequences in Python or Java? I need to write code that aligns an input sequence against a database of known sequences from several classes (100 items per class) and outputs reports with variants and analysis charts. Can Biopython help with this task, or should I look for other libraries or switch to R? Thank you.
I'm having a hard time understanding the question... Can you try to be clearer about what your input sequence and database of sequences look like?
Input: the DNA gyrase gene cut from tuberculosis Illumina NGS data. Database: the relevant genes with a known resistance status. Thanks.
Just a few more clarifying questions:
What is your goal after aligning the sequences, and do you need to use Python? It sounds to me like you already have a sequence constructed from your NGS data and now need to do a large-scale multiple alignment, perhaps for distance-based analyses such as phylogenetic trees. Is that correct? Or are you looking to do read mapping against a known set of reference sequences, with an aligner such as BWA?
I need to find the differences between the input and the database sequences and generate a meaningful, nice report. Thanks.
So what you want is a variant caller, then? And "a meaningful, nice report" is still super vague.
If you want SNPs, then yes, a variant calling pipeline like this one would be appropriate. If you want straight differences (global pairwise differences), then you'll want something like one of the Clustal programs instead. In any case, there are already tools out there that do both of these things, so you shouldn't spend your time recoding them in Python unless you need to do that for a project of some sort.
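To make "global pairwise differences" concrete, here is a minimal sketch in plain Python of a Needleman-Wunsch global alignment followed by a column-by-column comparison. The function names (`global_align`, `list_differences`) and the scoring values are illustrative assumptions, not taken from any pipeline mentioned in this thread. In practice Biopython's `Bio.Align.PairwiseAligner` covers the alignment step, so this is only meant to show what the output looks like.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scoring values are illustrative assumptions. Returns both
    sequences as gapped strings of equal length.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def list_differences(ref_aln, qry_aln):
    """List (ref_position, ref_base, query_base) for every differing column.

    Positions are 1-based on the ungapped reference; an insertion in the
    query is reported at the position of the preceding reference base.
    """
    diffs = []
    pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            pos += 1
        if r != q:
            diffs.append((pos, r, q))
    return diffs


# Toy sequences, not real gyrase data.
ref_aln, qry_aln = global_align("ACGTACGT", "ACGAACGT")
print(list_differences(ref_aln, qry_aln))  # [(4, 'T', 'A')]
```

This quadratic-time version is fine for a 1000 bp gene against a few hundred database entries, but it is not a substitute for read mapping or a variant caller on raw NGS reads.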
Thank you, I'll try that, though in that case I would prefer to write my own code with good libraries so that I can control the process.
I would suggest at least starting with the published pipeline and then modifying it or creating your own if the output doesn't make sense for your questions.
Thank you. The only way is to try several approaches and see the results; I need Python or Java code with Bio[Language] libraries.
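For the reporting side of the question, the following is a rough sketch of how an input sequence might be compared against a labelled database and summarised per class. It assumes the sequences have already been aligned to the same length (for example by a multiple alignment), and the record layout `(name, class_label, sequence)`, the function names, and the toy data are all hypothetical.

```python
def count_differences(ref, query):
    """Count positions where two equal-length, pre-aligned sequences differ."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for r, q in zip(ref, query) if r != q)


def closest_by_class(query, database):
    """Return {class_label: (entry_name, n_differences)} for the best hit per class.

    `database` is a list of (name, class_label, aligned_sequence) tuples;
    this layout is an assumption made for the example.
    """
    best = {}
    for name, label, seq in database:
        n = count_differences(seq, query)
        if label not in best or n < best[label][1]:
            best[label] = (name, n)
    return best


def format_report(query, database):
    """Tab-separated report: class, closest entry, number of differences."""
    rows = sorted(closest_by_class(query, database).items())
    return "\n".join(f"{label}\t{name}\t{n}" for label, (name, n) in rows)


# Toy data, not real gyrase sequences or resistance calls.
database = [
    ("s1", "susceptible", "ACGTACGT"),
    ("r1", "resistant", "ACGAACGT"),
    ("r2", "resistant", "ACGAACGA"),
]
print(format_report("ACGAACGT", database))
```

A nearest-hit summary like this says nothing about whether a given difference is a known resistance mutation; that would need an annotated variant list on top of it.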