Hi,
I may soon need to find the location of a transgene in WGS data. I have been reading about different ways to approach this, both in papers and in Biostars posts such as this post, as well as this one.
Now that I have some ideas on how to carry this out, I want to try working through some of these on some example data, but I'm unsure of the best way to proceed.
The papers I've read seem to provide either their raw data or nothing at all, so I'm wondering whether I even need the sequence of their entire construct (i.e. should I just look up the transgene they used and get the FASTA for it?). It seems people normally map reads back to their entire construct to see whether any of the backbone has been taken up into the genome.
Ideally, I was hoping to find some raw sequencing data together with the sequence of an inserted transgene or construct, so I could play around with the data before attempting this on 'real' data. I thought papers that characterise a transgene would make sense, since I could then compare my results to theirs, but I'm kind of stuck now.
Any advice would be appreciated.
Thanks,
Liam
Thinking aloud now, but couldn't you take any WGS sample that is out there and simply delete a certain gene or region from the reference genome that you align against? It should be a non-repetitive, mappable, probably rather short and well-covered region. That way, you would know exactly where this artificial "transgene" belongs, which would let you test your script.
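A minimal Python sketch of that idea, in case it helps. It cuts a region out of a reference FASTA and returns both the shortened reference and the excised sequence, which you could then treat as the mock "transgene". The function names (`parse_fasta`, `excise_region`, `format_fasta`) and the toy sequences are made up for illustration, and the coordinates are assumed to be 0-based and half-open. For a real genome you would probably want an indexed FASTA reader rather than loading everything into memory.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_name: sequence}, keeping record order."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def excise_region(seq: str, start: int, end: int) -> Tuple[str, str]:
    """Remove the 0-based, half-open interval [start, end) from seq.

    Returns (sequence_without_region, excised_region).
    """
    if not 0 <= start < end <= len(seq):
        raise ValueError("region must lie within the sequence")
    return seq[:start] + seq[end:], seq[start:end]


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA text with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy two-contig "genome"; a real run would read the reference from disk.
    toy = ">chr1 toy\nACGTACGTAA\nTTGGCCAATT\n>chr2\nGGGGCCCC\n"
    genome = parse_fasta(toy)

    # Cut positions 4..11 out of chr1; this becomes the mock transgene.
    trimmed, insert = excise_region(genome["chr1"], 4, 12)
    genome["chr1"] = trimmed

    print(format_fasta("mock_transgene", insert), end="")
    for contig, sequence in genome.items():
        print(format_fasta(contig, sequence), end="")
```

In the modified reference, the known "insertion site" is then position `start` on the edited contig (4 in the toy example), so whatever your script reports can be checked against that.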