Question: supertranscripts (conceptual question)
paraskevopou wrote (9 months ago):

Dear all, I have a more conceptual question. I used the Trinity superTranscript pipeline to call SNPs between 2 individuals reared under 2 conditions (4 samples / 4 libraries / 4 VCF files; each sample was always compared to the reference). The reference was built with the superTranscript method (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5543425/), which is applied to non-model organisms; although superTranscripts do not represent any true biological molecule, they provide a practical replacement for a reference genome. So, in my case, the reference was built from the combined de novo assembly of my own data.

The above paper states:

"Only heterozygous SNPs, which we defined as those with at least one read supporting the reference allele, were analysed. Reported homozygous SNPs were removed because they are likely to be false positives of the assembly or alignment. True homozygous SNPs should be assembled into the reference and are therefore not detectable. Note that this is a general limitation of using the same sample to create the reference and call variants and is not unique to the superTranscript method. However, homozygous variants could be detected for non-model organisms if multiple samples were available or if superTranscripts were constructed and called with respect to a control."
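The paper's working definition (a SNP is kept only if at least one read supports the reference allele) can be sketched as a simple filter. This is only an illustration, not the authors' code: it assumes a single-sample VCF whose FORMAT column carries an `AD` field listing read counts as reference first, then alternate alleles (the GATK convention), and the function names and example records are hypothetical.

```python
def ref_read_count(format_keys, sample_field):
    """Return the number of reads supporting the reference allele, or None if unknown."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    ad = values.get("AD")
    if ad is None or ad.startswith("."):
        return None
    # Assumption: AD is "ref_count,alt_count[,alt2_count...]"
    return int(ad.split(",")[0])


def keep_ref_supported(vcf_lines):
    """Keep (contig, position) for records with at least one reference-supporting read."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        ref_reads = ref_read_count(cols[8], cols[9])
        if ref_reads is not None and ref_reads >= 1:
            kept.append((cols[0], int(cols[1])))
    return kept


# Hypothetical records: one with reference support, one without.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "gene1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:7,5:12",
    "gene1\t250\t.\tC\tT\t60\tPASS\t.\tGT:AD:DP\t1/1:0,14:14",
]
kept = keep_ref_supported(example_lines)
print(kept)  # [('gene1', 100)]
```

Filtering on read support rather than on the reported genotype means a site called 1/1 but with a few reference reads would still be kept, which may or may not be what is wanted.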

I suppose that heterozygous SNPs are those with GT 0/1 in the VCF files, while homozygous SNPs are those with GT 1/1 (0/0 being homozygous for the reference allele). Excluding the homozygous SNPs, which are actually the ones I am more interested in because they explain the variation between my two individuals, reduces the number of SNPs to 1/10 for each file. And of course, far fewer of those are on genes and positions shared among my 4 VCF files. In the end I obtained only 10 SNPs that are shared between the 2 individuals.

Also, if I understood correctly, I get SNPs at loci where one allele matches the reference and the other allele carries the alternative variant, which is different in individuals 1 and 2. This sounds to me more like allele-specific expression, which is interesting but not what I am looking for. Keep in mind that I do not have a reference genome, only a reference transcriptome that does not come from my data (completely different treatments, though).
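To make the genotype codes concrete, here is a minimal sketch of how GT strings could be classified and how sites where the two individuals are homozygous for different alleles could be picked out. It assumes diploid calls and that both individuals were genotyped against the same reference records, so that 0/0 is reported explicitly and allele indices are comparable; with separate per-sample VCF files, a missing record cannot be told apart from a homozygous-reference call. The function names and the example calls are hypothetical.

```python
def classify_gt(gt):
    """Classify a diploid VCF GT string as het, hom_ref, hom_alt, or missing."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "missing"
    if alleles[0] != alleles[1]:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def homozygous_allele(gt):
    """Return the allele index of a homozygous diploid call, else None."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) == 2 and "." not in alleles and alleles[0] == alleles[1]:
        return alleles[0]
    return None


def fixed_differences(calls_a, calls_b):
    """Sites where both individuals are homozygous but for different alleles.

    calls_a, calls_b: dicts mapping (contig, position) -> GT string.
    """
    diffs = []
    for site in sorted(set(calls_a) & set(calls_b)):
        a = homozygous_allele(calls_a[site])
        b = homozygous_allele(calls_b[site])
        if a is not None and b is not None and a != b:
            diffs.append(site)
    return diffs


# Hypothetical genotype calls for two individuals at four sites.
calls_ind1 = {("gene1", 100): "0/0", ("gene1", 250): "1/1",
              ("gene2", 40): "0/1", ("gene2", 90): "./."}
calls_ind2 = {("gene1", 100): "1/1", ("gene1", 250): "1/1",
              ("gene2", 40): "1/1", ("gene2", 90): "0/0"}
diffs = fixed_differences(calls_ind1, calls_ind2)
print(diffs)  # [('gene1', 100)]
```

Only the first site counts here: the second is homozygous for the same allele in both individuals, the third is heterozygous in one, and the fourth has no call in one.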

Any suggestions about which pipeline would be suitable for this kind of data, or how I can recover homozygous SNPs without a high number of false positives, would be really helpful.

Thanks! Sofia

Tags: snp, rna-seq, assembly

Hi,

Would it be possible to break up your post into paragraphs so it is easier to read? Thank you!

written 9 months ago by RamRS

Thanks, I edited the text a bit to make it easier to read.

written 9 months ago by paraskevopou