Low sequence similarity between two samples of the same species
7.0 years ago
Mehmet ▴ 820

Dear All,

I would like your comments on an issue. I have two transcriptome assemblies (from two samples) of one species. Conditions were the same for both assemblies. However:

  1. The numbers of genes and transcripts differ. When I cluster the two assemblies with cd-hit and vsearch, I find only 35% similarity between the two samples (I mean that 35% of the sequences in the first transcriptome are found in the second transcriptome; protein clustering gives almost the same result).
  2. When I map the RNA reads of the first sample to the second transcriptome, I get a 99% mapping ratio. When I map the RNA reads of the second sample to the first transcriptome, I get a mapping ratio of about 98%.

What I would like to understand is why the sequence clustering ratio is so low.

We believe that these two samples belong to the same species (their sex may differ).

Thank you.

gene sequencing sequence Assembly alignment • 1.4k views

Is it likely that both samples were processed through different assembly pipelines?


Are you sure both samples are from the same species? Could there be contamination? Extract a 1 kb section of a gene from both samples and run a BLAST search on each. Do they both return the same species?
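
As a minimal sketch of the extraction step, assuming the assembly is available as FASTA text: the snippet below pulls a 1 kb window from one transcript and formats it as a BLAST query. The record name `s1_tr1`, the demo sequence, and the helper names are all hypothetical placeholders, not taken from the actual data.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {record name: sequence}."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def extract_region(seq, start=0, length=1000):
    """Return up to `length` bases starting at 0-based position `start`."""
    return seq[start:start + length]


def to_fasta(name, seq, width=60):
    """Format one record as FASTA with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical demo input: a single 1200-base transcript.
demo_fasta = ">s1_tr1 hypothetical transcript\n" + "ACGT" * 300 + "\n"

records = read_fasta(demo_fasta)
region = extract_region(records["s1_tr1"], start=0, length=1000)
query = to_fasta("s1_tr1_0_1000", region)
print(query.splitlines()[0], len(region))
```

The same would be done for the matching gene in the second assembly, and both queries submitted to BLAST to compare the top-hit species.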


Based on COX1 gene sequencing, the two samples are the same species. In addition, the cross-mapping ratio (RNA reads of each sample mapped to the transcriptome of the other sample) is very high (over 95%). This suggests to us that the two samples belong to the same species. But clustering the two transcriptomes gave 35%, meaning that 35% of the sequences of the first sample are found in the transcriptome of the second sample. So I am confused.
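
To make the 35% figure concrete, here is a rough sketch of how such a shared fraction could be computed from a cd-hit `.clstr` file. It assumes both assemblies were concatenated before clustering and that sequence IDs carry a sample prefix (`s1_`, `s2_`); those prefixes, the example clusters, and the function names are hypothetical.

```python
import re

# Hypothetical .clstr content; real files come from running cd-hit on the
# combined FASTA. The "s1_"/"s2_" ID prefixes are an assumption of this sketch.
CLSTR_TEXT = """\
>Cluster 0
0\t1500nt, >s1_tr1... *
1\t1480nt, >s2_tr7... at +/98.50%
>Cluster 1
0\t900nt, >s1_tr2... *
>Cluster 2
0\t1200nt, >s2_tr9... *
>Cluster 3
0\t800nt, >s1_tr3... *
1\t750nt, >s1_tr4... at +/99.00%
"""


def parse_clusters(text):
    """Return a list of clusters, each a list of member sequence IDs."""
    clusters = []
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            clusters.append([])
            continue
        match = re.search(r">(\S+?)\.\.\.", line)
        if match and clusters:
            clusters[-1].append(match.group(1))
    return clusters


def shared_fraction(clusters, own_prefix, other_prefix):
    """Fraction of `own_prefix` sequences that share a cluster with at
    least one `other_prefix` sequence."""
    total = 0
    shared = 0
    for members in clusters:
        own = [m for m in members if m.startswith(own_prefix)]
        has_other = any(m.startswith(other_prefix) for m in members)
        total += len(own)
        if has_other:
            shared += len(own)
    return shared / total if total else 0.0


clusters = parse_clusters(CLSTR_TEXT)
frac_s1 = shared_fraction(clusters, "s1_", "s2_")
frac_s2 = shared_fraction(clusters, "s2_", "s1_")
print(frac_s1, frac_s2)
```

Note that the fraction is not symmetric: it depends on which sample is taken as the reference, so it is worth reporting both directions.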

7.0 years ago
Mehmet ▴ 820

No, the same process was applied to both assemblies.

7.0 years ago
h.mon 35k

I think at least part of this discrepancy may be explained by differential alternative splicing and/or intron contamination between your samples. I know that cd-hit, with its default settings, will not cluster alternative transcripts that differ by an internal exon present in only one transcript, for example. However, the mapping rate would still be very high with a local aligner such as BWA. So I don't see 1) and 2) as being at odds with each other.
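
A toy illustration of this point, as a sketch only: two made-up isoforms differ by one internal exon. Reads from the shorter isoform all find a local seed match in the longer one, yet the longest gap-free match covers only about half of the shorter sequence, far below a 90% full-length threshold. The seed test and the gap-free coverage measure are crude stand-ins chosen for this example; they are not the actual algorithms of BWA or cd-hit, and the exon sizes and threshold are assumptions.

```python
import random
from difflib import SequenceMatcher

rng = random.Random(0)  # fixed seed so the toy example is reproducible


def random_seq(n):
    """Random DNA string of length n (hypothetical exon content)."""
    return "".join(rng.choice("ACGT") for _ in range(n))


exon1, exon2, exon3 = random_seq(200), random_seq(300), random_seq(200)
isoform_a = exon1 + exon2 + exon3  # internal exon included
isoform_b = exon1 + exon3          # internal exon skipped


def make_reads(seq, read_len=50, step=10):
    """Tile error-free reads of length read_len along seq."""
    return [seq[i:i + read_len] for i in range(0, len(seq) - read_len + 1, step)]


def read_maps(read, reference, seed=25):
    """Crude local-mapping test: some seed-length window of the read
    occurs exactly in the reference."""
    return any(read[k:k + seed] in reference
               for k in range(len(read) - seed + 1))


reads_b = make_reads(isoform_b)
mapped_fraction = sum(read_maps(r, isoform_a) for r in reads_b) / len(reads_b)

# Longest contiguous shared block, relative to the shorter isoform.
# autojunk=False is needed: the default heuristic would treat all four
# bases as "popular" junk in sequences of this length.
matcher = SequenceMatcher(None, isoform_a, isoform_b, autojunk=False)
block = matcher.find_longest_match(0, len(isoform_a), 0, len(isoform_b))
ungapped_coverage = block.size / len(isoform_b)

print("reads mapped:", mapped_fraction)
print("gap-free coverage of shorter isoform:", ungapped_coverage)
```

Reads that span the exon-skipping junction still map here because at least half of each read lies on one side of the junction, which is roughly what soft-clipping achieves in a local aligner.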
