Can de novo RNA-seq analysis inform us about gene (or genome) duplication?
7.6 years ago
Farbod ★ 3.4k

Dear Biostars friends, hi (I'm not a native English speaker, so be ready for some possible language flaws).

Is there any way to find out whether a gene is duplicated in a non-model animal from its RNA-seq data?

E.g., the SRGAP2 gene has 3 copies in human. I want to check how many of them exist in a non-model fish (yes, that means there is no genome available, and zebrafish is only a distant relative) for which I have a de novo transcriptome assembly created with Trinity.

Thank you in advance.

gene gene duplication RNA-Seq • 1.8k views

A general comment: based on your many posts here, if you are truly interested in this genome, why don't you do some additional WGS to take steps towards getting a real genome put together for this fish?


Hi my dear friend,

Because it is expensive!


I never said anything about the cost :-)

But look at it this way. You can slice and dice the RNAseq data you have in hand only so many ways. At some point, without additional real WGS, it is not going to be possible to get a clear picture of what this genome actually looks like and which transcripts/genes are real.


Yes, you are 100% correct, but the only barrier here for me is still the COST :)


Your PI needs to decide at some point how much time is being spent doing things the long/hard way when having a roughly assembled genome would make things easier. It's an expenditure of resources either way.


This is one of the invoices I have received:

<de novo whole genome sequencing of fish>
Number of samples: 2 (male and female)
Sample: fish (1.9 Gb genome size)
Suggested depth: 50X; 95 Gb/sample

(1) Approach 1 – random fragmentation library construction
Library construction: TruSeq PCR-free 350 bp, $100/sample; $200 for 2 samples
Sequencing: HiSeq 2000 100 bp paired-end, $3000/lane; 3 lanes required
Throughput: ~35 Gb/lane; ~100 Gb/3 lanes, × 2 samples
Sequencing price: $3000 × 6 lanes = $18,000
Subtotal: $18,200 (USD)

(2) Approach 2 – mate-pair library construction
Library construction: 3 kb, 5 kb, 8 kb mate-pair (MP) libraries, $1000 per MP library; $6000 for 6 MP libraries (2 samples)
Sequencing: HiSeq 2000 100 bp paired-end, $3000/lane
Targeted throughput: ~50 Gb per MP library
Sequencing price: $30,000 (50 Gb × 6 MP = 300 Gb throughput)
Subtotal: $36,000 (USD)

Grand total: $54,200 (USD)
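
For anyone who wants to sanity-check these numbers, here is a minimal sketch of the arithmetic behind the quote. It only reuses the figures listed above (1.9 Gb genome, 50X depth, ~35 Gb per lane, $3000 per lane, the library prices); the variable and function names are my own, and rounding up to whole lanes is an assumption about how the lane count was derived.

```python
import math

# Figures quoted in the invoice above.
GENOME_SIZE_GB = 1.9
DEPTH = 50
GB_PER_LANE = 35
PRICE_PER_LANE = 3000
N_SAMPLES = 2


def lanes_needed(genome_size_gb, depth, gb_per_lane):
    """Return (data per sample in Gb, whole lanes needed per sample)."""
    data_gb = genome_size_gb * depth
    return data_gb, math.ceil(data_gb / gb_per_lane)


data_gb, lanes = lanes_needed(GENOME_SIZE_GB, DEPTH, GB_PER_LANE)

# Approach 1: PCR-free library ($100/sample) plus sequencing lanes.
approach1 = 100 * N_SAMPLES + PRICE_PER_LANE * lanes * N_SAMPLES

# Approach 2: 3 mate-pair libraries per sample ($1000 each) plus the
# quoted $30,000 of sequencing for 300 Gb.
approach2 = 1000 * 3 * N_SAMPLES + 30000

print(f"{data_gb:.0f} Gb/sample, {lanes} lanes/sample")
print(f"Approach 1: ${approach1:,}")
print(f"Approach 2: ${approach2:,}")
print(f"Total: ${approach1 + approach2:,}")
```

Changing `DEPTH` or `GB_PER_LANE` shows how quickly the lane count (and the price) moves with a lower target coverage or a higher-throughput instrument.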


Just wait a bit until some labs (including mine) have their PromethION up and running ;-)

Is your organism of any biological/ecological/industrial/... importance? That would help to raise some money to get it done...


Hi and thank you for your kind invitation,

I long to visit you in your lab.


No need to visit. Just send the sample and back comes the sequence in a few days.

Something else may come along before PromethION becomes real.


Ha Ha Ha,

I was taking Wouter's suggestion as a unique post-doc opportunity!

And by the way, do you have any thoughts on the "TransDecoder, capture all the resulting proteins, non-redundify them, then re-cluster . . ." approach I have described below?


I think you are missing a link for that tool below.


Genia is also an interesting platform to keep an eye on. I'm not sure how far along their development is, but the first PromethIONs are being shipped.


Hehe, I'm a PhD student, so I'm not really in a position to invite people for a postdoc!


Most likely obvious to you, but I would say it's better to say that the gene has 6 copies because humans are diploid :-)

7.6 years ago

No, you can't really judge copy number from RNAseq. What you can do is use it to determine the likely sequence in your fish and then do a Southern blot (or a newer variant).


Dear Devon Ryan,

Hi and thank you for your help. I have found some related suggestions on the web, and I would like you to kindly have a look at them and say what you think:

1- Run TransDecoder, capture all the resulting proteins, make them non-redundant (i.e. cd-hit at high stringency), then re-cluster and examine potential paralog relationships.

2- Use tools such as evopipes or homeoSplitter.

3- (This is from the Trinity group) "My experience working with both recently and anciently duplicated genomes (char, salmon, sturgeon) is that Trinity is often conservative, and it does a good job at teasing paralogs apart."
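
To make the idea behind item 3 concrete, here is a rough sketch of how one might count Trinity "genes" that hit SRGAP2. It is not the pipeline from item 1 and not anything the Trinity group provides: it assumes you already have 12-column BLAST tabular hits with the Trinity transcript as the query (as a blastx search against an SRGAP2 protein would give), and that transcript IDs follow Trinity's usual `TRINITY_DN…_c…_g…_i…` naming. The function names, the e-value cutoff and the example hits are invented for illustration. The result is only a list of candidate paralogs: alleles, fragmented assemblies and unexpressed copies can all make the count wrong.

```python
import re
from collections import defaultdict

# Trinity names transcripts like TRINITY_DN1000_c115_g5_i1; dropping the
# trailing _i<N> isoform suffix gives Trinity's own "gene" identifier.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_gene_id(transcript_id):
    """Strip the isoform suffix from a Trinity transcript ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def candidate_genes(blast_tab_lines, max_evalue=1e-20):
    """Group BLAST tabular hits by Trinity gene.

    Returns {trinity_gene: sorted list of isoform IDs with a passing hit}.
    The e-value cutoff is an arbitrary illustrative choice.
    """
    genes = defaultdict(set)
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, evalue = fields[0], float(fields[10])  # column 11 = evalue
        if evalue <= max_evalue:
            genes[trinity_gene_id(qseqid)].add(qseqid)
    return {gene: sorted(isoforms) for gene, isoforms in genes.items()}


# Invented example hits (not real data), 12 tab-separated columns each.
EXAMPLE_HITS = [
    "TRINITY_DN100_c0_g1_i1\tSRGAP2\t78.5\t900\t190\t3\t1\t2700\t1\t900\t0.0\t1400",
    "TRINITY_DN100_c0_g1_i2\tSRGAP2\t77.9\t850\t185\t3\t1\t2550\t1\t850\t0.0\t1300",
    "TRINITY_DN2417_c1_g1_i1\tSRGAP2\t61.2\t700\t260\t8\t4\t2100\t50\t750\t1e-150\t600",
    "TRINITY_DN999_c0_g2_i1\tSRGAP2\t30.1\t120\t80\t4\t10\t370\t300\t420\t1e-05\t50",
]

result = candidate_genes(EXAMPLE_HITS)
for gene, isoforms in sorted(result.items()):
    print(gene, len(isoforms), "isoform(s)")
```

Each key in `result` is one Trinity gene with at least one strong hit; whether two such genes are true paralogs, alleles or assembly artifacts cannot be decided from this table alone.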


I wouldn't trust any method that used RNAseq; all such methods will be inherently flawed. If you want to measure a DNA-level change, then you must directly use DNA.
