Question: Find gene duplications in a draft genome assembly
21 months ago, niconm89 wrote:

Hi everyone, I have recently assembled a draft genome. I consider its quality acceptable, based on the assessment I made and my biological goals. Before the genome assembly was ready, I performed a de novo transcriptome assembly of this species and, after some analysis, I found some candidate genes that may be duplicated in the genome. My idea is to check this by mapping the reads to these candidate transcripts and to the genome, and then checking the positions of the reads that map to both. I am a beginner in this area, so I would like to hear your ideas and advice.

Thanks in advance.


Sounds like a valid approach indeed.

Could you nonetheless add some details on how specifically you want to do this, e.g. which programs to use and which parameter settings?

written 21 months ago by lieven.sterck

Thanks for your reply.

I have both short and long reads, but I think it would be better to use the short ones in this case. So I would map the reads to the transcripts (candidate genes) with bowtie2, then filter the mapped reads and map them to the genome. After that, I will need to find the primary and secondary alignments of each read (if they exist) and compare the transcripts to these regions. I have not thought about parameter settings yet, but I have lost the pairing information after the first mapping of reads to candidate genes (only one read of the pair maps). I need to run some tests to define the parameters.

written 21 months ago by niconm89
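
For the "find primary and secondary alignments of each read" step, here is a minimal Python sketch that reads plain SAM text and lists the reads placed at more than one position. It relies only on the standard SAM flag bits (0x4 unmapped, 0x40/0x80 first/second mate, 0x100 secondary, 0x800 supplementary). The function name `multi_locus_reads` and the example records are made up for illustration. Note that bowtie2 reports one alignment per read by default, so extra placements would only appear if the mapper is run in a multi-alignment mode such as `-k`.

```python
from collections import defaultdict

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
FIRST_MATE = 0x40      # SAM flag bit: first read of a pair
SECOND_MATE = 0x80     # SAM flag bit: second read of a pair
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def multi_locus_reads(sam_lines):
    """Return {read_id: [(contig, pos, kind), ...]} for reads placed at 2+ positions.

    Hypothetical helper: `sam_lines` is any iterable of SAM text lines.
    Mates of a pair are kept apart by adding "/1" or "/2" to the read name.
    """
    hits = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        contig, pos = fields[2], int(fields[3])
        if flag & UNMAPPED:
            continue
        mate = "/1" if flag & FIRST_MATE else "/2" if flag & SECOND_MATE else ""
        if flag & SECONDARY:
            kind = "secondary"
        elif flag & SUPPLEMENTARY:
            kind = "supplementary"
        else:
            kind = "primary"
        hits[name + mate].append((contig, pos, kind))
    # keep only reads whose alignments cover more than one distinct position
    return {
        read: locs
        for read, locs in hits.items()
        if len({(contig, pos) for contig, pos, _ in locs}) > 1
    }


# Made-up SAM records: r1 has a primary and a secondary placement,
# r2 maps once, r3 is unmapped.
example_sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t100\t1\t50M\t*\t0\t0\t*\t*",
    "r1\t256\tcontig2\t500\t1\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tcontig1\t300\t42\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
example_hits = multi_locus_reads(example_sam)
```

The positions returned for each read could then be compared with the regions where the candidate transcripts align. On a real BAM file, a library such as pysam or `samtools view` would replace the plain-text parsing.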

Did you check for duplicated contigs in the genome assembly? Specifically, did you check if the candidate duplicated transcripts fall into truly unique contigs?

written 20 months ago by h.mon

I do not think I have, but is it possible to check this easily? I do not have many contigs (~300), and they are longer than the duplications I am looking for... I used a hybrid approach for the assembly, so I would expect all the duplicated contigs to have been merged. What do you think? I tried to map the candidate transcripts to the genome contigs with GMAP, but I am not sure how to analyze the spliced alignments of each transcript. A duplication should show up as primary and secondary alignments that are similar, right? But in this case I could (probably) get different alignments because of alternative splicing.

Thanks for your advice!

written 20 months ago by niconm89
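
On telling duplications apart from alternative splicing in transcript-to-genome alignments: isoforms of one gene align to the same (overlapping) genomic locus, whereas a duplicated gene should give good alignments at two or more non-overlapping loci. Below is a minimal Python sketch of that check. The input tuples `(transcript, contig, start, end, identity, coverage)` are a hypothetical format that would have to be parsed from the aligner's output, and the 95% identity and 80% coverage thresholds are placeholder assumptions, not recommended values.

```python
from collections import defaultdict


def candidate_duplications(alignments, min_identity=95.0, min_coverage=80.0):
    """Return {transcript: [(contig, start, end), ...]} for transcripts that
    align well to two or more non-overlapping loci.

    Hypothetical helper; thresholds are illustrative assumptions.
    """
    by_transcript = defaultdict(list)
    for tx, contig, start, end, identity, coverage in alignments:
        if identity >= min_identity and coverage >= min_coverage:
            by_transcript[tx].append((contig, min(start, end), max(start, end)))

    result = {}
    for tx, loci in by_transcript.items():
        distinct = []
        for contig, start, end in sorted(loci):
            last = distinct[-1] if distinct else None
            if last and last[0] == contig and start <= last[2]:
                # overlaps the previous locus on the same contig: merge them
                distinct[-1] = (contig, last[1], max(last[2], end))
            else:
                distinct.append((contig, start, end))
        if len(distinct) > 1:
            result[tx] = distinct
    return result


# Made-up alignments: tx1 hits two contigs, tx2 hits one locus twice
# (overlapping paths), tx3 has only a poor alignment.
example_alignments = [
    ("tx1", "contig1", 1000, 5000, 99.0, 98.0),
    ("tx1", "contig7", 20000, 24100, 97.5, 96.0),
    ("tx2", "contig3", 500, 3000, 99.5, 100.0),
    ("tx2", "contig3", 600, 2900, 99.0, 85.0),
    ("tx3", "contig2", 100, 900, 80.0, 40.0),
]
example_candidates = candidate_duplications(example_alignments)
```

Two non-overlapping loci on the same contig are kept as separate entries, so tandem copies would also be reported. A transcript flagged this way is only a candidate: the loci would still need to be inspected, for example by checking read support at each copy.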