Question

Find gene duplications in a draft genome assembly

5.8 years ago

niconm89 ▴ 10

Hi everyone, I have recently assembled a draft genome. I consider its quality acceptable based on the assessments I made and my biological goals. Before the genome assembly was ready, I performed a de novo transcriptome assembly of this species and, after some analysis, found some candidate genes that may be duplicated in the genome. My idea is to check this by mapping the reads to these candidate transcripts and to the genome, and then checking the genomic positions of the reads that map to both. I am a beginner in this area, so I would appreciate your ideas and advice.

Thanks in advance.
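A minimal Python sketch of the cross-referencing step, assuming both mappings were saved as SAM text (reads vs. candidate transcripts, and the same reads vs. the genome). It keeps the reads that aligned to a candidate, then lists every distinct genome position each of those reads hit; a read with hits at two or more separate positions is the kind of evidence a duplication would leave. The function names are hypothetical. Mates are kept apart using the SAM 0x40/0x80 flags so that the two mates of one fragment are not mistaken for two loci, and secondary hits only appear if the mapper was asked to report them (for bowtie2, its `-k` mode).

```python
from collections import defaultdict


def mate_of(flag):
    """Return 1 or 2 for paired mates (SAM flags 0x40 / 0x80), 0 for unpaired reads."""
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return 0


def parse_sam(lines):
    """Yield ((qname, mate), flag, contig, pos) for every mapped alignment in SAM text."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:  # 0x4 = segment unmapped
            continue
        yield (qname, mate_of(flag)), flag, rname, pos


def shared_read_hits(transcript_sam, genome_sam):
    """For reads that hit a candidate transcript, collect their distinct genome hits."""
    on_candidates = {key for key, _, _, _ in parse_sam(transcript_sam)}
    hits = defaultdict(set)
    for key, _, rname, pos in parse_sam(genome_sam):
        if key in on_candidates:
            hits[key].add((rname, pos))
    return dict(hits)


def multi_locus_reads(hits):
    """Keep reads with more than one distinct (contig, start) hit.

    This is coarse: two starts a few bases apart on the same contig still count
    as different positions, so results need checking at the locus level.
    """
    return {key: sorted(positions) for key, positions in hits.items() if len(positions) > 1}
```

Reads returned by `multi_locus_reads` point at the genomic regions worth comparing against the candidate transcript.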

Tags: assembly, mapping, gene, genome, duplicate • 1.6k views

Sounds like a valid approach indeed.

Could you nonetheless add some details on how specifically you want to do this? E.g., which programs would you use, and with which parameter settings?

Reply by lieven.sterck (15k), 5.8 years ago

Thanks for your reply.

I have both short and long reads, but I think it would be better to use the short ones in this case. I would map the reads to the candidate transcripts with bowtie2, then filter the mapped reads and map them to the genome. After that, I would need to find the primary and secondary alignments of each read (if any) and compare the transcripts to those regions. I have not thought about parameter settings yet, but I have lost the pairing information after the first mapping of reads to the candidate genes (only one read of each pair maps). I need to run some tests to define the parameters.
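One way to keep the pairing information, sketched in Python under the assumption that the R1 and R2 FASTQ files list mates in the same order: collect the QNAMEs of reads that aligned to a candidate in the first mapping, then pull both mates of every such pair from the original files, so the mapping to the genome can be run in paired-end mode. Helper names are hypothetical; the code strips an optional `/1` or `/2` suffix so names match across files.

```python
from itertools import islice


def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 so mate names match across files."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_fastq(lines):
    """Yield 4-line FASTQ records as (header, sequence, plus, quality) tuples."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def read_id(header):
    """Read ID from a FASTQ header: no '@', no comment, no mate suffix."""
    return strip_mate_suffix(header[1:].split()[0])


def keep_pairs(r1_lines, r2_lines, wanted_names):
    """Return the R1 and R2 records of every pair whose ID is in wanted_names.

    wanted_names would be the QNAMEs of reads that hit a candidate transcript,
    even if only one mate of the pair aligned there.
    """
    wanted = {strip_mate_suffix(n) for n in wanted_names}
    kept_r1, kept_r2 = [], []
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError("R1 and R2 files are not in the same order")
        if read_id(rec1[0]) in wanted:
            kept_r1.append(rec1)
            kept_r2.append(rec2)
    return kept_r1, kept_r2


def to_fastq(records):
    """Serialize records back to FASTQ text."""
    return "".join("\n".join(rec) + "\n" for rec in records)
```

The recovered pairs could then be passed to bowtie2 as `-1`/`-2` inputs, optionally with `-k` so that more than one alignment per pair is reported; which `-k` value suits this genome is something the planned tests would have to settle. Note that `zip` stops at the shorter file, so files of unequal length are not detected by this sketch.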

Reply by niconm89 (10), 5.8 years ago

Did you check for duplicated contigs in the genome assembly? Specifically, did you check if the candidate duplicated transcripts fall into truly unique contigs?
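A rough way to answer this, sketched in Python: align the assembly against itself (for example with minimap2, which writes PAF) and flag contigs that are mostly covered by alignments to other contigs. The sketch assumes PAF input, ignores self-hits, merges overlapping query intervals so the same bases are not counted twice, and uses an arbitrary 80% cutoff; the function names are hypothetical. A candidate transcript whose hits fall on a flagged contig would deserve a second look before calling it a duplication.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length covered by half-open (start, end) intervals, overlaps counted once."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def redundant_contigs(paf_lines, min_fraction=0.8):
    """Return {contig: covered_fraction} for contigs whose bases are covered
    at least min_fraction by alignments to other contigs (combined)."""
    lengths, spans = {}, defaultdict(list)
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        # PAF columns: qname, qlen, qstart, qend, strand, tname, ... (0-based, half-open)
        qname, qlen, qstart, qend, tname = f[0], int(f[1]), int(f[2]), int(f[3]), f[5]
        if qname == tname:
            continue  # ignore a contig aligned to itself
        lengths[qname] = qlen
        spans[qname].append((qstart, qend))
    result = {}
    for name, intervals in spans.items():
        fraction = merged_length(intervals) / lengths[name]
        if fraction >= min_fraction:
            result[name] = fraction
    return result
```

With only ~300 contigs, an all-vs-all alignment like this should be cheap to run.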

Reply by h.mon (35k), 5.8 years ago

I do not think I have, but is it easy to check? I do not have many contigs (~300), and they are longer than the duplications I am looking for. I used a hybrid assembly approach, so I would expect any duplicated contigs to have been merged. What do you think? I tried mapping the candidate transcripts to the genome contigs with GMAP, but I am not sure how to analyze the spliced alignments of each transcript. A duplication should show up as primary and secondary alignments that are similar, right? But in this case I could (probably) get different alignments because of alternative splicing.

Thanks for your advice!
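To separate a real duplication from alternative splicing, one option is to compare where the alignments land rather than their exon structure: alternative isoforms of a single gene should overlap the same genomic interval, whereas a duplicated copy should sit on a different contig or in a non-overlapping region. The Python sketch below assumes the GMAP results were written as SAM including secondary paths (GMAP has a SAM output format and a setting for how many paths to report; check `gmap --help` for your version). It computes each alignment's reference span from the CIGAR string (`N` marks introns in spliced SAM) and merges overlapping spans per transcript; the function names are hypothetical.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference


def ref_span(pos, cigar):
    """Return the 1-based inclusive (start, end) reference span of an alignment."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def loci_per_transcript(sam_lines):
    """Group each transcript's alignments into non-overlapping genomic loci."""
    spans = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped record, no span to compute
        start, end = ref_span(pos, cigar)
        spans[qname].append((rname, start, end))
    loci = {}
    for qname, hits in spans.items():
        merged = []
        for rname, start, end in sorted(hits):
            # same contig and overlapping the previous locus: extend it
            if merged and merged[-1][0] == rname and start <= merged[-1][2]:
                merged[-1][2] = max(merged[-1][2], end)
            else:
                merged.append([rname, start, end])
        loci[qname] = [tuple(m) for m in merged]
    return loci
```

A transcript that ends up with two or more loci is a duplication candidate; one that collapses to a single locus is more consistent with isoforms of one gene. Overlap is a coarse criterion, so partial or fragmented alignments may still need manual inspection.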

Reply by niconm89 (10), 5.8 years ago