Question: What is the drawback of mapping reads to a very fragmented genome reference?
archie.w.lee40 wrote:

Dear All, What is the drawback of mapping reads to a very fragmented genome reference? The genome is about 200 Mb in about 10,200 contigs. What number of contigs is acceptable? Could you please give me some pointers? Thanks.

AL

alignment • 173 views
written 3 months ago by archie.w.lee40

One problem would be the two reads of a pair mapping to two different contigs (contigs that follow each other in the original genome but could not be connected due to missing data, repetitiveness, etc.). The pair would then be labeled as not properly mapped, which could skew your mapping statistics. Of course, pairs like these can also give you the information needed to anchor the two contigs together.

Another problem with fragmented references is that you don't have any meaningful topology information, e.g. gene order.

In general, what you want is a low number of long contigs/scaffolds. Can you run e.g. QUAST to get some other descriptive statistics for your reference? 200 Mb in ~10k contigs means your contigs have an average length of about 20 kbp. The question now is what the length distribution looks like. It is also important that your reference covers enough of the gene space; otherwise, what can you do with it? You could run BUSCO to check which core genes are contained in the reference.
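If you want a quick look before running QUAST, here is a minimal sketch of the same idea: mean contig length and N50 computed from contig lengths. For the numbers in the question, 200,000,000 / 10,200 gives a mean of roughly 19.6 kbp. The function names and the toy FASTA records are made up for illustration, and this is not a replacement for QUAST's full report.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy assembly (hypothetical data): three contigs of 10, 6 and 3 bp.
toy_fasta = [">c1", "ACGTACGTAC", ">c2", "ACGT", "AC", ">c3", "ACG"]
toy_lengths = contig_lengths(toy_fasta)
toy_mean = sum(toy_lengths) / len(toy_lengths)
toy_n50 = n50(toy_lengths)
print(toy_lengths, round(toy_mean, 2), toy_n50)
```

On a real reference you would pass an open file handle instead of the toy list, e.g. `contig_lengths(open("assembly.fasta"))`, where the file name is a placeholder. A mean of ~20 kbp with a much smaller N50 would indicate that most of the sequence sits in short contigs.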

written 3 months ago by cschu1811.2k

Thank you so much for your reply

written 3 months ago by archie.w.lee40

None on the face of it, if that is what you have to work with. You should get alignments in any case. If there are duplicated regions (that have not been cleaned up), reads will multi-map to those regions when in reality they may come from a unique region. If parts of the genome are missing, reads from those parts will not map. You may also see discordant alignments (if you have paired-end reads) when a region that should be in one contig is split across two.
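To get a feel for how many pairs are affected, something like the following rough sketch could be used. It counts read pairs whose mates land on different contigs, reading plain SAM text and relying only on the standard SAM columns (FLAG, RNAME, RNEXT) and flag bits. The function name and the toy records are invented for illustration; `samtools flagstat` reports a comparable "with mate mapped to a different chr" figure and is probably what you would use in practice.

```python
from typing import Dict, Iterable

PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
FIRST_IN_PAIR, SECONDARY, SUPPLEMENTARY = 0x40, 0x100, 0x800


def count_split_pairs(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count fully mapped pairs by whether both mates hit the same contig."""
    counts = {"same_contig": 0, "different_contigs": 0}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        if not flag & PAIRED:
            continue
        if flag & (UNMAPPED | MATE_UNMAPPED):
            continue
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if not flag & FIRST_IN_PAIR:  # count each pair once, via its first read
            continue
        if rnext == "=" or rnext == rname:
            counts["same_contig"] += 1
        else:
            counts["different_contigs"] += 1
    return counts


def sam_record(name, flag, rname, pos, rnext, pnext):
    """Build a minimal SAM line for the toy example (hypothetical data)."""
    cols = [name, str(flag), rname, str(pos), "60", "10M",
            rnext, str(pnext), "0", "ACGTACGTAC", "IIIIIIIIII"]
    return "\t".join(cols)


toy_sam = [
    "@SQ\tSN:contig1\tLN:1000",
    sam_record("r1", 99, "contig1", 100, "=", 200),        # proper pair, same contig
    sam_record("r1", 147, "contig1", 200, "=", 100),
    sam_record("r2", 65, "contig1", 500, "contig2", 5),    # mates on different contigs
    sam_record("r2", 129, "contig2", 5, "contig1", 500),
    sam_record("r3", 73, "contig3", 10, "=", 10),          # mate unmapped, skipped
]
toy_counts = count_split_pairs(toy_sam)
print(toy_counts)
```

A large share of `different_contigs` pairs relative to `same_contig` pairs would suggest that fragmentation is noticeably affecting your paired-end statistics. The sketch ignores multi-mapping and insert-size checks, so treat the numbers as a first impression only.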

modified 3 months ago • written 3 months ago by genomax49k

Thank you so much for your reply

written 3 months ago by archie.w.lee40