Question: What sequence is being copied by IGV's 'Copy sequence'?
Asked 3 months ago by marongiu.luigi (Germany, Mannheim, UMM):

Dear all

I have extracted the reads that did NOT map to the human genome and re-aligned them to another genome, called 'V', that DOES NOT CONTAIN bacterial sequences. I set some regions of interest, copied their sequence and BLASTed it. For instance, for this region [IGV screenshot], the top BLAST hits are:

Escherichia coli strain 2248 plasmid pNDM-2248 (coverage 100%, e-value 1e-56)
Salmonella sp. strain Sa27 plasmid pSa27-TC-CIP (coverage 100%, e-value 1e-56)
Enterobacter hormaechei strain C15117 plasmid pSPRC-Echo1 (coverage 100%, e-value 1e-56)

May I ask whether IGV copies the sequence of the reads (as a consensus) or that of the reference genome? Since the reference has no bacterial sequences, how could BLAST find bacteria instead? Is it because the BLAST algorithm missed the true hit? Or are the reads not really mapped to their expected loci?

Thank you

(Question written 3 months ago by marongiu.luigi; last modified 3 months ago by h.mon.)
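The two alternatives in the question give different strings whenever the reads disagree with the reference. The minimal Python sketch below contrasts them on toy data: slicing the reference versus a column-wise majority vote over the reads. The function name `majority_consensus`, the `(start, sequence)` read format and the sequences are all hypothetical, and the sketch assumes ungapped alignments (no indels, clipping or base qualities), so it only illustrates the idea rather than reproducing what any real consensus tool does.

```python
from collections import Counter


def majority_consensus(reads, region_start, region_end):
    """Column-wise majority vote over ungapped reads (simplified sketch).

    reads: list of (start, sequence) tuples; start is the 1-based reference
    position of the read's first base (hypothetical input format).
    Returns the consensus for region_start..region_end (1-based, inclusive),
    with 'N' where no read covers a position.
    """
    columns = {pos: Counter() for pos in range(region_start, region_end + 1)}
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if pos in columns:
                columns[pos][base] += 1
    consensus = []
    for pos in range(region_start, region_end + 1):
        counts = columns[pos]
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


# Toy data: the reads carry a T where the reference has an A (position 5).
reference = "ACGTACGTAC"
reads = [(1, "ACGTTCG"), (3, "GTTCGTA"), (5, "TCGTAC")]
print(reference[0:10])                   # reference slice  -> ACGTACGTAC
print(majority_consensus(reads, 1, 10))  # read consensus   -> ACGTTCGTAC
```

If the sequence pasted into BLAST matched the first kind of output rather than the second, it came from the reference, not from the reads.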

Pure speculation. Genome V (since you wish to keep it secret) could have some contamination (or just a region that happens to resemble a bacterial sequence). If you exclude bacteria, what else does it hit via BLAST?

(Comment by genomax, written 3 months ago; modified 3 months ago.)
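On the NCBI web form, bacteria can be excluded with the Organism field and its Exclude box. For command-line BLAST+, one option is to request the subject super-kingdom in tabular output and filter afterwards. The sketch below assumes the hypothetical column layout `-outfmt "6 qseqid sseqid evalue sskingdoms"` and that the taxonomy files are installed so `sskingdoms` is actually populated; the function name and the example rows are made up for illustration.

```python
def drop_bacterial_hits(tabular_lines, kingdom_col=3):
    """Keep BLAST tabular rows whose super-kingdom column lacks 'Bacteria'.

    kingdom_col: 0-based index of the sskingdoms column; 3 assumes the
    hypothetical layout '6 qseqid sseqid evalue sskingdoms'.
    """
    kept = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        kingdoms = fields[kingdom_col].split(";")  # multiple kingdoms are ';'-separated
        if "Bacteria" not in kingdoms:
            kept.append(fields)
    return kept


# Made-up placeholder rows, not real BLAST results.
rows = [
    "query1\tsubject_A\t1e-56\tBacteria",
    "query1\tsubject_B\t3e-10\tViruses",
    "query1\tsubject_C\t2e-05\tBacteria;Eukaryota",
]
for fields in drop_bacterial_hits(rows):
    print(fields)  # -> ['query1', 'subject_B', '3e-10', 'Viruses']
```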

It is no secret: V stands for viral. BLAST returned only bacterial species, but the reference is built only from viral sequences, so there should be no bacterial hit in the first place. As you pointed out, there might be regions of homology, but I still expected at least one hit on viruses.

(Reply by marongiu.luigi, written 3 months ago.)
Answer by h.mon (Brazil), 3 months ago:

IGV is copying the reference genome. Clearly the "V" genome does contain plasmid DNA. What is this "V" genome supposed to be? The plasmid sequence may be an assembly artifact or a contaminant that was never removed. For example, the difference between the Bos taurus genomes UMD 3.1 and UMD 3.1.1 is the removal of some bacterial contaminant contigs.

EDIT: if you select a longer stretch of the chromosome (at least the whole visible 780 bp), you will see that all of it BLASTs to bacteria, not only the part you selected.

(Answer by h.mon, written 3 months ago; modified 3 months ago.)
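Since the copied sequence comes from the reference, the same stretch can be pulled straight from the reference FASTA to check it independently of IGV (`samtools faidx ref.fa contig:start-end` does this on the command line). The Python sketch below does the equivalent with 1-based, inclusive coordinates, the convention IGV's locus box and samtools display. The contig name and toy sequence are hypothetical.

```python
import io


def read_fasta(handle):
    """Parse a FASTA text handle into {name: sequence}; name is the first header word."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def region(seqs, contig, start, end):
    """Return contig:start-end using 1-based, inclusive coordinates."""
    return seqs[contig][start - 1:end]


def to_fasta(contig, start, end, seq):
    """Wrap a region as a FASTA record ready to paste into BLAST."""
    return f">{contig}:{start}-{end}\n{seq}\n"


# Hypothetical toy reference with one 20 bp contig split over two lines.
toy_fasta = io.StringIO(">contig_1 toy reference\nACGTACGTAC\nGGGTTTCCCA\n")
seqs = read_fasta(toy_fasta)
print(to_fasta("contig_1", 9, 14, region(seqs, "contig_1", 9, 14)))
# -> >contig_1:9-14
#    ACGGGT
```

Selecting the whole visible 780 bp window starting at position `start` corresponds to `region(seqs, contig, start, start + 779)`.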

How can I extract the sequence of the reads instead? Addendum: I selected 777 bp of the sequence, and the BLAST result is still a list of bacterial plasmids with no viral hit.

(Reply by marongiu.luigi, written 3 months ago; modified 3 months ago.)
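To BLAST the reads themselves rather than the reference, one common route is to pull the reads overlapping the region out of an indexed BAM (`samtools view -b aln.bam "contig:start-end" > region.bam`) and convert them with `samtools fasta region.bam`. The Python sketch below shows the same logic on plain SAM text: skip unmapped records (FLAG bit 0x4), compute each alignment's reference span from its CIGAR, and emit the reads that overlap the region as FASTA. The function names and the toy records are hypothetical; note that SAM stores reverse-strand reads reverse-complemented, which does not matter for blastn because it searches both strands.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def ref_span(pos, cigar):
    """Return the (start, end) reference span of an alignment, 1-based inclusive."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def reads_in_region(sam_lines, contig, start, end):
    """Return FASTA text for mapped reads overlapping contig:start-end."""
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
        cigar, seq = f[5], f[9]
        if flag & 0x4 or rname != contig or cigar == "*" or seq == "*":
            continue  # unmapped, other contig, or no alignment/sequence stored
        r_start, r_end = ref_span(pos, cigar)
        if r_start <= end and r_end >= start:  # any overlap with the region
            records.append(f">{qname}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical toy SAM records.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tcontig_1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tcontig_1\t20\t60\t2S3M1D2M\t*\t0\t0\tTTACGGA\tIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
]
print(reads_in_region(sam, "contig_1", 7, 21), end="")
# -> >r1 / ACGT / >r2 / TTACGGA (one item per line)
```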