What sequence is being read by IGV's 'copy sequence'?
5.5 years ago

Dear all

I extracted the reads that did NOT map to the human genome and re-aligned them to another genome, called 'V', that DOES NOT CONTAIN bacterial sequences. I set some regions of interest in IGV, copied the sequence, and ran BLAST on it. For instance, for this region [image: IGV screenshot of the region], the top BLAST hits are:

Escherichia coli strain 2248 plasmid pNDM-2248 (coverage 100%, e-value 1e-56)
Salmonella sp. strain Sa27 plasmid pSa27-TC-CIP (coverage 100%, e-value 1e-56)
Enterobacter hormaechei strain C15117 plasmid pSPRC-Echo1 (coverage 100%, e-value 1e-56)

May I ask whether IGV copies the sequence of the reads (as a consensus) or that of the reference genome? Since the reference does not contain bacterial sequences, how can BLAST return bacterial hits? Could BLAST have missed the expected hit? Or are the reads not really mapped to their expected loci?

Thank you

alignment samtools igv read map coverage • 2.2k views

Pure speculation: genome V (since you wish to keep it secret) could contain some contamination, or just a region that happens to be similar to a bacterial sequence. If you exclude bacteria from the search, what else does it hit in BLAST?


It is no secret: V stands for viral. BLAST returned only bacterial species, but the reference is built only from virus sequences, so there should be no bacterial hits in the first place. As you pointed out, there might be regions of homology, but even so I was expecting at least one viral hit.

5.5 years ago
h.mon 35k

IGV copies the reference genome sequence. Clearly the "V" genome does contain plasmid DNA. What is this "V" genome supposed to be? The plasmid sequence may be an assembly artifact or a contaminant that wasn't removed. For example, the difference between the Bos taurus genomes UMD 3.1 and UMD 3.1.1 is the removal of some bacterial contaminant contigs.

EDIT: if you select a longer stretch of the chromosome (at least the whole visible 780 bp), you will see that all of it BLASTs to bacteria, not only the part you selected.
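
To make the distinction concrete, here is a minimal toy sketch (not how IGV works internally) of why a reference sequence and a read-derived consensus can differ. It assumes ungapped reads given as (0-based start, sequence) pairs; the function name `read_consensus`, the `min_depth` parameter, and the example sequences are all made up for illustration.

```python
from collections import Counter


def read_consensus(reference, reads, min_depth=1):
    """Majority-vote consensus over ungapped reads (toy model).

    reads: list of (start, sequence), start is 0-based on the reference.
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    # One base counter per reference position.
    columns = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1

    consensus = []
    for counts in columns:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")
        else:
            # Most frequent base at this position.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical data: the reads disagree with the reference at position 3.
reference = "ACGTACGTAC"
reads = [(0, "ACGAAC"), (2, "GAACGT"), (4, "ACGTAC")]
consensus = read_consensus(reference, reads)
```

Copying the region from the reference track gives `reference`, whatever the reads say; a BLAST of `consensus` would instead reflect what was actually sequenced.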


How can I extract the sequence of the reads instead? Addendum: I selected 777 bp of the sequence, and the BLAST result is still a list of bacterial plasmids with no viral hit.
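
One possible way to get at the read sequences, sketched under assumptions: with an indexed BAM, `samtools view` accepts a region such as `contig:start-end` and prints the overlapping alignments as SAM text. The Python below does the same filtering on SAM lines using only the standard library. The helper names (`reference_span`, `reads_in_region`), the contig names `V1`/`V2`, and the example records are hypothetical.

```python
import re

# CIGAR operations as defined in the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 0-based, half-open reference interval of an alignment."""
    start = pos - 1  # SAM POS is 1-based
    # Only M, D, N, = and X consume reference bases.
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return start, start + length


def reads_in_region(sam_lines, contig, region_start, region_end):
    """Yield (read_name, sequence) for mapped reads overlapping
    contig:region_start-region_end (1-based, inclusive)."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        # Skip unmapped reads (flag 0x4) and reads on other contigs.
        if flag & 0x4 or rname != contig or cigar == "*":
            continue
        start, end = reference_span(pos, cigar)
        if start < region_end and end > region_start - 1:
            yield name, seq


# Hypothetical SAM records for illustration only.
sam_lines = [
    "@SQ\tSN:V1\tLN:1000",
    "r1\t0\tV1\t101\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tV1\t150\t60\t5M2D5M\t*\t0\t0\tTTTTTGGGGG\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tCCCCCCCCCC\tIIIIIIIIII",
    "r4\t0\tV2\t105\t60\t10M\t*\t0\t0\tAAAAAAAAAA\tIIIIIIIIII",
]
hits = list(reads_in_region(sam_lines, "V1", 105, 155))
```

Note that SAM stores the sequence of a reverse-strand read (flag 0x10) already reverse-complemented, so the extracted sequences are all on the reference's forward strand. The individual read sequences, rather than the copied reference, are what should go into BLAST to see what was actually sequenced.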
