How to determine which read is mapped to the incorrect chromosome?
9.7 years ago
Austin • 0

I'm trying to cluster together discordant paired-end reads, including pairs whose reads are mapped to different chromosomes. One read in such a pair should be mapped to the correct chromosome, with its mate mapping to an incorrect chromosome. I need to determine which read in the pair is mapped to the incorrect chromosome. Is there any way to do this? Thank you for any insight on this.

next-gen sequencing alignment • 3.8k views

How do you know that one of the reads is wrong? Genomic rearrangements do occur, after all.


Yes, I'm looking for newly inserted transposable elements (those which haven't occurred in the reference genome). So some could be from genomic rearrangements, you're right, but if a read is mapped to a different chromosome because of a transposition event, then its placement would be incorrect, because the new insertion isn't in the reference genome.

9.7 years ago
vlasova.av ▴ 20

You should look at the 7th field (RNEXT) of each record in the SAM/BAM file: when the mate is mapped to another chromosome/scaffold/contig, that reference name is written there. When both reads are mapped to the same chromosome, this field contains the '=' symbol.

From the documentation:

7. RNEXT: Reference sequence name of the NEXT segment in the template... This field is set as '*' when the
information is unavailable, and set as '=' if RNEXT is identical RNAME. If not '=' and the next
segment in the template has one primary mapping (see also bit 0x100 in FLAG), this field is
identical to RNAME of the next segment.
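
As a rough illustration of the RNEXT check described above, here is a minimal sketch in plain Python that reads tab-separated SAM text and reports records whose mate is on a different reference. The function names are made up for this example, and it assumes uncompressed SAM lines (for instance the text output of a BAM-to-SAM conversion) rather than a binary BAM file.

```python
def mate_on_other_reference(sam_line):
    """Return (rname, rnext) if the mate maps to a different reference, else None.

    Hypothetical helper: expects one tab-separated SAM alignment line.
    """
    fields = sam_line.rstrip("\n").split("\t")
    rname, rnext = fields[2], fields[6]  # 3rd field is RNAME, 7th is RNEXT
    # '=' means the mate is on the same reference; '*' means unavailable.
    if rname == "*" or rnext in ("=", "*") or rnext == rname:
        return None
    return rname, rnext


def interchromosomal_records(lines):
    """Yield (qname, rname, rnext) for every non-header record with a mate elsewhere."""
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        hit = mate_on_other_reference(line)
        if hit is not None:
            yield (line.split("\t", 1)[0],) + hit
```

This only tells you where the mate went, not which of the two placements is the "wrong" one; it is just the extraction step.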


He doesn't want to know which chromosome the other read has been mapped to. The two reads of a pair are located close together in the non-reference strain, and he wants to find which of them ends up on the other chromosome in the reference genome.


Yes, exactly. I know how to extract the reads I'm interested in, it's just a matter of determining which one has moved.

9.7 years ago

It'll be easier if you know the approximate sequence of your transposable elements. For example, P elements in flies have a known structure, so you could use that to determine which read is inside the element and which is on the non-transposable portion of the chromosome. Of course, even if you don't know it, all isn't lost. An interesting route would be to take the raw reads from each pair that aligns to different chromosomes and assemble them. It's likely that your best contigs will cover the edges of the transposable element.
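
If you do have a candidate element sequence, one crude way to guess which mate sits inside it is to compare how many of each read's k-mers also occur in the element. The sketch below is only a stand-in for a proper alignment against the element; the function names, `k=12`, and the `min_diff` threshold are all assumptions chosen for illustration, not tuned values.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmers(seq, k):
    """Set of all k-mers in seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def te_kmer_fraction(read, te_seq, k=12):
    """Fraction of the read's k-mers found in the element on either strand."""
    te = kmers(te_seq, k) | kmers(reverse_complement(te_seq), k)
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return 0.0
    return len(read_kmers & te) / len(read_kmers)


def likely_te_read(read1, read2, te_seq, k=12, min_diff=0.5):
    """Return 1 or 2 for the mate that looks like element sequence, or None if unclear."""
    f1 = te_kmer_fraction(read1, te_seq, k)
    f2 = te_kmer_fraction(read2, te_seq, k)
    if f1 - f2 >= min_diff:
        return 1
    if f2 - f1 >= min_diff:
        return 2
    return None
```

A `None` result means both mates look equally element-like (or equally unlike it), in which case this heuristic says nothing about the pair.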


Unfortunately I don't know the approximate sequence of the transposable elements (although I can make a decent guess based on the relative abundance of active TEs). I have been trying to do something similar to what you suggested at the end. I have the read pairs that map to different chromosomes, and I am clustering similar reads together. I was planning on looking at the soft-clipped edges of the clusters to find the breakpoint of the transposable element. Or maybe not even limiting it to the reads in the clusters, but instead taking the range of each cluster and looking at all the reads in the original alignments for soft-clipped regions there. What I'm hoping will happen is that clusters that mapped close to TEs will have a lot of soft clips and the clusters that mapped to the wrong chromosome will not. Thank you for the help.
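
The clustering and soft-clip idea above could be prototyped roughly as follows. This is a sketch under stated assumptions: reads are given as `(chrom, pos, cigar)` tuples already pulled out of the alignments, clusters are formed greedily by start position, and the 500 bp `max_gap` is an arbitrary placeholder. All function names are hypothetical.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (left, right) soft-clipped lengths from a CIGAR string."""
    if cigar == "*":
        return 0, 0
    # Hard clips (H) can sit outside soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def cluster_reads(reads, max_gap=500):
    """Greedily group (chrom, pos, cigar) reads whose starts are within max_gap."""
    by_chrom = defaultdict(list)
    for chrom, pos, cigar in reads:
        by_chrom[chrom].append((pos, cigar))
    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        for pos, cigar in sorted(by_chrom[chrom]):
            if current and pos - current[-1][0] > max_gap:
                clusters.append((chrom, current))  # gap too large: close cluster
                current = []
            current.append((pos, cigar))
        clusters.append((chrom, current))
    return clusters


def clipped_fraction(members):
    """Fraction of reads in a cluster that carry any soft clipping."""
    clipped = sum(1 for _, cigar in members if any(soft_clipped_bases(cigar)))
    return clipped / len(members)
```

Clusters with a high clipped fraction would then be the candidates for sitting next to an insertion breakpoint, if the expectation described above holds.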
