Question: Rescuing Orphaned Reads With Local Alignment Within A Radius Of Mapped Mate
9.3 years ago by
Abhi1.5k
United States
Abhi1.5k wrote:

PS: This message has also been cross-posted on SEQanswers. I want to reach more bioinformatics people, so I thought of posting it here too.

Problem:

We have a dataset from a library with variable biological insert size, because we are sequencing the 5' and 3' ends of transcripts. As a result, the distance between the mates (<--- --->) depends on the length of the transcript. For the initial mapping I am using Mosaik, which I believe does a better job with variable-insert mate-pair data.

After mapping we still see 40% orphaned reads, where one read maps and the other doesn't. Is there a way to do a local realignment for these orphaned reads and attempt to map the unmapped mate within a given radius of the mapped mate?

Is anything already out there?

Thanks! -Abhi
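
To make the request concrete, here is a minimal sketch of what "local realignment within a radius of the mapped mate" could look like. It is not taken from any tool mentioned in this thread: it is a plain Smith-Waterman local alignment restricted to a window of the reference around the mapped mate. The function names (`smith_waterman`, `rescue_mate`), the scoring values, and the `radius` and `min_score` parameters are all assumptions chosen for illustration, and the reference is assumed to be a single sequence held in memory as a string.

```python
# Illustrative sketch only: names, scores and thresholds are assumptions.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment of read against ref.

    Returns (score, ref_start, ref_end), 0-based, end exclusive.
    Runs in O(len(read) * len(ref)) time and memory.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,   # match or mismatch
                    score[i - 1][j] + gap,       # gap in the reference
                    score[i][j - 1] + gap)       # gap in the read
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)
    # Trace back from the best cell to find where the alignment starts.
    i, j = best_pos
    end = j
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, end


def rescue_mate(reference, anchor_pos, read, radius, min_score):
    """Try to place an orphaned read within `radius` bases of its mapped mate.

    Both orientations of the read are tried. Returns
    (start, end, score, strand) in reference coordinates, or None.
    """
    lo = max(0, anchor_pos - radius)
    hi = min(len(reference), anchor_pos + radius)
    window = reference[lo:hi]
    best = None
    for strand, seq in (("+", read), ("-", revcomp(read))):
        score, start, end = smith_waterman(seq, window)
        if score >= min_score and (best is None or score > best[2]):
            best = (lo + start, lo + end, score, strand)
    return best
```

For transcript-end libraries the radius would presumably need to be large enough to cover the longest expected transcript, which makes the quadratic cost of this sketch the main practical limit.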

ADD COMMENTlink written 9.3 years ago by Abhi1.5k

What is your alignment rate if you turn off pairing?

ADD REPLYlink written 9.3 years ago by Jeremy Leipzig19k

@Jeremy: The alignment rate for read 1 and read 2 independently is > 80%. It is the pairing that is causing problems.

ADD REPLYlink written 9.3 years ago by Abhi1.5k

@All: Is there any way to get an email as soon as an update is posted to a question I am interested in? I currently get an email, but only a day later, which doesn't help.

ADD REPLYlink written 9.3 years ago by Abhi1.5k

Your realignment should be your unpaired alignment. I would suggest loading the subset of read names whose mates are unmapped (samtools view -bf 0x0004 reads.bam) and then using those to examine where the mates align naturally.
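
As a hedged illustration of the flag filtering involved: in the SAM FLAG field, bit 0x4 marks a read that is itself unmapped and bit 0x8 marks a read whose mate is unmapped. The sketch below picks out the mapped half of each orphaned pair from SAM text lines (for example the output of `samtools view -h`), so that the same read names can then be looked up in the unpaired alignment. The function names are hypothetical and only the first four SAM columns are used.

```python
# Illustrative sketch only: function names are hypothetical.
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # read itself is unmapped
MATE_UNMAPPED = 0x8  # mate of this read is unmapped


def is_orphan_anchor(flag):
    """True for a mapped, paired read whose mate is unmapped."""
    return (bool(flag & PAIRED)
            and not flag & UNMAPPED
            and bool(flag & MATE_UNMAPPED))


def orphan_anchors(sam_lines):
    """Yield (read_name, reference_name, position) for orphan anchors.

    sam_lines is any iterable of SAM text lines; header lines
    (starting with '@') are skipped. Positions are 1-based, as in SAM.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if is_orphan_anchor(flag):
            yield name, rname, pos
```

Comparing each anchor position with where the mate lands in the unpaired run then shows whether the mates are far apart, on different references, or unmapped altogether.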

ADD REPLYlink written 9.2 years ago by Jeremy Leipzig19k
9.3 years ago by
Sean Davis26k
National Institutes of Health, Bethesda, MD
Sean Davis26k wrote:

If I understand correctly, you are sequencing transcripts? If that is the case, it is quite possible that the orphaned reads are due to the lack of alignment across intron-exon boundaries. If I recall, Mosaik is not designed to align RNA-seq reads. Have you considered using an RNA-seq aligner such as GSNAP or TopHat?

ADD COMMENTlink written 9.3 years ago by Sean Davis26k

@Sean: Sorry I could not reply earlier. We are sequencing only the 5' and 3' ends of transcripts, not the full transcripts. TopHat doesn't work because, after linker removal, the reads are of variable length depending on where the linker is found.

ADD REPLYlink written 9.3 years ago by Abhi1.5k

GSNAP will happily work with reads of any length.

ADD REPLYlink written 9.3 years ago by Sean Davis26k

The latest version of TopHat also works with varied read lengths.

ADD REPLYlink written 9.1 years ago by Gww2.7k

+1 for Sean's response. Keep in mind that for many organisms exon 1 is short, thus putting you in Sean's scenario, while the last exon is often long. Long and short are of course relative, and what counts as short also depends on your read length.

ADD REPLYlink written 8.9 years ago by Larry_Parnell16k
9.3 years ago by
Martijn Vermaat180 wrote:

See the answer by Sean Davis if you are mapping exons on a genomic reference.

For finding large deletions and insertions, or other types of translocations, you could try Pindel. Its pattern growth algorithm does exactly what you are looking for.

Coincidentally, we found that GSNAP also works fine on DNA to find large deletions.

ADD COMMENTlink written 9.3 years ago by Martijn Vermaat180
8.9 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

A common problem with gene expression studies and sequencing - ESTs, RNA-Seq, etc - is contaminating genomic DNA. Some mRNA preps are excellent and some are poor. There is always some amount of genomic DNA, or unspliced or incompletely spliced messages, in the mix. These could be a (partial) source of the orphaned reads. So, you can see if any orphans align to a contiguous segment of the genome and, if so, whether any of that alignment falls within an intron. In some cases a retained intron is a legitimate splice variant, but this is rare, and would not be expected without first seeing matches to the known gene models. Thus, too many orphans mapping to introns is likely a sign of issues with the mRNA prep.
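
The intron check described above can be sketched as a simple interval overlap test. This is a minimal illustration, assuming the orphan alignments and the intron annotations for one chromosome are already available as half-open `(start, end)` intervals; the function names are hypothetical, and a real analysis over a whole genome would want an interval index rather than this all-against-all scan.

```python
# Illustrative sketch only: function names are hypothetical.
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def intronic_fraction(alignments, introns):
    """Fraction of alignments that overlap at least one intron.

    alignments and introns are lists of (start, end) half-open
    intervals on the same chromosome.
    """
    if not alignments:
        return 0.0
    hits = sum(
        1 for start, end in alignments
        if any(overlap(start, end, i_start, i_end) > 0
               for i_start, i_end in introns)
    )
    return hits / len(alignments)
```

What fraction counts as "too many" is a judgment call that depends on the organism and the prep, so no threshold is suggested here.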

ADD COMMENTlink written 8.9 years ago by Larry_Parnell16k