Question

Improperly paired sequences

4.1 years ago

life99945 ▴ 20

Hi!

I have a library of paired BAC end sequences, ~800 bp each. The expected insert length is ~150 kb. After mapping them to a reference genome, I visualised the results with the Tablet program.

The issue is that almost all of my sequences are reported as "improperly paired", yet Tablet still shows an insert length.

Has anyone come across this?

Thank you.

Which program did you use for the alignments? What format are your reads in: FASTA? Are the two ends in separate files?

Most short-read (NGS) aligners expect the insert size to be < 500 bp, since that is the typical size of the fragments being sequenced. Long-read aligners expect each read to be contiguous, so they are unlikely to recognize that your two sequences come from the two ends of the same fragment (a BAC clone, for lack of a better word). The "improperly paired" message may therefore be an artifact in this specific case.
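For context: in SAM output, "properly paired" is bit 0x2 of the FLAG field (column 2), which the aligner sets according to its own notion of a plausible pair, while the insert length a viewer shows is typically derived from the TLEN field (column 9) or the mate positions. The two are separate fields, so a pair can carry a ~150 kb TLEN and still lack 0x2. A minimal sketch that decodes both from one SAM line; the example record is made up for illustration:

```python
from typing import Tuple

# SAM FLAG bits used here (see the SAM format specification).
PAIRED = 0x1       # template has multiple segments (a read pair)
PROPER_PAIR = 0x2  # each segment properly aligned, according to the aligner


def pair_status(sam_line: str) -> Tuple[bool, bool, int]:
    """Return (is_paired, is_proper_pair, tlen) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: FLAG (bitwise)
    tlen = int(fields[8])  # column 9: TLEN, signed observed template length
    return bool(flag & PAIRED), bool(flag & PROPER_PAIR), tlen


# Hypothetical BAC-end record: FLAG 97 = 0x1 + 0x20 + 0x40, so 0x2 is NOT set,
# yet TLEN still reports a ~150 kb template.
record = "BAC1_End1\t97\tchr1\t1000\t60\t800M\t=\t150200\t150000\t*\t*"
print(pair_status(record))  # prints (True, False, 150000)
```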

4.1 years ago by GenoMax 141k

Thank you for the answer!

If that is the case, then the problem of incorrectly paired sequences goes away. However, I still can’t work out which part of the genome the insert aligned to. I tried to visualise it with NCBI Sequence Viewer and IGV, and both show nothing. Only Tablet shows my alignment. Could something be wrong with the indexing, or with how I built the database for bwa?

I ran bwa mem on two FASTA files and got a SAM file as output, from which I made BAM and BAM index files. The database (index) was built from a whole-genome FASTA file downloaded from NCBI.
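The steps described above (index the genome, align the two FASTA files, convert to BAM, index the BAM) can be sketched as a small Python wrapper. This is an illustration under assumptions, not the poster's exact commands: it assumes `bwa` and `samtools` are on the PATH, and all file names are hypothetical. `samtools index` requires a coordinate-sorted BAM, which is why a sort step comes before it.

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, file to capture stdout into, or None)
Step = Tuple[List[str], Optional[str]]


def build_steps(ref: str, reads1: str, reads2: str, prefix: str) -> List[Step]:
    """Return the command lines for index -> align -> sort -> index."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        (["bwa", "index", ref], None),                   # build the BWA index from the genome FASTA
        (["bwa", "mem", ref, reads1, reads2], sam),      # paired alignment; SAM goes to stdout
        (["samtools", "sort", "-o", bam, sam], None),    # coordinate-sort into a BAM file
        (["samtools", "index", bam], None),              # writes <prefix>.sorted.bam.bai
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
```

Usage, with hypothetical file names: `run_steps(build_steps("genome.fa", "ends_1.fa", "ends_2.fa", "bac_ends"))`. Keeping the command construction separate from execution makes it easy to print and check the commands before running them.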

4.1 years ago by life99945 ▴ 20

Using bwa mem with two files like that is probably not what the developer had in mind for a dataset like this. Why not align the reads from each file individually, so you know where each end aligns? Then compare the two alignment files to identify the locations on the chromosomes. Since these are long reads, you should get concordant alignments (on the same chromosome, at the expected distance apart). If these are Sanger reads, I don't think you have millions of them to look through.

If you look in the two individual SAM files, you will find the chromosome (RNAME) in column 3 and the start position of the alignment (POS) in column 4.
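The comparison suggested above (align each end separately, read columns 3 and 4, then check that both ends land on the same chromosome at roughly the expected distance) might look like the following sketch. Assumptions, all hypothetical: each end of a clone carries the same read name in both SAM files, only primary mapped alignments are used, and the 50 kb tolerance around the 150 kb expectation is an arbitrary choice.

```python
from typing import Dict, Iterable, Tuple

Position = Tuple[str, int]  # (chromosome from column 3, 1-based start from column 4)


def primary_positions(sam_lines: Iterable[str]) -> Dict[str, Position]:
    """Map read name -> (RNAME, POS) for each mapped primary alignment."""
    positions: Dict[str, Position] = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & (0x4 | 0x100 | 0x800):
            continue
        positions[fields[0]] = (fields[2], int(fields[3]))
    return positions


def concordance(end1: Dict[str, Position], end2: Dict[str, Position],
                expected: int = 150_000, tolerance: int = 50_000) -> Dict[str, bool]:
    """For reads aligned in both files: same chromosome and roughly the expected distance apart?"""
    result: Dict[str, bool] = {}
    for name in sorted(end1.keys() & end2.keys()):
        (chrom1, start1), (chrom2, start2) = end1[name], end2[name]
        # distance between start positions; ignores the ~800 bp read length
        distance = abs(start2 - start1)
        result[name] = chrom1 == chrom2 and abs(distance - expected) <= tolerance
    return result
```

Reads that are unmapped in either file simply do not appear in the result, so it is worth also comparing the number of names in each dictionary.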

Note: are the two files in the same fragment order, as below? Aligners do not check this, since they assume that it is the case.

File 1                File 2
BAC1_End1             BAC1_End2
BAC2_End1             BAC2_End2
BAC3_End1             BAC3_End2
BAC4_End1             BAC4_End2
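One way to check the ordering shown in the table is to read the FASTA headers of both files and compare them line by line. A minimal sketch, assuming the `<clone>_End1` / `<clone>_End2` naming used in the table above (that naming scheme, and the helper names, are hypothetical):

```python
from typing import Iterable, List, Tuple


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Return the first word of each FASTA header, in file order."""
    return [(line[1:].split() or [""])[0] for line in lines if line.startswith(">")]


def clone_id(read_name: str) -> str:
    """Strip a trailing _End1/_End2 suffix to get the clone name."""
    for suffix in ("_End1", "_End2"):
        if read_name.endswith(suffix):
            return read_name[: -len(suffix)]
    return read_name


def order_mismatches(names1: List[str], names2: List[str]) -> List[Tuple[int, str, str]]:
    """List (index, name1, name2) wherever the two files disagree on the clone."""
    if len(names1) != len(names2):
        raise ValueError(f"file 1 has {len(names1)} reads, file 2 has {len(names2)}")
    return [(i, a, b) for i, (a, b) in enumerate(zip(names1, names2))
            if clone_id(a) != clone_id(b)]
```

An empty list means every line of file 1 pairs with the same clone in file 2.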

4.1 years ago by GenoMax 141k

I just realized, after looking at the 3rd column of the SAM file, that Tablet had split my genome into chromosomes, and now I can see the locations of the reads! Thank you!

P.S. I think the order of the read fragments is correct; otherwise bwa would have reported that it could not find the second read of the pair and stopped.

4.1 years ago by life99945 ▴ 20