Question

BWA: Why are paired reads mapped to different chromosomes?

2

6.9 years ago

lghust2011 ▴ 110

I use BWA-MEM to map reads to a reference. The command line is below:

bwa mem -t 6 reference.fasta fq1.fastq fq2.fastq > result.sam

According to the source code, BWA takes read1 from fq1.fastq and read2 from fq2.fastq. By default, read1 and read2 are treated as paired-end reads from the same DNA fragment. So I expected the two reads to map to the same chromosome, because a DNA fragment can't span two chromosomes. However, when I look at result.sam, many paired reads are mapped to different chromosomes! Why? My guess is that it is caused by structural variation or repeat sequences. Or is there another reason? If I just want to call SNVs and indels, may I remove these read pairs? Thanks in advance!

alignment genome sequencing • 6.5k views

ADD COMMENT • link 6.9 years ago by lghust2011 ▴ 110

4

Biologically it could be explained by a translocation, but if a lot of reads are affected, a technical cause is more likely.

ADD REPLY • link 6.9 years ago by Benn 8.3k

5

Hi, other possible reasons are homologous genes, pseudogenes, or conserved domains.

Best

Tristan

ADD REPLY • link 6.9 years ago by Titus ▴ 910

1

Another leading cause is artifacts in library prep.

ADD REPLY • link 6.9 years ago by lh3 33k

0

Do you mean that read1 from fq1.fastq and read2 from fq2.fastq are not from the same DNA fragment?

ADD REPLY • link 6.9 years ago by lghust2011 ▴ 110

3

You might have fusion genes, as with the Philadelphia chromosome. Normally, an aligner will place each read wherever it finds the best match. The best match is decided by some sort of final score, and how that score is calculated can be tuned to the experimental need by changing the aligner's parameters. However, it is not the aligner's task to force a biological interpretation onto the results.

ADD REPLY • link 6.9 years ago by Santosh Anand 5.7k

2

If your reads are not in identical order in the R1/R2 files (e.g. if the files were scanned/trimmed individually), then you would get odd mappings like the ones you are describing here.

ADD REPLY • link 6.9 years ago by GenoMax 141k
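The read-order check suggested in the reply above can be sketched in a few lines of Python. This is only an illustration, not part of BWA: it assumes uncompressed, strictly 4-line FASTQ records, and the function names `read_names` and `first_mismatch` are made up for this example. Read names are compared after dropping any comment field and an old-style `/1` or `/2` suffix.

```python
import itertools

def read_names(path):
    """Yield the read name of each 4-line FASTQ record in `path` (assumed uncompressed)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            # Every fourth line, starting at the first, is a record header.
            if i % 4 == 0 and line.strip():
                name = line[1:].split()[0]  # drop '@' and any comment field
                # Drop an old-style /1 or /2 mate suffix if present.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name

def first_mismatch(fq1_path, fq2_path):
    """Return (record_index, name1, name2) for the first out-of-sync record, else None."""
    pairs = itertools.zip_longest(read_names(fq1_path), read_names(fq2_path))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

Calling `first_mismatch("fq1.fastq", "fq2.fastq")` returns `None` when both files list the same read names in the same order. A name of `None` in the returned tuple means one file ran out of records before the other.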

0

I got fq1.fastq and fq2.fastq directly from the sequencer. By default BWA takes read1 from fq1.fastq and read2 from the same position in fq2.fastq. Is it possible that these two reads are not actually from the same DNA fragment? Why?

ADD REPLY • link 6.9 years ago by lghust2011 ▴ 110

0

A fusion gene event could cause that. Are those cancer samples?

ADD REPLY • link 6.9 years ago by TriS ★ 4.7k

0

Yes, it's a cancer sample. If it were a normal sample, would paired reads still be mapped to different chromosomes? I think repeat sequences are frequent.

ADD REPLY • link 6.9 years ago by lghust2011 ▴ 110

0

Are they mapped in proper pairs when you check the flags? If you have a lot of improper pairing, this suggests a technical issue.

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

ADD REPLY • link 6.9 years ago by brent_wilson ▴ 140
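As a rough illustration of the flag check suggested above: the proper-pair status lives in bit 0x2 of the SAM FLAG field (column 2). The sketch below computes the fraction of properly paired records from a list of integer flags. The function name `proper_pair_fraction` is hypothetical, and the choice to count only primary records with both mates mapped is an assumption of this example. `samtools flagstat` reports comparable summary numbers directly.

```python
# Bit values as defined in the SAM specification.
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
SECONDARY, SUPPLEMENTARY = 0x100, 0x800

def proper_pair_fraction(flags):
    """Fraction of primary, paired, both-mates-mapped records with the proper-pair bit set."""
    eligible = proper = 0
    for flag in flags:
        if not flag & PAIRED:
            continue  # single-end record
        if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # skip unmapped mates and non-primary alignment lines
        eligible += 1
        if flag & PROPER_PAIR:
            proper += 1
    return proper / eligible if eligible else 0.0
```

For example, flags 99 and 147 describe a properly paired read and its mate, while 65 and 129 describe a pair in which both mates are mapped but the proper-pair bit is not set.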

0

How many reads aligned to different chromosomes, both as absolute numbers and as a proportion of the whole set, and at which locations does this occur (say, viewed in IGV for a few regions)? Please describe your biological sample. What is your reference? What are the MAPQ scores for these reads?

ADD REPLY • link 6.9 years ago by Petr Ponomarenko ★ 2.8k
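One possible way to get the numbers and proportions asked for above is to compare the RNAME (column 3) and RNEXT (column 7) fields of the SAM records. The sketch below assumes plain-text SAM lines as input; the function name `interchromosomal_summary` and the decision to count only primary records with both mates mapped are assumptions of this example, not something taken from the thread.

```python
from collections import Counter

def interchromosomal_summary(sam_lines):
    """Count primary, both-mates-mapped paired records whose mate is on another chromosome.

    Returns (n_interchromosomal, n_considered, fraction, counts_per_chromosome_pair).
    """
    considered = 0
    by_chrom_pair = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, rnext = int(fields[1]), fields[2], fields[6]
        # Keep paired records (0x1); skip unmapped read or mate (0x4, 0x8)
        # and secondary or supplementary lines (0x100, 0x800).
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        considered += 1
        # RNEXT is "=" when the mate is on the same reference sequence.
        if rnext not in ("=", "*", rname):
            by_chrom_pair[tuple(sorted((rname, rnext)))] += 1
    n_inter = sum(by_chrom_pair.values())
    fraction = n_inter / considered if considered else 0.0
    return n_inter, considered, fraction, by_chrom_pair
```

With a file on disk this could be called as `interchromosomal_summary(open("result.sam"))`. The per-chromosome-pair counts give a first hint of where to look in IGV.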

0

The sample is from a patient and my reference is hs37d5.fa. Around 2% of read pairs are mapped to different chromosomes; however, most of them have MAPQ 0. So, may I remove these read pairs? Will they influence the VCF result? My pipeline is bwa->samtools->picard->localrealign->BQSR->mutect2.

ADD REPLY • link 6.9 years ago by lghust2011 ▴ 110
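To put a number on "most of their MAPQ is 0", the MAPQ field (column 5) of the inter-chromosomal records can be tabulated before deciding anything about filtering. This is a minimal sketch under the same assumptions as a plain-text SAM input; the function name `mapq_breakdown` and the `low_mapq` parameter are hypothetical and only serve the illustration.

```python
def mapq_breakdown(sam_lines, low_mapq=0):
    """Return (n_low, n_total) over primary records whose mate maps to another chromosome.

    n_low counts records with MAPQ <= low_mapq.
    """
    n_low = n_total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        mapq, rnext = int(fields[4]), fields[6]
        # Paired records only; skip unmapped read or mate and non-primary lines.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if rnext in ("=", "*", rname):
            continue  # mate on the same chromosome, or mate position unknown
        n_total += 1
        if mapq <= low_mapq:
            n_low += 1
    return n_low, n_total
```

A high share of MAPQ 0 among these records would be consistent with the repeat and homology explanations raised earlier in the thread, since MAPQ 0 indicates that the aligner could not place the read uniquely.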

0

Hi, I'm working on a cancer panel with 22 genes (something like 90 amplicons) and I see the same result as you, and I think you have a bigger target than me. You should look at the regions with MAPQ 0 and check whether they are low-complexity sequences.

Best

ADD REPLY • link 6.9 years ago by Titus ▴ 910