Question: BWA: Why are paired reads mapped to different chromosomes?
2.6 years ago, lghust2011 (90) wrote:

I use BWA-MEM to map reads to a reference, with the following command line:

bwa mem -t 6 reference.fasta fq1.fastq fq2.fastq > result.sam

According to the source code, BWA takes read1 from fq1.fastq and read2 from fq2.fastq. By default, read1 and read2 are treated as paired-end reads from the same DNA fragment. So I expected the two reads to map to the same chromosome, because a DNA fragment cannot span two chromosomes. However, when I look at result.sam, many paired reads are mapped to different chromosomes! Why? My guess is that this is caused by structural variation or repetitive sequence. Or is there another reason? If I just want to call SNVs and indels, may I remove these read pairs? Thanks in advance!

sequencing alignment genome • 2.4k views
written 2.6 years ago by lghust2011 (90)

Biologically it could be explained by a translocation, but if you have a lot of such reads, a technical cause is more likely.

written 2.6 years ago by Benn (7.9k)

Hi, other possible reasons are homologous genes, pseudogenes, or conserved domains.

Best

Tristan

written 2.6 years ago by Titus (910)

Another leading cause is artifacts in library prep.

written 2.6 years ago by lh3 (31k)

Do you mean that read1 from fq1.fastq and read2 from fq2.fastq are not from the same DNA fragment?

written 2.6 years ago by lghust2011 (90)

You might have fusion genes, like the Philadelphia chromosome. Normally, an aligner will align a read wherever it finds the best match. The best match is decided by some sort of final score, and the calculation of that score can be tuned to the experimental need by changing the aligner's parameters. However, it is not the task of the aligner to force a biological interpretation onto the results.

modified 2.6 years ago • written 2.6 years ago by Santosh Anand (5.0k)

If your reads are not in identical order in the R1/R2 files (i.e. if the files were scanned/trimmed individually), then you would get odd mappings like the ones you are describing here.

written 2.6 years ago by genomax (75k)
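
A quick way to rule out the out-of-order R1/R2 explanation is to compare read names record by record. The following is only a minimal sketch: the function names are made up for illustration, and it assumes plain 4-line FASTQ records whose names either match exactly or differ only by a trailing /1 or /2.

```python
import itertools


def read_names(fastq_lines):
    """Yield the read name of each 4-line FASTQ record (assumed layout)."""
    # Ignore blank lines so a trailing newline does not shift the records.
    non_blank = (line for line in fastq_lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 == 0:
            # Header line looks like '@name optional-comment'.
            name = line.rstrip("\n")[1:].split()[0]
            # Drop an old-style /1 or /2 mate suffix if present.
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name


def first_mismatch(r1_lines, r2_lines):
    """Return (record_index, name1, name2) for the first out-of-sync
    record, or None if both inputs list the same names in the same order."""
    pairs = itertools.zip_longest(read_names(r1_lines), read_names(r2_lines))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return idx, name1, name2
    return None
```

Called on the two open file handles for fq1.fastq and fq2.fastq, a result of None means the files are in sync; anything else points at the first record where they diverge (a None name means one file ended early).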

I got fq1.fastq and fq2.fastq directly from the sequencer. By default, BWA takes read1 from fq1.fastq and read2 from the same position in fq2.fastq. Is it possible that these two reads are not from exactly the same DNA fragment? Why?

written 2.6 years ago by lghust2011 (90)

A fusion gene event could cause that. Are those cancer samples?

written 2.6 years ago by TriS (4.0k)

Yes, it's a cancer sample. Would paired reads also be mapped to different chromosomes in a normal sample? I think repetitive sequences are frequent.

written 2.6 years ago by lghust2011 (90)

Are they mapped in proper pairs when you check the flags? If you have a lot of improper pairing, this suggests a technical issue.


written 2.6 years ago by brent_wilson (100)
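
For checking the flags, the pairing information sits in the FLAG column (second field) of each SAM record, with the mate's reference in RNEXT (seventh field). The sketch below is one possible way to tally it in plain Python; the function names and category labels are invented for illustration, and the bit values are the ones defined in the SAM specification.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify(sam_line):
    """Put one SAM alignment line into a coarse pairing category."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    # Only primary alignments of paired reads are of interest here.
    if not (flag & PAIRED) or flag & (SECONDARY | SUPPLEMENTARY):
        return "skipped"
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return "unmapped"
    if flag & PROPER_PAIR:
        return "proper"
    # RNEXT is '=' when the mate is on the same reference as the read.
    if rnext not in ("=", "*", rname):
        return "different_chromosome"
    return "improper_same_chromosome"


def summarize(sam_lines):
    """Count categories over all alignment lines, skipping '@' headers."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[classify(line)] += 1
    return counts
```

Running summarize over the lines of result.sam gives the number of reads per category, so a large "improper_same_chromosome" or "different_chromosome" share relative to "proper" stands out immediately.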

How many reads aligned to different chromosomes, both in absolute numbers and as a proportion of the whole set? Where do they occur (say, looking at a few regions in IGV)? Please describe your biological sample. What is your reference? What are the MAPQ scores for these reads?

written 2.6 years ago by Petr Ponomarenko (2.6k)

The sample is from a patient and my reference is hs37d5.fa. Around 2% of read pairs are mapped to different chromosomes; however, most of them have MAPQ 0. So, may I remove these read pairs? Will they influence the VCF result? My pipeline is bwa->samtools->picard->localrealign->BQSR->mutect2.

written 2.6 years ago by lghust2011 (90)
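
Numbers like "around 2%" and "most have MAPQ 0" can be reproduced straight from the SAM file. This is a rough sketch under stated assumptions: the function name and the keys of the returned dictionary are hypothetical, counting is per alignment record rather than per pair, and only primary alignments with both mates mapped are considered.

```python
def interchromosomal_stats(sam_lines):
    """Tally reads whose mate maps to another reference, and how many
    of those reads have MAPQ 0."""
    total = inter = inter_mapq0 = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, mapq, rnext = fields[2], int(fields[4]), fields[6]
        # Keep paired reads (0x1); drop records where the read or its mate
        # is unmapped (0x4, 0x8) or the alignment is secondary or
        # supplementary (0x100, 0x800).
        if not (flag & 0x1) or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        total += 1
        # RNEXT is '=' when the mate is on the same reference.
        if rnext not in ("=", rname):
            inter += 1
            if mapq == 0:
                inter_mapq0 += 1
    return {
        "reads": total,
        "interchromosomal": inter,
        "interchromosomal_fraction": inter / total if total else 0.0,
        "mapq0_share": inter_mapq0 / inter if inter else 0.0,
    }
```

A high "mapq0_share" would be consistent with the repeat or low-complexity explanation, since MAPQ 0 means the aligner could not place the read uniquely; whether to drop such reads before variant calling remains a judgement call for the pipeline at hand.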

Hi, I'm working on a cancer panel with 22 genes (about 90 amplicons) and I get the same result as you, and I think your target is bigger than mine. You should look at the regions with MAPQ 0 and see whether they contain low-complexity sequences.

Best

written 2.6 years ago by Titus (910)