Question: RNA-seq reads mapping to paralogs with high MAPQ. How to ensure this?
0
gravatar for biotech
3.1 years ago by
biotech500
United States
biotech500 wrote:

I've mapped 75 bp paired-end Illumina RNA-seq data to a bacterial draft genome. We are interested in a family of genes belonging to trimeric autotransporter family. As can be appreciated in the figure, it should be quite difficult to have reads mapping uniquely to one gene because all the genes have same domains. However, we have many reads mapping with MAPQ>20 to this domains. 

There could be some explanations to this; however, both of these cases should be flagged as having multiple mappings:

- Subtle SNP differences enough for BWA to discriminate these similar regions

- BWA acting randomly when assigning best match

 

I would like to know if it's safe to rely in BWA MAPQ>20 value as threshold. 

 

BWA code:

#BWA index
bwa index $genome_file
#BWA mapping
bwa mem -t 8 $genome_file $fastq_file_R1 $fastq_file_R2 | gzip -3 > P_S1_L001_aln-pe.sam_inverted.gz
bwa mem -t 8 $genome_file $fastq_file_R3 $fastq_file_R4 | gzip -3 > V_S1_L001_aln-pe.sam_inverted.gz

Here is the figure showing some proteins of the family, from Pina S. 2009 et al.: http://www.ncbi.nlm.nih.gov/pubmed/19011035

 

Thanks, Bernardo

 

paralogs rna-seq mapq bwa • 1.4k views
ADD COMMENTlink modified 3.1 years ago by Devon Ryan74k • written 3.1 years ago by biotech500
1
gravatar for Devon Ryan
3.1 years ago by
Devon Ryan74k
Freiburg, Germany
Devon Ryan74k wrote:

If BWA is assigning a MAPQ of 20 then there's enough sequence divergence for it to do so. In such cases, it would be completely inappropriate for BWA to call the alignments multimapping, since they're not (a read is only multimapping if the best and second best alignment have the same score...meaning the MAPQ is 0 (probably...this ends up being aligner-dependent)).

Edit: BTW, you can just pipe to samtools, which uses blocked gzip compression. That'll give you better integration downstream as well as better performance if you ever need random access within the file.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Devon Ryan74k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1094 users visited in the last hour