Which minimap2 parameters should I set when mapping to avoid these mismatches?
8 months ago
Mo ▴ 40

Hi all,

Apologies, I don't have much experience with this, but I have been struggling with this issue for a long time.

I am aligning my nanopore RNA-seq data from a heterologous gene library to the reference database, and I am getting mismatches like these:

[Image: screenshot of the alignments showing the mismatches]

I have verified my reference database and there is no problem with it; I think some of the genes are misaligning to others. I would like your suggestions on which parameters I can set in the minimap2 command so that these mismatches disappear. Currently, my command looks like this:

minimap2 -ax map-ont reference.fasta rna_seq.fastq > output.sam

After this, I also filter the alignments to keep only primary ones (FLAG 0 or 16) with a mapping quality of at least 10 (MAPQ ≥ 10).
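The filtering step described above (keep FLAG 0 or 16, MAPQ ≥ 10) can be sketched in Python as follows. This is a minimal illustration assuming a plain-text SAM file as written by `minimap2 -ax`; the function name `filter_primary` is hypothetical. For single-end nanopore reads it should behave much like `samtools view -h -F 0x904 -q 10`.

```python
from typing import Iterable, Iterator


def filter_primary(sam_lines: Iterable[str], min_mapq: int = 10) -> Iterator[str]:
    """Yield SAM header lines plus primary alignments (FLAG 0 or 16) with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        # FLAG 0 = primary on the forward strand, 16 = primary on the reverse strand.
        # Anything else (unmapped 4, secondary 256/272, supplementary 2048/2064) is dropped.
        if flag in (0, 16) and mapq >= min_mapq:
            yield line


# Hypothetical usage on a file:
#   with open("output.sam") as sam, open("filtered.sam", "w") as out:
#       out.writelines(filter_primary(sam))
```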

I would really appreciate your help. Thanks!

minimap2 • 1.7k views

I have sequenced the mRNA of a heterologous library expressed in human cells using nanopore. Then I mapped the reads from the fastq files to the reference database of the library using the minimap

What "heterologous" library is this? Is more than one gene/region included? Are you aligning to a combined reference of the human genome + this heterologous library? You can't align to the heterologous library alone if the dataset comes from the entire genome. What is the aim of this experiment? To show that the heterologous library is being expressed without acquiring any mutations?


Hi,

Thanks for your reply.

I am trying to express ~200 synonymous variants of a non-human gene in human cells to see how codon usage influences gene expression. Synonymous variants encode the same protein but have different nucleotide compositions, hence the name heterologous gene library. The library is barcoded: each variant is associated with a unique barcode for identification. After expressing this library in human cells, I sequence the total mRNA using a nanopore MinION. This experiment is not about human genes, so I am only aligning the sequencing reads to the synonymous variant database. But yes, the whole mRNA pool is sequenced, so the data are a mix of human mRNAs and synonymous variant mRNAs.

The aim of the experiment is to compare the mRNA levels of the individual variants.
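Since the goal is to compare mRNA levels across variants, one simple readout is the number of filtered primary alignments landing on each variant. A minimal sketch, assuming the variant names match the sequence names in the library FASTA (`variant_names` and both function names are hypothetical); reads aligned to any other reference sequence are ignored, and the fractions are only a crude normalization, not a full expression analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def count_per_variant(sam_lines: Iterable[str], variant_names: Iterable[str], min_mapq: int = 10) -> Counter:
    """Count primary alignments (FLAG 0 or 16, MAPQ >= min_mapq) per library variant."""
    counts = Counter({name: 0 for name in variant_names})  # start every variant at zero
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        # Only count reads whose reference sequence is one of the library variants
        if flag in (0, 16) and mapq >= min_mapq and rname in counts:
            counts[rname] += 1
    return counts


def variant_fractions(counts: Counter) -> Dict[str, float]:
    """Share of all variant-assigned reads that fall on each variant."""
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}
```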


I don't think I can give you a minimap2 parameter set that makes those mismatches disappear, because the problem may not be minimap2 itself. In my opinion, you should run some checks:

  1. What are the true sequences of the templates that were sequenced?
  2. What are the real differences (if any) between the true template sequences (without any sequencing errors) and your reference sequences?
  3. What type are your template sequences? DNA? cDNA? mRNA? pre-mRNA? ...?
  4. What is the error profile of the sequencing platform?
  5. Could you run error correction on the raw reads before feeding them to any aligner?
  6. Could you do some PCR to check whether the sequences with mismatches really exist in the sample?
  7. Could you resequence with a lower-error-rate technology?
  8. Could you switch to another aligner?
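For items 2 and 4, a quick first look is the per-read divergence from the reference. minimap2 writes an `NM:i:` tag (edit distance) on aligned SAM records when run with `-a`, so a rough per-read divergence is NM divided by the number of alignment columns. A minimal sketch (the function name is hypothetical): comparing these values across reads and variants can help tell uniform sequencing noise apart from reads that are unusually divergent from the sequence they were assigned to.

```python
import re
from typing import Optional

# CIGAR operations as (length, op) pairs, e.g. "5S10M1I" -> [("5", "S"), ("10", "M"), ("1", "I")]
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def nm_divergence(sam_line: str) -> Optional[float]:
    """Return edit distance (NM tag) divided by alignment columns for one SAM record.

    Returns None for unmapped records or when the NM tag is absent.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if fields[5] == "*":
        return None
    # Alignment columns = matched/mismatched bases (M, =, X) plus inserted (I) and deleted (D) bases;
    # clipped bases (S, H) and skipped regions (N) are not part of the aligned block.
    columns = sum(int(n) for n, op in CIGAR_OP.findall(fields[5]) if op in "MID=X")
    nm = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NM:i:")), None)
    if nm is None or columns == 0:
        return None
    return nm / columns
```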

Why can't the mismatches be SNVs? Most of them look real to me. Of course, you can't be sure unless you perform variant calling on the RNA-seq data (it's an analysis that can be done in addition to the usual gene expression analysis, though of course it doesn't replace traditional whole-genome sequencing data).


Hi, thanks for your reply.

I am trying to express ~200 synonymous variants of a non-human gene in human cells to see how codon usage influences gene expression. Synonymous variants encode the same protein but have different nucleotide compositions. In each codon, the first and last nucleotides are variable. I think reads from some other synonymous variants are aligning to this particular variant, but so far I have not checked the sequences at the mismatches to see whether they come from other variants or from something else. I will check this.
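To judge whether reads from other synonymous variants could plausibly land on a given variant, it helps to know how close each variant is to its nearest neighbour in the library. A brute-force sketch, assuming all variants have the same length (expected for synonymous codon variants of one coding sequence, though differing barcodes or flanks may break this) and are loaded into a dict of name to sequence (hypothetical input). It compares every pair, so it is quadratic in the number of variants and slow in pure Python for ~1000 variants, but acceptable as a one-off check.

```python
from itertools import combinations
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for a Hamming distance")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbours(variants: Dict[str, str]) -> Dict[str, Tuple[str, int]]:
    """Map each variant name to (closest other variant, Hamming distance)."""
    best: Dict[str, Tuple[str, int]] = {}
    for (n1, s1), (n2, s2) in combinations(variants.items(), 2):
        d = hamming(s1.upper(), s2.upper())
        # Update both members of the pair if this is their closest match so far
        for a, b in ((n1, n2), (n2, n1)):
            if a not in best or d < best[a][1]:
                best[a] = (b, d)
    return best
```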

Do you still think variant calling would be helpful, given that I already know the sequence of each variant in the library? Many thanks.

8 months ago
GenoMax 141k

This experiment has nothing to do with human genes, thus I am only aligning the sequencing results with the synonymous variant database. But yes, whole mRNA is sequenced in the experiment and the data produced is a mix of human mRNA + synonymous variants mRNAs.

Even if this has nothing to do with human genes, have you checked whether (parts of) the heterologous sequences "cross-align" with sequences in the human genome? There may be regions of similarity.
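A proper check would be to align the library sequences against the human reference. As a much cruder, aligner-free screen (a swapped-in technique, not the check suggested above), one can count exact k-mers shared between the library and a human sequence on either strand. A minimal sketch, assuming sequences are already loaded as strings; the function names and the k = 21 default are arbitrary choices. Any shared k-mer only hints at local similarity worth a closer look with a real aligner.

```python
from collections import Counter
from typing import Dict, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(library: Dict[str, str], k: int = 21) -> Dict[str, Set[str]]:
    """Map every k-mer in the library to the names of the variants containing it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in library.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def shared_kmers(index: Dict[str, Set[str]], human_seq: str, k: int = 21) -> Counter:
    """Count, per library variant, k-mer hits in human_seq on both strands."""
    hits: Counter = Counter()
    human_seq = human_seq.upper()
    for strand in (human_seq, revcomp(human_seq)):
        for i in range(len(strand) - k + 1):
            for name in index.get(strand[i:i + k], ()):
                hits[name] += 1
    return hits
```

Building the index once and reusing it across all human sequences keeps the scan linear in the total human sequence length.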

Since the data come from a mix of human mRNA + other mRNA, you need to create a hybrid reference (human + heterologous sequences) and then align the data to this full reference. Aligners don't know or care about the sequences per se. They will try their best to align each read to the given reference, even if the read did not originate from that reference sequence but merely shares some similarity with it. It is possible that some of the "issues" you are seeing above are reads that came from the normal (human) genome but are being forced onto the heterologous reference.
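Building the hybrid reference is essentially concatenating the human FASTA and the library FASTA into one file. A minimal sketch (function and file names are hypothetical) that also refuses duplicate sequence names, since a name clash between the two sources would make the alignments ambiguous. The merged file can then be used with the same command as before, for example `minimap2 -ax map-ont hybrid.fasta rna_seq.fastq > output.sam`.

```python
from typing import Iterable, Set


def merge_fasta(paths: Iterable[str], out_path: str) -> Set[str]:
    """Concatenate FASTA files into one reference, refusing duplicate sequence names."""
    seen: Set[str] = set()
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as fin:
                for line in fin:
                    if line.startswith(">"):
                        # The first whitespace-delimited word is used as the sequence name
                        name = line[1:].split()[0]
                        if name in seen:
                            raise ValueError(f"duplicate sequence name {name!r} in {path}")
                        seen.add(name)
                    if not line.endswith("\n"):
                        line += "\n"  # avoid gluing the last line of one file to the next header
                    out.write(line)
    return seen


# Hypothetical usage:
#   merge_fasta(["human.fasta", "reference.fasta"], "hybrid.fasta")
```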


Hi,

I don't know what to say. I just followed your advice and magic happened: all those mismatches are gone. I created a hybrid FASTA file of human + heterologous library sequences and aligned the RNA-seq FASTQ to it using minimap2. There are ~1000 barcoded variants and I have checked each one of them; 99.99% of the variants now align perfectly, with no mismatches.

In the screenshot, the first window is after following your advice and the second window is before.

[Image: alignment screenshots, first window after and second window before switching to the hybrid reference]

I have been struggling with this for 4 months and now it finally looks perfect. I can't thank you enough for this.

Thank you so much, man. Wishing you the best in life. :) Happy to be part of this community.


Excellent news. I am glad to hear that things are now working as expected. I moved my comment to an answer. You can accept it (green check mark) to provide closure to this thread.


Done, thank you so much.
