Question: How To Interpret The Flagstat Output After Mapping Reads Back To Contigs Of A De Novo Transcriptome Assembly
0
gravatar for lqdo2000
5.8 years ago by
lqdo200020
Australia
lqdo200020 wrote:

Hi !

I have some paired-end reads (FR) data that I processed as follow:

  1. trimming of the adaptors
  2. estimation of optimal k-mer length (KmerGenie)
  3. de novo assembly (Trinity) of the paired trimmed reads (default settings)
  4. mapping of paired trimmed reads to Trinity contigs
  5. statistics of mapping (samtools flagstat)

Questions:

  • At step 2) I obtained optimal k = 93 for forward reads and k = 17 for reverse. Is such a difference make sense?

  • At step 5) I obtained this output:

    38918266 + 0 in total (QC-passed reads + QC-failed reads)

    0 + 0 duplicates

    38561384 + 0 mapped (99.08%:-nan%)

    38918266 + 0 paired in sequencing

    19459133 + 0 read1

    19459133 + 0 read2

    0 + 0 properly paired (0.00%:-nan%)

    38561384 + 0 with itself and mate mapped

    0 + 0 singletons (0.00%:-nan%)

    2542804 + 0 with mate mapped to a different chr

    406696 + 0 with mate mapped to a different chr (mapQ>=5)

Does 0 + 0 properly paired (0.00%:-nan%) mean nothing mapped to the contigs?

I would greatly appreciate any advice or explanation !

Thanks !

trinity samtools • 3.3k views
ADD COMMENTlink modified 5.8 years ago by Istvan Albert ♦♦ 81k • written 5.8 years ago by lqdo200020
2
gravatar for Istvan Albert
5.8 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:
  1. If you are assembling transcriptomes then KmerGenie is not the a right tool to estimate kmer size. KmerGenie should be used to estimate kmers for reads that represent randomly sheared DNA.

  2. the "proper pair" term usually means that the read pairs are within a certain distance and a certain orientation (FR) relative to one another.

Since this is a transcriptome data I suspect that your data is strand specific and if it is then it may be in the RF orientation.

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by Istvan Albert ♦♦ 81k

Thanks Istvan !

  1. Ok, I didn't know for KmerGenie. I tried VevetK and VelvetOptimizer but they returned a value of 1 ! Is there anything else I could try to estimate the best k length?
  2. Yes, data are strand specific (sequenced with MiSeq). I assumed the read R1 set referred to forward and the read R2 referred to reverse (and thus FR). How can we know which orientation are reads R1 and R2?

I also analysed the .SAM file generated by BWA with the Trinity script SAM_nameSorted_to_uniq_count_stats.pl. I got 87% proper_pairs, 6.5% left_only, 6.5% right_only. I assume it is not too bad, but there is a huge difference between results from samtools and trinity script. So which one are correct do you think? Is there a way to know? Thanks again !

ADD REPLYlink modified 5.8 years ago by Istvan Albert ♦♦ 81k • written 5.8 years ago by lqdo200020

well if it is a strand specific RNA-seq then samtools will misinterpret your pairs. That's why the other script is provided.

There is no kmer size estimator for RNA-seq data that I know of.

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by Istvan Albert ♦♦ 81k

Alright, I see. Thank you very much for your help Istvan !

ADD REPLYlink written 5.8 years ago by lqdo200020
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 768 users visited in the last hour