Mapped more transcripts than there were in fastq file

4.0 years ago

tanya_fiskur ▴ 70

Hello!

This is very strange. I mapped a long-read FASTQ file of high-quality isoforms to a reference genome and got more mapped reads in the resulting .sam file than there were reads in the input FASTQ file. I used the parameters recommended by Cupcake: "-ax splice -t 30 -uf --secondary=no -C5" (https://github.com/Magdoll/cDNA_Cupcake/wiki/Cupcake:-supporting-scripts-for-Iso-Seq-after-clustering-step). The FASTQ file contained 44695 transcripts (counted with grep and wc; this number looks reasonable), while the mapped .sam file contains 46920 transcripts. I triple-checked it.

Has anyone experienced something like this?
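For anyone checking the input count, here is a minimal Python sketch of counting FASTQ records by line count rather than by pattern matching. It assumes a plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequences); the function name `count_fastq_records` and the toy data are made up for illustration. It also shows why counting lines that start with "@" can overcount: "@" is a valid quality character and may begin a quality line.

```python
import io


def count_fastq_records(handle):
    """Count records in a 4-line-per-record FASTQ stream.

    Assumption: sequences are not wrapped over several lines,
    so every record occupies exactly four non-blank lines.
    """
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Toy input: the quality line of read1 happens to start with '@'.
toy_fastq = "@read1\nACGT\n+\n@III\n@read2\nACGT\n+\nIIII\n"

# Naive count of lines starting with '@' (what a simple grep would see).
naive_count = sum(1 for line in toy_fastq.splitlines() if line.startswith("@"))

record_count = count_fastq_records(io.StringIO(toy_fastq))
print(naive_count, record_count)
```

On a real file the same function can be called on an open file handle, for example `count_fastq_records(open("reads.fastq"))`, where the file name is a placeholder.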

minimap2 RNA-Seq • 829 views

updated 4.0 years ago by N15 ▴ 160 • written 4.0 years ago by tanya_fiskur ▴ 70

If reads map to multiple locations, you will get more 'mappings' (alignment records) in your SAM file than there were reads in your FASTQ file.

Of course, this only applies if you counted the alignment records themselves rather than the input read IDs: if you extract the read IDs and make them unique, the count can never exceed the number of reads in the input FASTQ file.
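The distinction between alignment records and unique read IDs can be checked with a short script. The following is a minimal Python sketch, not a replacement for dedicated tools; the function name `summarize_sam` and the toy records are hypothetical. It classifies each record by its SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). Note that with `--secondary=no` minimap2 suppresses secondary alignments, but a read that aligns in split (chimeric) pieces can still produce supplementary records, and unmapped reads are normally written to the SAM output as well, so either could plausibly contribute to a record count that differs from the read count.

```python
import io


def summarize_sam(handle):
    """Tally SAM alignment records by type and count unique read names."""
    stats = {"records": 0, "primary_mapped": 0, "secondary": 0,
             "supplementary": 0, "unmapped": 0}
    read_names = set()
    for line in handle:
        if line.startswith("@"):      # header lines (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # skip malformed or blank lines
            continue
        stats["records"] += 1
        read_names.add(fields[0])     # QNAME column
        flag = int(fields[1])         # FLAG column
        if flag & 0x4:
            stats["unmapped"] += 1
        elif flag & 0x100:
            stats["secondary"] += 1
        elif flag & 0x800:
            stats["supplementary"] += 1
        else:
            stats["primary_mapped"] += 1
    stats["unique_reads"] = len(read_names)
    return stats


def _record(name, flag, rname="chr1", pos="100", cigar="4M"):
    # Helper that builds one toy SAM record for the demonstration below.
    return "\t".join([name, str(flag), rname, pos, "60", cigar,
                      "*", "0", "0", "ACGT", "IIII"])


toy_sam = "\n".join([
    "@SQ\tSN:chr1\tLN:1000",
    _record("read1", 0),                 # primary alignment
    _record("read1", 2048, pos="500"),   # supplementary piece of read1
    _record("read2", 16),                # primary, reverse strand
    _record("read3", 4, rname="*", pos="0", cigar="*"),  # unmapped
]) + "\n"

summary = summarize_sam(io.StringIO(toy_sam))
print(summary)
```

In the toy data there are four records but only three unique read names, which is the same kind of discrepancy described above. If `records` exceeds `unique_reads` on the real file, the `secondary` and `supplementary` tallies indicate where the extra records come from.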

4.0 years ago by lieven.sterck 15k

This happened to me!!! I didn't specify that a read could only map in one location.

4.0 years ago by N15 ▴ 160
