Count of mapped reads
1
0
Entering edit mode
3.9 years ago

Hello,

I have around 12,000 smallRNA reads (sequences) for which I need to find the total mapped reads for EACH individual sequence. They are in a collapsed fasta file ensuring the sequences appear only once in the entire file (trimming and quiality filtering etc have been already done ). I want to know the total number of reads ( counts) for each sequence, meaning if sequence A maps to multiple places- I will need the output for the total number of mapped reads for sequence A and so on.... Eventually I want to generate an expression profile for these sequences ( may be the top 50 ). I had previously used miRdeep2 tool and the quantifier.pl script for that suite was very helpful at that time. However these are not miRNAs and so miRdeep2 cannot be used here.

I can map these to the reference genome and generate a BAM file. But will samtools be of help in this situation ? I am a bit confused. Any help is most appreciated.

Thanks !

sequence • 1.3k views
0
Entering edit mode

12,000 smallRNA reads (sequences) for which I need to find the total mapped reads for EACH individual sequence

Your post is a bit hard to understand (for me). Are you asking to find out how many places do those smallRNA reads (each read = # of places) map to a reference genome?

This sounds like a normal count based workflow one would use (align (allow unlimited multi-mapping?) to a reference and then count using featureCounts).

0
Entering edit mode

Yes @genomax. you are correct. I want to know for every sequence how many different places it can map to and then report the total number of mapped hits for that sequence. This eventually translates into total number of mapped reads for that sequence.

0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

How long are these fasta sequences? If they are long (>500 bp), you may be able to use BBMap (mapPacBio.sh) suite to do your mapping (use ambig=all to get all possible locations the reads can map to, adjust other options to adjust alignment stringency as needed). Then you could count how many times a read appears in your BAM file to get those counts.

0
Entering edit mode

Oops- apologies for posting incorrectly. Thanks for the suggestion. The sequences in the fasta file are between 22-30 nucleotides in length. I am going to try out featureCounts and see how that goes.

Thanks !

0
Entering edit mode
3.9 years ago
GenoMax 99k

If the reads are that short then just use regular bbmap.sh. I don't think featureCounts will work in this case since it counts reads mapped to a feature (e.g. exon) in the genome. What you seem to want is a count of all locations where each of your original reads is going to map. Isn't that correct?

0
Entering edit mode

Okay thanks. Yes, that is what I want. Sorry if that was not clear from my writing. I will try BBMap then.