Question: Count of mapped reads
0
gravatar for leo1985.arnab
9 months ago by
leo1985.arnab0 wrote:

Hello,

I have around 12,000 smallRNA reads (sequences) for which I need to find the total mapped reads for EACH individual sequence. They are in a collapsed fasta file ensuring the sequences appear only once in the entire file (trimming and quiality filtering etc have been already done ). I want to know the total number of reads ( counts) for each sequence, meaning if sequence A maps to multiple places- I will need the output for the total number of mapped reads for sequence A and so on.... Eventually I want to generate an expression profile for these sequences ( may be the top 50 ). I had previously used miRdeep2 tool and the quantifier.pl script for that suite was very helpful at that time. However these are not miRNAs and so miRdeep2 cannot be used here.

I can map these to the reference genome and generate a BAM file. But will samtools be of help in this situation ? I am a bit confused. Any help is most appreciated.

Thanks !

sequence • 389 views
ADD COMMENTlink modified 10 days ago by Biostar ♦♦ 20 • written 9 months ago by leo1985.arnab0

12,000 smallRNA reads (sequences) for which I need to find the total mapped reads for EACH individual sequence

Your post is a bit hard to understand (for me). Are you asking to find out how many places do those smallRNA reads (each read = # of places) map to a reference genome?

This sounds like a normal count based workflow one would use (align (allow unlimited multi-mapping?) to a reference and then count using featureCounts).

ADD REPLYlink modified 9 months ago • written 9 months ago by GenoMax42k

Yes @genomax. you are correct. I want to know for every sequence how many different places it can map to and then report the total number of mapped hits for that sequence. This eventually translates into total number of mapped reads for that sequence.

ADD REPLYlink written 9 months ago by leo1985.arnab0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

How long are these fasta sequences? If they are long (>500 bp), you may be able to use BBMap (mapPacBio.sh) suite to do your mapping (use ambig=all to get all possible locations the reads can map to, adjust other options to adjust alignment stringency as needed). Then you could count how many times a read appears in your BAM file to get those counts.

ADD REPLYlink modified 9 months ago • written 9 months ago by GenoMax42k

Oops- apologies for posting incorrectly. Thanks for the suggestion. The sequences in the fasta file are between 22-30 nucleotides in length. I am going to try out featureCounts and see how that goes.

Thanks !

ADD REPLYlink written 9 months ago by leo1985.arnab0
0
gravatar for GenoMax
9 months ago by
GenoMax42k
United States
GenoMax42k wrote:

If the reads are that short then just use regular bbmap.sh. I don't think featureCounts will work in this case since it counts reads mapped to a feature (e.g. exon) in the genome. What you seem to want is a count of all locations where each of your original reads is going to map. Isn't that correct?

ADD COMMENTlink written 9 months ago by GenoMax42k

Okay thanks. Yes, that is what I want. Sorry if that was not clear from my writing. I will try BBMap then.

ADD REPLYlink written 9 months ago by leo1985.arnab0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 657 users visited in the last hour