I am fairly new RNA-seq analysis and recently analyzed my first few datasets. To assign reads from my mapped bam file I used the featureCounts function of Rsubread in R which worked really well with >98% of my reads being assigned to features.
That being said I work in an organism with a genome that may have gaps in its annotation in particular for the type of features that I am interested in. What I am hoping to find out is if there is a way to output the reads that were not assigned to features by featureCounts so that I can take them and process them using another software to determine if there is particular enrichment in certain unannotated regions that may represent novel RNAs that have not been characterized.
Any advice on if this is possible and how to do it would be greatly appreciated. Alternatively if anyone has a suggestion on another way of doing this I would also certainly be open to that!
Thank you.
I would suggest using
filterreadsbyname.sh
from BBMap for extracting reads from your original data files. Should be plenty fast and no need for reads-groups etc.I will give this a try once I have the report reads output as well! Thank you!
Thank you very much for the response! I will give this a try. I had try doing something with reportReads option before, but not what you are recommending!
I was hoping to perhaps get some more advice here. I have generated a txt file containing the names of all reads that were not assigned to features, and the number corresponds perfectly with the number of Unassigned_NoFeature reads I get from featureCounts. However, after I run picard on the original bam file to get out these reads it brings pretty large number of additional reads that would not be in the txt file list.
When I run my new picard output bam file back through featureCounts I get the same number of Unassigned_NoFeature reads (suggesting that my list has all of these as expected), in the case of one example of mine approximately 500,000 but I also have around 3,400,000 reads that are assigned to features, which should not have been pulled out by picard.
The general outline of the picard command I used is:
Is there a possible reason for this? Perhaps an additional command I should be using here or something? Any additional advice would be greatly appreciated!
That is weird... A possibility is that you have additional alignments for some reads (supplementary alignments or secondary alignments), so one read ID corresponds to multiple sam/bam entries.
what is the output of:
I went back and used BBmap to pull out the reads from the Fastq files and then remapped it. Upon inspecting the mapping output and trying this with featureCounts I believe it likely was not an issue with picard but instead a setting on featureCounts and you were correct it was how multimapping reads were being interpreted I think.
This should essential resolve my issue! Thank you very much for your help!
I thought maybe something similar but was not certain how to check this. Based no the output I am not sure that is the case however.
Samtools flagstat on one of the output.bam files gives:
7614031 + 0 in total (QC-passed reads + QC-failed reads)
41579 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
7614031 + 0 mapped (100.00% : N/A)
7572452 + 0 paired in sequencing
3786226 + 0 read1
3786226 + 0 read2
7572452 + 0 properly paired (100.00% : N/A)
7572452 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Thank you!
Can you post output of
head -5 Unassigned_NoFeatures.txt
?At least with BBMap you need to remove the starting
@
in the fastq ID's in the list file. Not sure if that is whatpicard
needs as well.Yes certainly! The output is:
I had looked into BBmap and could only find the ability to input Fastq files. I am hoping to used the already mapped bam files as input. Does BBmap accept .bam as input rather than Fastq with the same settings, or perhaps is there a flag that I for some reason could not find for this?
Thank you again!
In case you are not able to make picard work then you can get the reads from your original data file and simply map them again. To extract the reads you can run the following
Great thank you! It is good to know there is an alternative option if picard does not work! I appreciate the help!
This worked well! Thank you very much for the help!