Hello,
I have mapped RNA-seq data (Illumina) with bwa and produced a BAM file, using human genome build 38 as the reference.
I would like to filter the BAM file for the aligned sequences with the highest number of reads.
I want to use bedtools to produce the counts and then extract the most highly expressed genes, but I'm having trouble processing the BAM file. Could anyone explain how to do it in a simple way (e.g. how do I sort the file for bedtools)?
In the end I'm looking for the 3 most highly expressed genes (I would then need to annotate the aligned sequences to find the genes of interest). So what would be the fastest way to select and annotate the sequences with the highest number of mapped reads?
Sorry for the newbie question.
edit: 1) I need to sort the mapped file and add counts; 2) I need to extract the top 3 genes after adding annotations.
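To make step 2 concrete, here is a minimal Python sketch of what I mean by "top 3". It assumes I already have a tab-separated count table with the gene name in the 4th column and the read count in the last column (for example BED6 intervals with a count appended). That layout, the function name `top_genes`, and the example rows are assumptions for illustration, not real data.

```python
from collections import Counter


def top_genes(lines, n=3, name_col=3, count_col=-1):
    """Sum read counts per gene name and return the n highest as (gene, count) pairs."""
    totals = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A gene may span several intervals (e.g. exons), so add them up
        totals[fields[name_col]] += int(fields[count_col])
    return totals.most_common(n)


# Made-up rows: chrom, start, end, gene name, score, strand, read count
example = [
    "chr1\t100\t200\tGENE_A\t0\t+\t12",
    "chr1\t300\t400\tGENE_A\t0\t+\t8",
    "chr2\t100\t200\tGENE_B\t0\t-\t50",
    "chr3\t100\t200\tGENE_C\t0\t+\t5",
    "chr4\t100\t200\tGENE_D\t0\t+\t1",
]

for gene, count in top_genes(example):
    print(gene, count)
```

With a real table I would pass the open file handle instead of the `example` list.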
edit2:
There may be something wrong here :/
+ 0 secondary
0 + 0 supplementary
0 + 0 duplicates
2 + 0 mapped (0.00% : N/A)
101118 + 0 paired in sequencing
50559 + 0 read1
50559 + 0 read2
2 + 0 properly paired (0.00% : N/A)
2 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
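To put a number on why this looks wrong, here is a small Python sketch that parses lines in the format pasted above (`N + M label`, which I take to be QC-passed and QC-failed counts) and compares mapped reads with reads paired in sequencing. The function name `parse_flagstat` is my own, and only lines copied from the output above are used.

```python
import re


def parse_flagstat(text):
    """Return {label: first count} for lines shaped like 'N + M label'."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^(\d+) \+ (\d+) (.+)$", line.strip())
        if not match:
            continue  # skip truncated or unrelated lines
        # Drop a trailing percentage note such as '(0.00% : N/A)'
        label = re.sub(r"\s*\([^()]*%[^()]*\)$", "", match.group(3))
        counts[label] = int(match.group(1))
    return counts


report = """2 + 0 mapped (0.00% : N/A)
101118 + 0 paired in sequencing
50559 + 0 read1
50559 + 0 read2
2 + 0 properly paired (0.00% : N/A)"""

counts = parse_flagstat(report)
mapped = counts["mapped"]
paired = counts["paired in sequencing"]
print(f"{mapped} of {paired} reads mapped ({mapped / paired:.4%})")
```

Only 2 of 101118 reads are reported as mapped, so any per-gene counting on this BAM file would be close to empty; the alignment step probably needs checking first.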
I'm not sure if I understood your question. Do you want to do the following steps?