I have a paired-end SRA file containing 4,177,734 sequences in total.
I need to map the reads to the reference genome.
So what I did was the following:
1- I converted the SRA file to FASTQ (one file for each end)
2- I aligned the two ends separately using BWA:
bwa aln ref readset_R1.fq > readset_R1.sai
bwa aln ref readset_R2.fq > readset_R2.sai
3- Then I combined them:
bwa sampe ref readset_R1.sai readset_R2.sai readset_clean_R1.fq readset_clean_R2.fq > readset_ref_bwa.sam
4- Then I created the sorted BAM file using the following commands:
samtools view -S readset_ref_bwa.sam -b -o readset_ref_bwa.bam
samtools sort readset_ref_bwa.bam readset_ref_bwa.sorted.bam
samtools index readset_ref_bwa.sorted.bam
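As a sanity check on step 1, something like the sketch below could confirm that each FASTQ file really holds the expected number of reads before any alignment is done. It assumes plain, uncompressed FASTQ with exactly four lines per record; count_fastq_reads is a hypothetical helper name, not part of BWA or samtools.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumes every record occupies exactly four lines (header, sequence, "+", qualities).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Report the read count for each end, skipping files that are not present.
for fq in readset_R1.fq readset_R2.fq; do
    if [ -f "$fq" ]; then
        echo "$fq: $(count_fastq_reads "$fq") reads"
    fi
done
```

For a paired-end run the two files should report the same number of reads.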
When I view the sorted BAM file using tview, I see only around 48 reads, ordered from longest to shortest. Since I am new to this domain: is it plausible that only 48 reads remain out of the more than 4 million in the original SRA file?
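One way to check the number independently of what tview displays would be to count the alignment records directly. This is only a sketch: count_mapped_reads is a hypothetical helper name, and it assumes samtools is on the PATH and that the sorted BAM file has the name used above.

```shell
# Hypothetical helper: count alignment records that are flagged as mapped.
count_mapped_reads() {
    # -c prints only the number of matching records;
    # -F 4 excludes records with the "read unmapped" flag (0x4) set.
    samtools view -c -F 4 "$1"
}

bam=readset_ref_bwa.sorted.bam
# Only run when samtools is installed and the BAM file exists.
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    echo "records in BAM: $(samtools view -c "$bam")"
    echo "mapped records: $(count_mapped_reads "$bam")"
fi
```

Comparing the two numbers with the read count of the FASTQ files should show whether reads were lost during alignment or conversion.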
I also need to get a consensus sequence out of the sorted BAM file. I have read dozens of posts about this but couldn't figure it out.
Many thanks in advance