I am stuck at the stage where after aligning my single-end illumina reads using bwa mem to a reference file which consists of a single chromosome (I am aligning each time to a single chromosome due to time limits on my server), I cannot find complete tutorials that explain how to extract contigs/scaffolds from bam files in order to annotate. I've read similar posts here but they all seem to implement a de novo strategy (like velvetg which produces a nice file of contigs) or use annotation programs meant for mammalian genomes or bacteria - the species I work with are fish.
My code thus far (after running FastQC and trimmomatic to clean my fastq reads):
bwa index ref_chr1.fa bwa mem -B 2 -M -t 40 ref_chr1.fa sp1_trimmed.fastq > sp1_trimmed.sam samtools view -Sb -@ 40 sp1_trimmed.sam > sp1_trimmed.bam samtools sort -@ 40 sp1_trimmed.bam > sp1_trimmed_sorted.bam samtools index sp1_trimmed_sorted.bam samtools mpileup -s -a -f ref_chr1.fa sp1_trimmed_sorted.bam -o sp1_trimmed_sorted.mpileup
HERE is where I am stuck. Do I even need the mpileup step? Can I just use my sorted and indexed bam files to somehow create scaffolds (no idea how to do that).
I have my species reference file in .gff format as well, as I just want to obtain all the CDS from my species so I can run various tests on each gene which I need separate so I can create MSAs for each (after extracting the gene for about 30 species).
Please any help is much appreciated!