I am realigning my FASTQ reads against contigs generated by metaSPAdes to see which reads map to which contig.
I ran the following command to keep only the mapped reads:
samtools view -h -F 4 in.bam > mapped.bam
I am confused by the output and by what my next step should be. Here is a shortened example from the mapped.bam file (I cut off the end of each line to keep things neat, as I am only interested in understanding the first few columns):
A00977:183:HLLKYDSXY:3:1503:16658:5071 73 NODE_5156_length_78_cov_28 1 60 78M27S = 1 0
A00977:183:HLLKYDSXY:3:2178:28248:9142 369 NODE_5159_length_78_cov_5 31 0 48M98H NODE_691_length_1085_cov_115.969 17 0
So the first column is clearly the read name, followed by what I assume is the read length (?), then the reference sequence name. I am not sure what columns 4, 5, and 6 represent. I also don't understand why the 7th column shows a NODE contig name for some reads but only = for others. What exactly does that mean?
If my end goal is to determine which contig a read mapped to, should I be looking at the 3rd column or the 7th column? I notice that when I grep for a read name, the read sometimes appears multiple times, which leads me to believe a read can map to multiple contigs at once (fine), but which column gives me the actual contig name that it mapped to?
I just want to get this type of info to determine read depth for the contigs.
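To make the goal concrete, here is a minimal sketch of the kind of per-contig tally I have in mind. It assumes that column 3 is the contig a read mapped to, which is exactly the point I would like confirmed. The function name count_reads_per_contig is my own hypothetical helper, not part of samtools.

```shell
# Hypothetical helper: tally alignment records per contig from SAM text on stdin.
# Assumption: column 3 holds the contig name the read mapped to.
count_reads_per_contig() {
    # Skip header lines (they start with '@'), count occurrences of column 3,
    # then print "contig<TAB>count" sorted by contig name.
    awk '!/^@/ { n[$3]++ } END { for (c in n) printf "%s\t%d\n", c, n[c] }' |
        LC_ALL=C sort
}

# Intended usage (not run here):
#   samtools view -F 4 in.bam | count_reads_per_contig
```

This counts alignment records rather than unique reads, so a read that appears more than once is counted each time, and a raw count is only a rough stand-in for read depth.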