Hi wonderful people,
I have been analyzing data from a paper, where I have used GEM3Mapper to get SAM files. Now, I have to remove host reads.
There are 2 advices I got from biostars :
$ samtools view -buSh -f 4 x.sam | samtools fastq - | cat -> x.fastq which gave me the following file
@ERR3348930.1_NS500640:176:HTN2KBGX2:1:11101:19155:1059 ATCCAGACTAAAAAGAGTGCCAGTCTTTTTACCTGTTCGGTATTCAAAGATATTTATTGACCGCCCGTCTTCGCT + AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEA @ERR3348930.2_NS500640:176:HTN2KBGX2:1:11101:1717:1059 CATCCACGAAGGTATTGTCCTCCATGATGTGATCGAGATACACTCTGCGCACGTCCGCACGGTAATCCCGTTCCT + AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE @ERR3348930.3_NS500640:176:HTN2KBGX2:1:11101:23954:1060 GAATTTTTCCTCGCATTCTTTATCGCCGGGATGCAAATCCGGGTGATATTTTTTGGCACAGGCTCTGTATGCCTT
the other one is:
$ samtools view -f 4 x.sam | cut -f1 > unmapped_x.fastq
this contains IDs of unmapped reads as below
ERR3348930.1_NS500640:176:HTN2KBGX2:1:11101:19155:1059 ERR3348930.2_NS500640:176:HTN2KBGX2:1:11101:1717:1059 ERR3348930.3_NS500640:176:HTN2KBGX2:1:11101:23954:1060 ERR3348930.4_NS500640:176:HTN2KBGX2:1:11101:25942:1062 ERR3348930.5_NS500640:176:HTN2KBGX2:1:11101:16438:1065
My question is - if host reads are removed, which file I should get for the next step (i.e. taxonomic profiling with Metaphlan2? Have I used the right command to remove the host read, if yes which one is correct from above?
I always find this extremely helpful. I would be grateful if anyone could kindly help. Thanks Mahjabin