Hi wonderful people,
I have been analyzing data from a paper, where I have used GEM3Mapper to get SAM files. Now, I have to remove host reads.
There are 2 advices I got from biostars :
$ samtools view -buSh -f 4 x.sam | samtools fastq - | cat -> x.fastq which gave me the following file
@ERR3348930.1_NS500640:176:HTN2KBGX2:1:11101:19155:1059
ATCCAGACTAAAAAGAGTGCCAGTCTTTTTACCTGTTCGGTATTCAAAGATATTTATTGACCGCCCGTCTTCGCT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEA
@ERR3348930.2_NS500640:176:HTN2KBGX2:1:11101:1717:1059
CATCCACGAAGGTATTGTCCTCCATGATGTGATCGAGATACACTCTGCGCACGTCCGCACGGTAATCCCGTTCCT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE
@ERR3348930.3_NS500640:176:HTN2KBGX2:1:11101:23954:1060
GAATTTTTCCTCGCATTCTTTATCGCCGGGATGCAAATCCGGGTGATATTTTTTGGCACAGGCTCTGTATGCCTT
the other one is:
$ samtools view -f 4 x.sam | cut -f1 > unmapped_x.fastq
this contains IDs of unmapped reads as below
ERR3348930.1_NS500640:176:HTN2KBGX2:1:11101:19155:1059
ERR3348930.2_NS500640:176:HTN2KBGX2:1:11101:1717:1059
ERR3348930.3_NS500640:176:HTN2KBGX2:1:11101:23954:1060
ERR3348930.4_NS500640:176:HTN2KBGX2:1:11101:25942:1062
ERR3348930.5_NS500640:176:HTN2KBGX2:1:11101:16438:1065
My question is - if host reads are removed, which file I should get for the next step (i.e. taxonomic profiling with Metaphlan2? Have I used the right command to remove the host read, if yes which one is correct from above?
I always find this extremely helpful. I would be grateful if anyone could kindly help. Thanks Mahjabin
If you are in doubt, use
fastqscreen
against human reference. First filex.fastq
will have high % reads against human reference and second fileunmapped_x.fastq
should have lesser % of reads against human reference.Function
$ samtools view -f 4 x.sam | cut -f1 > unmapped_x.fastq
prints only headers as you posted, and is not a fastq file.In the fist command, you don't need to convert
sam
tobam
for converting bam to fastq. samtoolsfastq
can takesam
as input and you don't needcat
there. For a sam file from a SE alignment, will be like this:samtools view -h -f 4 test.sam | samtools fastq > out.fastq
Thanks a lot both. very useful advice indeed.