how to remove host reads from other microbe reads
1
0
Entering edit mode
2.4 years ago
Mahjabin • 0

Hi wonderful people,

I have been analyzing data from a paper, where I have used GEM3Mapper to get SAM files. Now, I have to remove host reads.

There are 2 advices I got from biostars :

$ samtools view -buSh -f 4 x.sam | samtools fastq - | cat -> x.fastq which gave me the following file

@ERR3348930.1_NS500640:176:HTN2KBGX2:1:11101:19155:1059
ATCCAGACTAAAAAGAGTGCCAGTCTTTTTACCTGTTCGGTATTCAAAGATATTTATTGACCGCCCGTCTTCGCT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEA
@ERR3348930.2_NS500640:176:HTN2KBGX2:1:11101:1717:1059
CATCCACGAAGGTATTGTCCTCCATGATGTGATCGAGATACACTCTGCGCACGTCCGCACGGTAATCCCGTTCCT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE
@ERR3348930.3_NS500640:176:HTN2KBGX2:1:11101:23954:1060
GAATTTTTCCTCGCATTCTTTATCGCCGGGATGCAAATCCGGGTGATATTTTTTGGCACAGGCTCTGTATGCCTT

the other one is:

$ samtools view -f 4 x.sam | cut -f1 > unmapped_x.fastq

this contains IDs of unmapped reads as below

ERR3348930.1_NS500640:176:HTN2KBGX2:1:11101:19155:1059
ERR3348930.2_NS500640:176:HTN2KBGX2:1:11101:1717:1059
ERR3348930.3_NS500640:176:HTN2KBGX2:1:11101:23954:1060
ERR3348930.4_NS500640:176:HTN2KBGX2:1:11101:25942:1062
ERR3348930.5_NS500640:176:HTN2KBGX2:1:11101:16438:1065

My question is - if host reads are removed, which file I should get for the next step (i.e. taxonomic profiling with Metaphlan2? Have I used the right command to remove the host read, if yes which one is correct from above?

I always find this extremely helpful. I would be grateful if anyone could kindly help. Thanks Mahjabin

from data removal host microbiome • 1.2k views
ADD COMMENT
1
Entering edit mode

If you are in doubt, use fastqscreen against human reference. First file x.fastq will have high % reads against human reference and second file unmapped_x.fastq should have lesser % of reads against human reference.

Function $ samtools view -f 4 x.sam | cut -f1 > unmapped_x.fastq prints only headers as you posted, and is not a fastq file.

In the fist command, you don't need to convert sam to bam for converting bam to fastq. samtools fastq can take sam as input and you don't need cat there. For a sam file from a SE alignment, will be like this: samtools view -h -f 4 test.sam | samtools fastq > out.fastq

ADD REPLY
0
Entering edit mode

Thanks a lot both. very useful advice indeed.

ADD REPLY
2
Entering edit mode
2.4 years ago
GenoMax 141k

I assume you are aligning original data to the host genome so any reads that are unmapped would be ones you are interested in. Then it would be the file generated in second example. You can use filterbyname.sh from BBMap suite to extract those reads from the full fastq file.

ADD COMMENT

Login before adding your answer.

Traffic: 2010 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6