Hello everybody,
I have two fastq files (reverse and forward) with an average of 150pb per read from a bacterial comunity (Illumina HiSeq). With those files I've assembled with metaVelvet. In order to get more restrictive information about the coverage of some extracted genomes from a metagenome, I want to map the reads to the assembly but only those which are 95% or higher similar, i.g keep only the reads that have: 0.95 >= match / (mismatch + match). I've read this: https://zombieprocess.wordpress.com/2013/05/21/calculating-percent-identity-from-sam-files/ but since I have a lot of reads ( more than 30 millions) I was hopping that there was a more automatic way to do this.
I appreciate if you could help me.
E.O.
BBMap can do this with the flag "idfilter=0.95". It can also tag the output reads with their identity using the flag "idtag" so you can manually filter them later.