I have mapped transriptome sequencing reads on assembled contigs for a quality control of assembly. I have a bam-file with aligned reads created wtih samtools How can I create a fasta file from contigs that passed the control?
to extract the reads you can filter with samtools view using flags such as -q and -f then pipe it into samtools bam2fqto produce a fastq file from it.
To identify the well covered contigs you would need to compute the coverage of the contigs with say bedtools coverage or bedtools genomecov then select from the output the contigs that appear to have good coverage. Once you have the names of the contigs that you want to keep there are many ways to filter your contigs file, say use samtools faidx to extract the ones you want to keep
Thanks Istvan! Now I just should define what is the good enough mapping quality for contigs, thus any idea to set values for -g and -f? If I use bedtools coverage, how do I know which contigs "appear" to have good coverage?
Thanks Istvan! Now I just should define what is the good enough mapping quality for contigs, thus any idea to set values for
-g
and-f
? If I use bedtools coverage, how do I know which contigs "appear" to have good coverage?