How to extract assembled contigs that produced a good mapping with reads?
9.2 years ago

Dear Biostar community,

I have mapped transcriptome sequencing reads onto assembled contigs as a quality control of the assembly. I have a BAM file of the aligned reads, created with samtools. How can I create a FASTA file from the contigs that passed the control?

Thank you in advance!

-Krista

next-gen Assembly RNA-Seq • 2.6k views
9.2 years ago

Edit (initial answer discussed reads only):

To extract the reads, you can filter with samtools view using options such as -q (minimum mapping quality) and -f (required flag bits), then pipe the result into samtools bam2fq to produce a FASTQ file.
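To make the effect of these options concrete, here is a minimal Python sketch of the record selection that samtools view performs with -q, -f and -F. It works on plain SAM text lines and is only an illustration of the filtering logic, not a replacement for samtools; the function name `passes_filter` and its parameter names are made up for this example.

```python
def passes_filter(sam_line, min_mapq=0, require_flags=0, exclude_flags=0):
    """Decide whether one SAM alignment line would be kept.

    min_mapq      ~ samtools view -q (skip alignments with MAPQ below this)
    require_flags ~ samtools view -f (keep only if ALL of these bits are set)
    exclude_flags ~ samtools view -F (drop if ANY of these bits is set)
    Hypothetical helper for illustration only.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # SAM column 2: bitwise FLAG
    mapq = int(fields[4])  # SAM column 5: mapping quality
    if mapq < min_mapq:
        return False
    if (flag & require_flags) != require_flags:
        return False
    if flag & exclude_flags:
        return False
    return True


# Three toy records: well mapped, unmapped (flag 4), reverse strand with low MAPQ.
good = "r1\t0\tcontig1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII"
unmapped = "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
low_mapq = "r3\t16\tcontig1\t5\t3\t4M\t*\t0\t0\tACGT\tIIII"

kept = [
    line.split("\t")[0]
    for line in (good, unmapped, low_mapq)
    if passes_filter(line, min_mapq=20, exclude_flags=4)
]
print(kept)
```

The threshold of 20 is an arbitrary choice for the example; how mapping quality is scaled depends on the aligner that produced the BAM file.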

To identify the well-covered contigs, you would need to compute the coverage of the contigs with, say, bedtools coverage or bedtools genomecov, then select from the output the contigs that appear to have good coverage. Once you have the names of the contigs that you want to keep, there are many ways to filter your contigs file; for example, use samtools faidx to extract the ones you want to keep.
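As a rough sketch of that selection step, the Python below assumes the default histogram output of bedtools genomecov (tab-separated columns: contig, depth, number of bases at that depth, contig length, fraction). It computes mean depth and breadth of coverage per contig, keeps the contigs above two cutoffs, and filters a FASTA file by name. The function names and the cutoffs (mean depth 5, breadth 0.9) are assumptions for illustration, not recommended values.

```python
from collections import defaultdict


def summarize_genomecov(hist_lines):
    """Return {contig: (mean_depth, breadth)} from genomecov histogram lines."""
    covered = defaultdict(int)      # bases with depth > 0
    total_depth = defaultdict(int)  # sum of depth over all bases
    size = {}
    for line in hist_lines:
        if not line.strip():
            continue
        name, depth, n_bases, length, _frac = line.rstrip("\n").split("\t")
        if name == "genome":  # skip the whole-assembly summary rows
            continue
        depth, n_bases = int(depth), int(n_bases)
        size[name] = int(length)
        total_depth[name] += depth * n_bases
        if depth > 0:
            covered[name] += n_bases
    return {c: (total_depth[c] / size[c], covered[c] / size[c]) for c in size}


def select_contigs(summary, min_mean_depth=5.0, min_breadth=0.9):
    """Names of contigs meeting both (arbitrary, example-only) cutoffs."""
    return [c for c, (d, b) in summary.items()
            if d >= min_mean_depth and b >= min_breadth]


def filter_fasta(fasta_lines, keep):
    """Keep only FASTA records whose name (first word after '>') is in keep."""
    keep = set(keep)
    out, writing = [], False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep
        if writing:
            out.append(line)
    return out


# Toy input: c1 is 90% covered at depth 10, c2 is 20% covered at depth 1.
hist = [
    "c1\t0\t10\t100\t0.1",
    "c1\t10\t90\t100\t0.9",
    "c2\t0\t80\t100\t0.8",
    "c2\t1\t20\t100\t0.2",
    "genome\t0\t90\t200\t0.45",
]
fasta = [">c1 len=100\n", "ACGT\n", ">c2\n", "TTTT\n"]

summary = summarize_genomecov(hist)
selected = select_contigs(summary)
filtered = filter_fasta(fasta, selected)
print("".join(filtered), end="")
```

With real data, the list of selected names could instead be passed to samtools faidx, as suggested above. Sensible cutoffs depend on sequencing depth and expression levels, so it helps to look at the distribution of these values across all contigs before choosing them.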


Thanks Istvan! Now I just need to define what counts as a good enough mapping quality for contigs. Any idea what values to set for -q and -f? And if I use bedtools coverage, how do I know which contigs "appear" to have good coverage?
