Question: How does one remove transposable elements/rRNA from long reads?
Posted 5 weeks ago by zack.saud10:

Hi all,

I have assembled a 38 Mb fungal genome using Flye. I filtered my reads to around 100x coverage, but looking at the assembly graph in Bandage, there are 3 nodes (700, 1000 and 2000 bases long) with a coverage of over 2000x. Running these through BLAST, I find that one is a ribosomal gene and the other two are known transposable elements.

As I fear these TEs may be causing a misassembly (judging by the tangles in the assembly graph), I want to remove the reads that are about the size of the high-coverage nodes, i.e. reads up to 700, 1000 or 2000 bases long, but retain the longer reads, which should span the chromosomal regions these TE copies sit in. I used this command:

    minimap2 -ax map-ont ContaminatingNode1.fasta Reads.fasta | samtools fasta -n -f 4 - > NoContaminationreads.fasta

but I seem to have also removed the long reads: when I align NoContaminationreads.fasta to the assembled genome, no reads span the contig where these high-coverage TEs should be.

Is there any way to remove only the reads up to a certain size, but retain the longer reads that are probably chromosomal? In other words, I want to reassemble and see the regions these sequences map to at around 100x coverage and no more.
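The command above keeps only reads carrying the unmapped flag (`-f 4`), so any read with an alignment to the node is discarded, including long chromosomal reads that merely contain one TE copy. One way to make the filter length-aware is to compute, per read, what fraction of its bases the TE/rRNA alignments cover and drop only reads that are mostly TE. Below is a minimal Python sketch of that idea, not an established tool. It assumes PAF output from minimap2 (run with `-x map-ont` but without `-a`); the 0.5 threshold, the function names and the script name `filter_te_reads.py` are hypothetical.

```python
"""Length-aware removal of reads dominated by TE/rRNA hits (a sketch).

Assumes hits.paf was produced with something like
    minimap2 -x map-ont ContaminatingNode1.fasta Reads.fasta > hits.paf
so that, per the PAF format, columns 1-4 are read name, read length,
alignment start and alignment end on the read. The 0.5 cut-off and the
script name filter_te_reads.py are hypothetical.
"""
import sys
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def covered_bases(intervals: List[Tuple[int, int]]) -> int:
    """Length of the union of half-open [start, end) intervals."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping hit: extend current run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def reads_to_drop(paf_lines: Iterable[str], min_fraction: float = 0.5) -> Set[str]:
    """Names of reads whose TE-aligned bases make up >= min_fraction of the read."""
    hits: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    lengths: Dict[str, int] = {}
    for line in paf_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        name, qlen, qstart, qend = cols[0], int(cols[1]), int(cols[2]), int(cols[3])
        lengths[name] = qlen
        hits[name].append((qstart, qend))
    return {
        name for name, ivs in hits.items()
        if covered_bases(ivs) / lengths[name] >= min_fraction
    }


def filter_fasta(fasta_lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Stream FASTA lines, skipping records whose ID (first header word) is in drop."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            keep = line[1:].split()[0] not in drop
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_te_reads.py hits.paf Reads.fasta > NoContaminationreads.fasta
    with open(sys.argv[1]) as paf:
        drop = reads_to_drop(paf)
    with open(sys.argv[2]) as fasta:
        sys.stdout.writelines(filter_fasta(fasta, drop))
```

With the 0.5 default, a 2.1 kb read whose middle 2 kb aligns to the 2 kb TE node is dropped (about 95% covered), while a 30 kb read carrying the same TE copy is kept (about 7% covered). The threshold would likely need tuning per node and per read-length distribution.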

Many thanks in advance



Tags: sequencing, assembly

Sounds like you got three good contig sequences that represent things you don't want in your data. Can you try removing the reads (or parts of reads) that align to those sequences? You may have tried that already, but that is not clear from your text.
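The suggestion of removing only the aligned parts of reads can be sketched as follows: cut each read at its TE/rRNA-aligned intervals and keep the flanking pieces that are still long enough to be useful. This is a hypothetical Python sketch assuming minimap2 PAF output (read name and query start/end in columns 1, 3 and 4); `min_piece`, the `_partN` suffix and the script name `trim_te_reads.py` are illustrative choices. Note the trade-off: this removes TE sequence from every read, so contigs are likely to break at each TE copy rather than resolve it.

```python
"""Sketch: cut TE/rRNA-aligned segments out of reads, keep long flanks.

Assumes PAF from e.g. `minimap2 -x map-ont ContaminatingNode1.fasta Reads.fasta`,
where columns 1, 3 and 4 are read name, alignment start and alignment end
on the read. min_piece and the "_partN" suffix are hypothetical choices.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Interval = Tuple[int, int]


def te_intervals(paf_lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Merge overlapping TE-aligned intervals per read (half-open, sorted)."""
    raw: Dict[str, List[Interval]] = defaultdict(list)
    for line in paf_lines:
        if line.strip():
            cols = line.rstrip("\n").split("\t")
            raw[cols[0]].append((int(cols[2]), int(cols[3])))
    merged: Dict[str, List[Interval]] = {}
    for name, ivs in raw.items():
        out: List[Interval] = []
        for start, end in sorted(ivs):
            if out and start <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[name] = out
    return merged


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; id is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def trim_te(fasta_lines: Iterable[str], paf_lines: Iterable[str],
            min_piece: int = 1000) -> Iterator[Tuple[str, str]]:
    """Yield reads with TE-aligned segments removed; short leftovers dropped."""
    masks = te_intervals(paf_lines)
    for name, seq in read_fasta(fasta_lines):
        if name not in masks:
            yield name, seq  # no TE/rRNA hit: keep the read untouched
            continue
        pos = 0
        # A zero-length sentinel at the end flushes the final flank.
        for i, (start, end) in enumerate(masks[name] + [(len(seq), len(seq))]):
            piece = seq[pos:start]
            if len(piece) >= min_piece:
                yield f"{name}_part{i}", piece
            pos = max(pos, end)


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:  # hypothetical: trim_te_reads.py hits.paf Reads.fasta
        with open(sys.argv[1]) as paf, open(sys.argv[2]) as fasta:
            for name, seq in trim_te(fasta, paf):
                print(f">{name}\n{seq}")
```

Reads without any hit pass through unchanged, so only reads that actually overlap a TE or rRNA copy are split.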

Reply written 5 weeks ago (modified 5 weeks ago) by genomax80k

