7.3 years ago
Assa Yeroslaviz
Hi,
I am looking for a way to download the complete nt database in 2bit format. I need it to run a BLAT search against the unmapped reads from an RNA-Seq experiment. I have a lot of unmapped reads, and one way to try to identify the source of this large number of reads is to run a BLAT search against the complete nt. But BLAT takes only the 2bit format as input. I know I can convert the FASTA file into 2bit, but I was wondering if there is a better way than splitting the FASTA file into subsets (the faToTwoBit script from UCSC can't handle files bigger than 4GB).
Thanks in advance,
Assa
You'll need to split the fasta file regardless of what you do, since the 2bit format itself can't handle more than 4GB (i.e., it doesn't matter what program you use). Maybe just use kraken or something like that instead.
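For the splitting step, here is a minimal Python sketch that cuts a FASTA file into parts at record boundaries, so no sequence is broken across two files. The function names, the output naming scheme, and the `max_bytes` value are illustrative assumptions, not part of any UCSC tool. Note that `max_bytes` is measured on the FASTA text, not on the resulting 2bit file.

```python
def iter_records(handle):
    """Yield one FASTA record (header plus sequence lines) at a time, as bytes."""
    record = []
    for line in handle:
        if line.startswith(b">") and record:
            yield b"".join(record)
            record = []
        record.append(line)
    if record:
        yield b"".join(record)


def split_fasta(path, out_prefix, max_bytes):
    """Write records from `path` into numbered parts of at most `max_bytes` each.

    A single record larger than `max_bytes` still gets a part of its own.
    Returns the list of part file names (hypothetical naming: prefix.NNN.fa).
    """
    parts = []
    out = None
    written = 0
    with open(path, "rb") as handle:
        for record in iter_records(handle):
            if out is None or written + len(record) > max_bytes:
                if out is not None:
                    out.close()
                part = f"{out_prefix}.{len(parts) + 1:03d}.fa"
                out = open(part, "wb")
                parts.append(part)
                written = 0
            out.write(record)
            written += len(record)
    if out is not None:
        out.close()
    return parts
```

Something like `split_fasta("nt.fa", "nt_part", 3000 * 1024 * 1024)` would then give parts of roughly 3000MB each, which can be converted to 2bit one at a time.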
Oh! I didn't know that. Thanks.
If I want to run BLAT against the complete nt (let us just assume I want to do it): I have downloaded the complete nt and decompressed it (134GB), and I have now split it into 46 parts (3000MB each). I am trying to run BLAT via gfServer/gfClient.
I was thinking of doing it like this:
But do I need to run the gfServer command for each of the 2bit files separately?
Is that the aim of this search? How many reads are in the "unmapped" pool?
If you are doing this on the command line why not do the blat directly without the client/server layer?
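A rough sketch of what running standalone blat over every database chunk could look like, driven from Python. It assumes only the basic `blat database query output.psl` invocation; the file names and the output naming scheme are hypothetical, and no extra blat options are set.

```python
import subprocess
from pathlib import Path


def blat_commands(databases, queries, out_dir):
    """Build one `blat database query output.psl` command per (chunk, query) pair."""
    out_dir = Path(out_dir)
    commands = []
    for db in databases:
        for query in queries:
            # Hypothetical naming: <query stem>.<database stem>.psl
            psl = out_dir / f"{Path(query).stem}.{Path(db).stem}.psl"
            commands.append(["blat", str(db), str(query), str(psl)])
    return commands


def run_all(commands):
    """Run the commands one after another; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

With 46 chunks and more than 20 query files this is on the order of a thousand blat runs, so it is worth printing the command list and timing a single run before starting all of them.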
I have over 20 files of unmapped samples (each from mate1 and mate2 of paired-end RNA-seq data). Each FASTA file has several million sequences. I thought it would be more efficient to run it as a server. I'll try to do it separately, as that seems to be the last possible option. Thanks.
Generally taking a random sample of 20-30 reads and blasting should be sufficient to identify major genomes present (unless you are expecting metagenomic contamination or need to identify every read that is unmapped).
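One way to draw such a random sample without loading a multi-million-read FASTA file into memory is reservoir sampling. The sketch below is only an illustration; the function names and the `seed` parameter are assumptions made for this example.

```python
import random


def iter_records(handle):
    """Yield one FASTA record (header plus sequence lines) at a time, as text."""
    record = []
    for line in handle:
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def sample_fasta(handle, k, seed=None):
    """Return up to k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(iter_records(handle)):
        if i < k:
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample
```

For example, `sample_fasta(open("unmapped_mate1.fa"), 30)` (hypothetical file name) returns 30 records that can be pasted into a BLAST search.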