5.9 years ago by
University Park, USA
The answer here depends a bit on what type of data you have. If you have transcript sequences, you can create a BLAST database out of them (with makeblastdb), then use the blastdbcmd command to extract your sequences labeled with their lengths (the -outfmt option selects which fields are printed):
$ blastdbcmd -db ~/refs/16S/16SMicrobial -entry "all" -outfmt "%g %l" | head
Here the first column is the GI number (%g; use %a instead to print the accession) and the second is the length of the sequence (%l). You could then match on the identifier and sort by length.
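Here is a minimal sketch of that route as shell functions. It assumes BLAST+ (makeblastdb and blastdbcmd) is on your PATH and the transcripts are in one FASTA file. The function names and the file names in the usage example are hypothetical. The sketch uses %a so the accession is printed, -parse_seqids so the FASTA identifiers are stored as sequence IDs that can be looked up, and -entry_batch to restrict the output to a list of accessions. sort -k2,2nr then orders the lines by length, longest first.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; requires BLAST+.

# Build a nucleotide BLAST database from a transcript FASTA.
# -parse_seqids stores the FASTA identifiers so entries can be looked up by ID.
build_transcript_db() {
  local fasta="$1" db="$2"
  makeblastdb -in "$fasta" -dbtype nucl -parse_seqids -out "$db"
}

# Print "accession length" for every entry, longest first.
# %a = accession, %l = sequence length (%g would print the GI number instead).
all_lengths_sorted() {
  local db="$1"
  blastdbcmd -db "$db" -entry all -outfmt "%a %l" | sort -k2,2nr
}

# Same output, but only for the accessions listed (one per line) in a file.
selected_lengths_sorted() {
  local db="$1" ids="$2"
  blastdbcmd -db "$db" -entry_batch "$ids" -outfmt "%a %l" | sort -k2,2nr
}
```

For example, running build_transcript_db transcripts.fa transcripts_db and then all_lengths_sorted transcripts_db | head would show the ten longest entries.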
If you only have genomic coordinates, one way to do this is to first extract your transcript sequences: use bedtools getfasta (with -split) if you have 12-column BED format, or the gffread command distributed with Cufflinks if you have GFF files.
Then build the BLAST database and query it as above.
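Here is a minimal sketch of the extraction step. It assumes bedtools and gffread are installed and that the genome FASTA matches the coordinates in your annotation. The function and file names are hypothetical. In bedtools getfasta, -split joins the blocks (exons) of each BED12 line into one spliced sequence, -s reverse-complements features on the minus strand, and -name takes the FASTA header from the BED name column. The exact header text produced by -name differs between bedtools versions, so check it before building the database. gffread -w writes the spliced exon sequence of each transcript.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; requires bedtools / gffread.

# From a 12-column BED: -split concatenates the exon blocks of each line,
# -s reverse-complements minus-strand features, -name uses the BED name column.
transcripts_from_bed12() {
  local genome="$1" bed="$2" out="$3"
  bedtools getfasta -fi "$genome" -bed "$bed" -split -s -name -fo "$out"
}

# From GFF/GTF: -w writes the spliced exon sequence of each transcript,
# -g gives the genome FASTA the coordinates refer to.
transcripts_from_gff() {
  local genome="$1" gff="$2" out="$3"
  gffread -w "$out" -g "$genome" "$gff"
}
```

The resulting FASTA file is what goes into makeblastdb and blastdbcmd as described above.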