Hello,
I'm trying to align small groups of RNA-Seq reads within a larger FASTQ file. I need a separate alignment for each group. I can mark the read groups with an ID number or something similar, but it's not feasible to split the groups into their own files, since there are around 10^6 of them. I only have one reference sequence to align to, and there isn't any alternative splicing.
Is there something that I can use for this? I've looked at the manuals for EMBOSS and Clustal O, but they don't seem to have anything appropriate. BWA has the option "-R" for setting the read group ID, but I think that only affects the output.
Thanks in advance for any help.
You could use filterbyname.sh from BBMap (search for that name on this page to see usage) or the faSomeRecords utility from Jim Kent. Both will allow you to pull out subsets of records from your large file on demand.

I will try that, thank you so much!
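For anyone who wants to see the "pull out a subset on demand" idea without either tool, here is a minimal Python sketch. It is not how filterbyname.sh or faSomeRecords work internally. It assumes a plain four-line-per-record FASTQ and a hypothetical header convention where each read carries a `group=<id>` token (for example `@read1 group=17`); the function names are made up for illustration.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # clean end of file
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def group_id(header):
    """Return the group ID from a header, or None if there is no group token.

    Assumption: the ID is stored as a whitespace-separated 'group=<id>' token.
    """
    for token in header.split():
        if token.startswith("group="):
            return token[len("group="):]
    return None


def extract_group(handle, wanted):
    """Collect every record whose header carries the wanted group ID."""
    return [rec for rec in read_fastq(handle) if group_id(rec[0]) == wanted]


if __name__ == "__main__":
    # Tiny in-memory example standing in for the large FASTQ file.
    sample = (
        "@r1 group=1\nACGT\n+\nIIII\n"
        "@r2 group=2\nGGCC\n+\nIIII\n"
        "@r3 group=1\nTTAA\n+\nIIII\n"
    )
    for header, seq, _, qual in extract_group(io.StringIO(sample), "1"):
        print(header, seq, qual)
```

Each call scans the whole file once, so with around 10^6 groups this is only practical for spot checks. For the full job, sorting the reads by group ID first, or building an index of record offsets per group, would avoid rescanning the file for every group.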