5.2 years ago
ahmed_bio82
I would like to extract multiple sequences from a FASTA file using a list of counting IDs (four per line). I found several scripts that extract sequences from a FASTA file based on a list of IDs, but they all expect one ID per line, and my list has four IDs per line. These are the first few lines of my ID list:
OG1.5_9691: aco|TRINITY_DN39707_c3_g4_i1.p1 bio|GFMW01138197.1.p1 lym|FX192122.1.p1 physa|Contig31631.p1
OG1.5_9693: aco|TRINITY_DN34744_c0_g1_i2.p1 bio|GFMW01140870.1.p1 lym|FX194372.1.p1 physa|Contig299.p1
OG1.5_9694: aco|TRINITY_DN40605_c7_g1_i1.p1 bio|GFMW01145544.1.p1 lym|FX194851.1.p1 physa|Contig70050.p1
OG1.5_9695: aco|Contig7627.p1 bio|GFMW01145616.1.p1 lym|FX202590.1.p1 physa|Contig22503.p1
I would really appreciate any help you can provide in extracting my sequences from the FASTA file.
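A minimal pure-Python sketch of one way to do this, assuming that each ID in the list matches the first word of a FASTA header exactly (e.g. a header line such as ">aco|TRINITY_DN39707_c3_g4_i1.p1 ...") and that the text before the first ":" on each list line (such as "OG1.5_9691") is only a group label, not an ID. The function names, script name and file names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, List, Optional, Set, Tuple


def read_id_list(lines: Iterable[str]) -> List[str]:
    """Collect IDs from lines like 'OG1.5_9691: aco|X bio|Y lym|Z physa|W'."""
    ids: List[str] = []
    for line in lines:
        _label, sep, rest = line.partition(":")
        # Assumption: text before the first ':' is a group label, not an ID.
        # Lines without a ':' are treated as whitespace-separated IDs only.
        ids.extend((rest if sep else line).split())
    return ids


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence joined into one string) pairs."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(fasta_lines: Iterable[str], wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose first header word is exactly one of the wanted IDs."""
    for header, seq in read_fasta(fasta_lines):
        words = header.split()
        if words and words[0] in wanted:
            yield header, seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1]: ID list (four IDs per line); argv[2]: FASTA file.
    with open(sys.argv[1]) as ids_fh:
        wanted_ids = set(read_id_list(ids_fh))
    with open(sys.argv[2]) as fasta_fh:
        for header, seq in select_records(fasta_fh, wanted_ids):
            # Sequences are written on a single line (not re-wrapped).
            sys.stdout.write(f">{header}\n{seq}\n")
```

Usage would be something like: python extract_ids.py id_list.txt sequences.fasta > selected.fasta. Using a set for the wanted IDs keeps each header lookup constant-time, so the FASTA file is read only once regardless of how long the list is.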
I have edited the question for you this time, but for future reference, this is a Question, not a Tutorial.
It's not clear to me what you mean by extracting IDs by 'counting'.
Can you show your input data? It looks like you've only shown us one of the 2 files.
The faSomeRecords utility from Jim Kent should extract the sequences, as long as the FASTA headers match exactly between the two files. Linux version linked. Remember to run
chmod a+x faSomeRecords
after you download it and before executing it. I assume "4 counting" means there are 4 identifiers separated by spaces in your headers?
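faSomeRecords expects a list file with one name per line, so the four-IDs-per-line list would first need flattening. A minimal Python sketch, assuming the "OG...:" prefix on each line is a group label to drop and the IDs are whitespace-separated; the function and script names are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def flatten_ids(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'OG...: id1 id2 id3 id4' lines into a stream of single IDs."""
    for line in lines:
        _label, sep, rest = line.partition(":")
        # Drop the leading group label when a ':' is present.
        yield from (rest if sep else line).split()


if __name__ == "__main__" and len(sys.argv) == 2:
    # argv[1]: the four-per-line ID list; output: one ID per line on stdout.
    with open(sys.argv[1]) as fh:
        for ident in flatten_ids(fh):
            print(ident)
```

For example: python flatten_ids.py id_list.txt > ids.txt, then faSomeRecords sequences.fasta ids.txt selected.fasta (input FASTA, list file, output FASTA, in that order).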