Help with extracting multiple sequences from a FASTA file using a list of IDs (four counting per line)
5.2 years ago
ahmed_bio82 ▴ 10

I would like to extract multiple sequences from a FASTA file using a list of IDs (four counting per line). I found several scripts that extract sequences from a FASTA file based on a list of IDs, but they all expect one ID per line. My list has four IDs per line. These are the first lines of my ID list:

OG1.5_9691: aco|TRINITY_DN39707_c3_g4_i1.p1 bio|GFMW01138197.1.p1 lym|FX192122.1.p1 physa|Contig31631.p1
OG1.5_9693: aco|TRINITY_DN34744_c0_g1_i2.p1 bio|GFMW01140870.1.p1 lym|FX194372.1.p1 physa|Contig299.p1
OG1.5_9694: aco|TRINITY_DN40605_c7_g1_i1.p1 bio|GFMW01145544.1.p1 lym|FX194851.1.p1 physa|Contig70050.p1
OG1.5_9695: aco|Contig7627.p1 bio|GFMW01145616.1.p1 lym|FX202590.1.p1 physa|Contig22503.p1

I would really appreciate any help you can provide with extracting my sequences from the FASTA file.
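One possible approach, shown as a minimal Python sketch rather than a tested solution. It assumes (not confirmed in the post) that the text before the colon, e.g. OG1.5_9691, is an orthogroup label to ignore, and that each FASTA record name, i.e. the first word after ">", is exactly one of the listed IDs, such as >aco|TRINITY_DN39707_c3_g4_i1.p1. The script name extract_ids.py and the file names are hypothetical.

```python
#!/usr/bin/env python3
"""Extract FASTA records whose IDs appear in a list with several IDs per line.

Assumptions (hypothetical, not from the original post):
  * each line of the ID list looks like "OG1.5_9691: id1 id2 id3 id4";
  * the record ID in the FASTA file is the first whitespace-delimited
    word after ">" and matches the list entries exactly.
"""
import sys


def read_ids(lines):
    """Collect every ID from lines like 'OG1.5_9691: aco|x bio|y ...'."""
    ids = set()
    for line in lines:
        # Drop the orthogroup label before the first colon, if there is one.
        _, sep, rest = line.partition(":")
        ids.update((rest if sep else line).split())
    return ids


def extract(fasta_lines, wanted):
    """Yield the lines of every FASTA record whose ID is in `wanted`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Record ID = first word of the header, without the ">".
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


def main(id_path, fasta_path, out_path):
    with open(id_path) as fh:
        wanted = read_ids(fh)
    with open(fasta_path) as fin, open(out_path, "w") as fout:
        fout.writelines(extract(fin, wanted))


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(*sys.argv[1:])
    else:
        print("usage: extract_ids.py ids.txt in.fasta out.fasta", file=sys.stderr)
```

Usage would be along the lines of python3 extract_ids.py ids.txt in.fasta out.fasta. Reading the FASTA file line by line and keeping the wanted IDs in a set means the whole sequence file never has to fit in memory, and each header lookup is constant time.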

RNA-Seq sequencing python perl Assembly

I have edited the question for you this time, but for future reference, this is a Question not a Tutorial.

It's not clear to me what you mean by extracting IDs by 'counting'.

Can you show your input data? It looks like you've only shown us one of the 2 files.

The faSomeRecords utility from Jim Kent should extract the sequences as long as the FASTA headers match exactly in both files (Linux version linked). Remember to run chmod a+x faSomeRecords after downloading it and before executing it. I assume "4 counting" means there are 4 identifiers separated by spaces in your headers?
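As far as I know, faSomeRecords expects its list file to contain one record name per line, so a list with four IDs per line would need flattening first. A minimal Python sketch of that step, assuming the "OG label: id1 id2 id3 id4" format shown in the question; flatten_ids.py and the file names are hypothetical.

```python
#!/usr/bin/env python3
"""Flatten a multi-ID-per-line list into one ID per line.

Hypothetical helper, assuming the list format shown in the question
("OG label: id1 id2 id3 id4"). The output is meant as the list file
for faSomeRecords, which reads one record name per line.
"""
import sys


def flatten_ids(lines):
    """Return the IDs in order of first appearance, without duplicates."""
    seen = set()
    ordered = []
    for line in lines:
        # Skip the orthogroup label before the first colon, if present.
        _, sep, rest = line.partition(":")
        for token in (rest if sep else line).split():
            if token not in seen:
                seen.add(token)
                ordered.append(token)
    return ordered


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            for ident in flatten_ids(fh):
                print(ident)
    else:
        print("usage: flatten_ids.py ids.txt > names.txt", file=sys.stderr)
```

With hypothetical file names, the two steps would be python3 flatten_ids.py ids.txt > names.txt followed by ./faSomeRecords in.fasta names.txt out.fasta. The names in names.txt still have to match the FASTA record names exactly for faSomeRecords to find them.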
