Question

Help with extracting multiple sequences from a fasta file with a list of Ids (four counting per line)

6.4 years ago

ahmed_bio82 ▴ 10

I would like to extract multiple sequences from a FASTA file using a list of counting IDs (four counting per line). I found several scripts that extract sequences from a FASTA file based on a list of counting IDs, but they expect one counting per line. My list has four counting per line. These are the first lines of my ID list:

OG1.5_9691: aco|TRINITY_DN39707_c3_g4_i1.p1 bio|GFMW01138197.1.p1 lym|FX192122.1.p1 physa|Contig31631.p1
OG1.5_9693: aco|TRINITY_DN34744_c0_g1_i2.p1 bio|GFMW01140870.1.p1 lym|FX194372.1.p1 physa|Contig299.p1
OG1.5_9694: aco|TRINITY_DN40605_c7_g1_i1.p1 bio|GFMW01145544.1.p1 lym|FX194851.1.p1 physa|Contig70050.p1
OG1.5_9695: aco|Contig7627.p1 bio|GFMW01145616.1.p1 lym|FX202590.1.p1 physa|Contig22503.p1

I would really appreciate any help you can provide to extract my sequences from the FASTA file.

RNA-Seq sequencing python perl Assembly • 1.3k views

ADD COMMENT • link updated 6.4 years ago by Pierre Lindenbaum 166k • written 6.4 years ago by ahmed_bio82 ▴ 10

I have edited the question for you this time, but for future reference, this is a Question not a Tutorial.

It's not clear to me what you mean by extracting IDs by 'counting'.

Can you show your input data? It looks like you've only shown us one of the 2 files.

ADD REPLY • link 6.4 years ago by Joe 22k

The faSomeRecords utility from Jim Kent should extract the sequences, as long as the FASTA headers match exactly in both files. Linux version linked. Remember to run chmod a+x faSomeRecords after you download it and before executing it. I assume "4 counting" means there are 4 identifiers separated by spaces in your headers?
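If the identifiers do need to be split out of the four-per-line list first, a minimal Python sketch could look like the one below. It is only an illustration under assumptions, since the FASTA file was not shown: it assumes each list line is an orthogroup label followed by ": " and space-separated IDs, and that the first word of each FASTA header (after ">") is exactly one of those IDs. The function names and the file names in the usage comment are hypothetical.

```python
import sys


def parse_id_list(lines):
    """Collect sequence IDs from lines like 'OG1.5_9691: id1 id2 id3 id4'."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop the orthogroup label before the first ': ' if there is one.
        _, sep, rest = line.partition(": ")
        ids.update((rest if sep else line).split())
    return ids


def extract_records(fasta_lines, wanted):
    """Yield the lines of every FASTA record whose header ID is in `wanted`.

    Assumption: the ID is the first whitespace-delimited word after '>'.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted
        if keep:
            yield line


if __name__ == "__main__":
    # Hypothetical usage:
    #   python extract_by_ids.py ids.txt proteins.fasta > subset.fasta
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as id_file:
            wanted_ids = parse_id_list(id_file)
        with open(sys.argv[2]) as fasta_file:
            sys.stdout.writelines(extract_records(fasta_file, wanted_ids))
    else:
        print("usage: extract_by_ids.py ID_LIST FASTA", file=sys.stderr)
```

Writing the parsed IDs out one per line would also give the kind of list that the one-ID-per-line scripts mentioned in the question expect. If the FASTA headers carry the IDs in a different form (for example without the "aco|" style prefix), the matching in extract_records would need adjusting.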

ADD REPLY • link 6.4 years ago by GenoMax 152k