fasta file extraction
9 months ago
amitpande74 ▴ 20

Hi, I have a FASTA file formatted like this, produced from paired-end data:

test_MAPQ.fasta->chr10:141146-141296
test_MAPQ.fasta:atgctcccattccaaatgagagtaattggctaaaacaaaggggctacaggtcccatacaagtccaaaacccaacagggcagtcattaaatcttTTCTAAttttaatttttattttatttgaagttctggggtacatgttcaggatgtata
test_MAPQ.fasta->chr10:142926-143076
--
test_MAPQ.fasta->chr10:146793-146943
test_MAPQ.fasta:gccAAATCATTACTTTTGAAGAAATAGTTAACAATGATTATTTCTTTTTGAATGACaataaattttattaataagttaaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgg
test_MAPQ.fasta->chr10:146870-147020
test_MAPQ.fasta:aaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgggaggccaaggtgttaatattgcttgagcaggggagtttgagaccagcctgggaaacatggtgaaacctcatatctac

I want the output to contain only the entries where both members of the pair are present:

test_MAPQ.fasta->chr10:146793-146943
test_MAPQ.fasta:gccAAATCATTACTTTTGAAGAAATAGTTAACAATGATTATTTCTTTTTGAATGACaataaattttattaataagttaaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgg
test_MAPQ.fasta->chr10:146870-147020
test_MAPQ.fasta:aaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgggaggccaaggtgttaatattgcttgagcaggggagtttgagaccagcctgggaaacatggtgaaacctcatatctac

and the rest should be ignored (like the upper block, which ends in dashes). Is there a tool that can filter the results this way? Kindly help.

fasta sed awk • 254 views

What is the basis for pairing in the OP's example? The number of lines before -- ? amitpande74.

$ awk -v RS="--" -v ORS="--" 'NF>3 {print}' test.fa

test_MAPQ.fasta->chr10:146793-146943
test_MAPQ.fasta:gccAAATCATTACTTTTGAAGAAATAGTTAACAATGATTATTTCTTTTTGAATGACaataaattttattaataagttaaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgg
test_MAPQ.fasta->chr10:146870-147020
test_MAPQ.fasta:aaacatatttatatgtaatgtaaattttttGTATcgggtgcagtggttcatgcccgttatcctagcactttgggaggccaaggtgttaatattgcttgagcaggggagtttgagaccagcctgggaaacatggtgaaacctcatatctac

You can also do awk 'BEGIN{RS="--"} NF>3 {print}' test.fa, but it would create empty lines before and after the sequences.
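
One caveat: a multi-character RS such as "--" is handled as a separator by gawk and mawk, but POSIX leaves that behaviour unspecified, so other awk implementations may split the records differently. A line-by-line version avoids relying on RS. The following is only a sketch under some assumptions: header lines contain "->", the separator is a line consisting of exactly "--", and a block counts as complete when every header line is directly followed by a sequence line. The function name filter_complete_pairs is made up for illustration.

```shell
# Hypothetical helper: print only the "--"-separated blocks in which
# every header line (containing "->") is followed by a sequence line.
filter_complete_pairs() {
    awk '
        function emit_block() {
            # Keep the block only if each header got its sequence line.
            if (headers > 0 && headers == seqs) printf "%s", block
            block = ""; headers = 0; seqs = 0; prev_header = 0
        }
        /^--$/     { emit_block(); next }   # separator closes the current block
        /^[ \t]*$/ { next }                 # skip blank lines
        /->/ {
            headers++; prev_header = 1      # header line
            block = block $0 "\n"
            next
        }
        {
            if (prev_header) seqs++         # sequence directly after a header
            prev_header = 0
            block = block $0 "\n"
        }
        END { emit_block() }                # last block has no trailing "--"
    ' "$@"
}
```

Usage would be something like filter_complete_pairs test.fa > paired.fa. The "--" separators themselves are not written to the output, and no blank lines are added around the kept blocks.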
