Question: Extract sequences that do not have BLAST hits
karthic wrote (10 months ago):

Hi,

I have a FASTA file with around 1 million sequences. I ran a BLAST search and got hits for around 7500 of them. Now I want to extract the sequences that do not have a hit and use them for further analysis.

So far I am using a custom sed script, which is very slow. Judging from its speed, it might take several days to complete. Please help me with a fast and robust solution.

The script I am currently using is below.

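    # Start from a full copy of the input FASTA file.
    cp CG061MR_S20_R1_001_AR_filter_un_ren.fa CG061MR_S20_R1_001_AR_filter_unblasted.fa

    # For every ID with a BLAST hit, delete the matching header line and the line after it.
    # This rewrites the whole file once per ID, which is why it is so slow.
    for j in $(cat CG061MR_blastids.txt)
    do
        sed -i -e "/$j/{N;d}" CG061MR_S20_R1_001_AR_filter_unblasted.fa
    done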

Thank you,
KK

karthic wrote (10 months ago):

Sorry for bothering you, everyone.

I found the solution with Jim Kent's faSomeRecords.

Thank you,

KK
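
For anyone without faSomeRecords at hand, here is a minimal sketch of the same idea in plain awk. It is not the faSomeRecords implementation, just a swapped-in one-pass filter. The function name `exclude_fasta_records` is made up for illustration. It assumes the ID file holds one ID per line, and that each ID exactly equals the first word after `>` in the FASTA header. The sed loop above matches the ID anywhere in the line, so the results can differ if your IDs are only partial header matches.

```shell
# Hypothetical helper: print the FASTA records whose ID is NOT in the ID list.
# Usage: exclude_fasta_records ids.txt input.fa > output.fa
exclude_fasta_records() {
    awk '
        # First file (ID list): remember every ID that had a BLAST hit.
        FILENAME == ARGV[1] { if (NF) hit[$1] = 1; next }
        # FASTA header: the ID is the first word after ">"; keep it unless it was hit.
        /^>/ { id = substr($1, 2); keep = !(id in hit) }
        # Print header and sequence lines of records that are kept.
        keep
    ' "$1" "$2"
}

# Example call with the file names from the question:
# exclude_fasta_records CG061MR_blastids.txt CG061MR_S20_R1_001_AR_filter_un_ren.fa \
#     > CG061MR_S20_R1_001_AR_filter_unblasted.fa
```

This reads the ID list into memory once and then streams through the FASTA file a single time, instead of rewriting the whole file once per ID. It also keeps working when a sequence is wrapped over several lines, which the `N;d` approach does not handle.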


faSomeRecords or GetFaRecords, karthic?

Reply written 10 months ago by cpad0112

Sorry, it's faSomeRecords. I have corrected it.

Reply written 10 months ago by karthic