How to remove sequences from a QUAL file using a list of IDs
8.7 years ago
shinken123 ▴ 150

Hi

I am working with old Sanger reads, and I have the fasta and qual files separately. I would like to remove some sequences from the qual file using a list of their names, but I am having problems with this. Do you have a solution for this?

Thank you very much

sequencing • 1.7k views
8.7 years ago

You can use filterbyname from BBMap like this:

filterbyname.sh in=file.fasta qfin=file.qual out=filtered.fasta qfout=filtered.qual names=names.txt retain=f

You can't process the qual files alone, though, only in conjunction with the sequence files. If you encounter any problems, please post the first two lines of your fasta, qual, and names files.


Thanks for your answer. I want to remove sequences that are in my qual files but not in my fasta files. I have the list of names of these sequences, and now I want to use it to remove those sequences from the qual files.

Here is an example of my sequences:

From my fasta file:

>zema223322s001f_A02_.b..abd
aagctcactatagggcgaatggagctcacgcggtggcggccgctctagaa
ctagtggatcccccgggctgcaggaattcgatcgacggacaagtgaacgc
agattcgttccccaacggctacgacgcaaagtaccaaacccaaaatgcag
gcgcgtcggtgcacgggcgcccacgtgggcctgcaccatccgcgccgcac
cgggtctggggcccgcctggactcaacggtcggatctctcgtccctatca
tgcaccgctccgctttcagacagcagttggttctcatcaaagctcgcaac
tacttcctcccctcccttccttcccttccggccgcctcttgtcctcctca
cctcccaaattgaaatcctcctcactccccgcctagggttccgcttcctc
ccccgtccgagcacctcggcggcagcggtggcagcctggaagcagtccca
tctcgtcgccctatacttccctgccgccacacgtctcagtttcttgttcg
gccgctctcctgccggtccgcccatcgcctgaggtaagcgccccgcctgc
agtcgtcctcactactactttttccatcaatctttgccctcgtagtgtgt
gggcctcgtcccctgctagggccctccggggtgaggtccggtagaggcca
cgcgcgaccgcgagcgagcggcctcctgcgaccagggcggagttggcgac
cggtcgatttctcaagctcagcgtagagcgaatgtccagccggtttcaga
tccaaccgcacggtgttttgctactgtaaatgagcgttgattgtttggtc
gatcgattggcctcgcatttcctcaaacctaatttccgggcagaaacatt
tctgaattcctagatagtcccatcttccaacacactgcggaacctagatg
gccaggcaagggtttgtgtggctcgggctggtgcatgtgcccactcctcg
cttctggatgtaagaatgaaggacattggcgctttcagtgcgcgtaagaa
tgtctcatcttgttatgtatgggtatatcacaacgtgggtgtggaatggt
tattatagataactaatgggacatggataggattttggttaggccaattc
catgcgttttaaaatggtgtgggggatattgaattgttnac

From my qual file:

>zema223322s001f_A02q_.b..abd
9 11 15 13 15 17 19 19 19 22 16 25 25 27 14 12 12 9
7 9 8 14 14 16 14 14 8 8 9 15 15 28 25 34 38 41 28
25 23 21 16 16 16 16 16 16 17 18 20 20 20 21 24 24
37 30 25 29 34 36 30 30 23 23 44 34 34 37 50 37 37
37 37 27 29 29 29 27 27 34 34 44 52 52 50 50 59 59
59 59 37 44 44 59 52 52 56 59 59 59 59 38 30 30 38
38 38 59 38 38 38 38 38 38 38 36 36 33 33 43 59 59
50 50 50 59 59 50 38 38 38 45 38 51 50 38 38 50 50
50 50 59 59 59 59 45 33 33 30 33 30 50 50 50 50 50
50 50 59 59 59 59 59 59 59 59 56 51 51 51 56 56 59
59 51 51 51 59 50 50 50 53 53 59 56 53 53 53 56 53
53 53 53 50 50 53 50 53 53 53 53 53 59 59 38 38 38
38 38 59 50 50 50 50 50 50 50 50 50 50 50 50 50 53
50 50 50 59 59 53 53 53 50 50 53 53 53 53 50 50 50
50 50 50 50 50 56 50 50 50 59 50 50 50 50 50 40 50
50 53 53 53 53 50 59 50 50 50 50 53 53 59 59 56 56
51 59 59 59 59 59 53 53 53 53 53 47 47 47 59 50 50
50 44 44 43 59 59 59 59 59 50 50 50 44 53 53 53 53
53 53 44 44 44 44 53 53 53 53 53 53 53 44 44 43 43
41 41 30 29 37 29 36 36 32 32 32 41 35 35 36 36 36
36 29 36 36 32 32 32 34 34 36 43 43 43 43 43 50 50
50 50 50 43 53 43 43 43 43 43 43 41 36 34 34 32 36
43 36 36 38 38 26 27 30 32 36 31 34 27 27 27 27 37
37 43 43 43 43 53 53 53 44 44 44 44 44 44 53 53 44
44 37 34 37 30 41 41 41 41 50 50 50 50 50 50 50 47
41 41 41 38 36 32 24 24 32 47 36 41 41 41 45 50 43
43 43 43 43 43 50 50 53 53 53 44 44 44 43 43 43 47
47 43 50 50 50 50 50 41 41 41 41 32 32 27 32 32 38
36 36 43 47 47 47 47 50 50 50 50 50 50 43 43 43 41
33 37 37 37 37 37 44 44 44 52 52 52 52 52 52 59 53
53 52 59 59 50 50 44 44 44 50 53 53 53 56 56 44 43
59 59 47 47 47 59 59 59 59 59 52 56 52 53 53 52 52
52 52 50 59 59 50 50 50 56 56 56 56 56 52 59 52 52
53 53 53 53 59 59 59 59 52 50 52 52 56 56 59 56 56
56 56 59 56 53 53 53 56 50 47 38 38 38 47 44 44 52
53 53 59 59 59 59 59 59 59 59 59 56 56 59 59 56 50
43 43 37 37 59 59 59 59 59 53 52 52 52 52 52 51 51
50 34 34 37 30 30 30 41 41 41 41 36 30 41 41 41 34
34 34 34 34 41 22 22 16 13 13 18 27 28 32 36 41 41
34 36 24 28 28 28 28 28 45 45 41 47 32 32 26 23 23
26 29 27 34 34 34 37 41 28 28 34 34 34 25 25 19 15
15 21 32 37 41 43 45 41 36 36 36 41 41 43 45 45 45
41 41 41 41 41 50 50 50 50 50 50 45 45 41 41 35 36
33 33 32 32 32 32 32 36 47 25 25 25 24 24 24 33 33
33 28 25 25 23 25 25 38 38 41 28 36 30 30 36 36 41
45 41 37 37 39 39 39 39 50 53 53 53 53 53 51 35 35
35 30 34 44 35 34 34 34 33 33 30 44 44 44 43 50 50
50 50 50 45 41 27 24 24 22 25 25 32 32 37 38 21 23
23 18 21 21 20 18 21 21 18 21 33 33 31 25 25 25 25
28 28 50 45 45 45 45 41 41 38 38 37 37 27 23 22 19
13 8 12 12 21 17 17 21 36 36 43 43 32 32 26 17 17 21
23 30 16 13 13 16 18 15 20 23 30 32 41 34 26 26 17
19 17 15 18 14 21 21 22 23 23 17 14 15 15 13 17 23
23 18 9 9 9 9 9 16 11 11 15 23 23 23 36 26 26 26 32
32 28 23 23 20 14 14 15 15 15 15 13 9 10 10 9 9 14
18 18 25 26 26 36 30 30 36 36 21 23 17 11 11 15 15
13 13 14 15 10 13 13 13 8 6 6 8 11 9 9 10 11 11 18
10 10 11 19 15 17 17 12 13 13 22 24 29 30 15 17 8 8
8 9 9 9 8 8 6 6 7 8 8 6 6 7 9 6 6 6 12 13 15 13 12
9 10 7 7 6 7 8 8 8 9 9 9 9 6 7 8 10 10 7 7 8 8 11 10
12 6 6 7 6 4 5 5 7 13 13 15 13 12 11 11 10 10 6 6 7
6 7 10 8 8 8 8 7 7 7 9 9 9 8 7 7 6 6 7 7 8 11 10 10
10 9 9 7 7 7 6 9 8 9 7 9 7 9 9 8 9 9 9 10 10 11 11
16 22 12 10 7 7 9 10 9 9 11 15 15 13 13 9 7 6 6 7 8
7 9 9 11 7 5 0 5 5

But of course, these are sequences that are in both files. Another problem I have is that some sequence names in the qual file differ slightly from those in the fasta file. However, I now have the list of names of the sequences that are in the qual file but not in the fasta file, and I would like to use it to remove that data from the qual file. Any idea?


How did you end up with sequences in qual files but not fasta files? I suspect the simplest and safest approach here would be to start with the raw files and redo whatever processing step led them to diverge. But barring that, I imagine that even as we speak someone is writing an awk script that will work on qual files alone...
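For what it's worth, here is a minimal Python sketch (not awk) of a filter that works on the qual file alone. It assumes the names file has one ID per line, and that each ID exactly matches the first whitespace-delimited token after `>` in the qual header. Since the qual headers may differ slightly from the fasta headers, the list has to contain the names as they appear in the qual file. The function names `read_names`, `filter_qual`, and `filter_qual_file` are made up for this illustration.

```python
def read_names(path):
    """Read one ID per line; a leading '>' and trailing description are ignored."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.strip().lstrip(">").split()
            if fields:  # skip blank lines
                names.add(fields[0])
    return names


def filter_qual(lines, drop):
    """Yield the lines of a QUAL file, skipping every record whose ID is in `drop`."""
    keep = True  # lines before the first header (if any) are passed through
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep = record_id not in drop
        if keep:
            yield line


def filter_qual_file(names_path, qual_in, qual_out):
    """Write a copy of `qual_in` to `qual_out` without the records listed in `names_path`."""
    drop = read_names(names_path)
    with open(qual_in) as src, open(qual_out, "w") as dst:
        dst.writelines(filter_qual(src, drop))
```

Something like `filter_qual_file("names.txt", "file.qual", "filtered.qual")` would then write the filtered copy and leave the original qual file untouched. It would be worth comparing the number of `>` lines before and after to confirm that the expected number of records was removed.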
