more than one sample in the same fasta file
14 months ago

Hello

A colleague asked me to align some miRNA-seq data for him. He handed me only two files, a forward (F) and a reverse (R) FASTA file, and said they contain 6 different samples. So far, I have only dealt with RNA-seq data where each sample was in its own file. Is there anything in the structure of the files I can use to split them into separate samples?

mirnaseq usegalaxy • 759 views

What is the content of those files? Can you see the names of the samples in the headers?


Well, here is an example of a header:

@A00126:312:H35CVDSX5:4:1101:1398:1000 1:N:0:TAAGGC
<<sequence>>
+

However, my colleague told me he annotated the samples as A6-A11, so nothing in this header makes sense to me.


Your example above is NOT FASTA. It is still FASTQ (assuming you omitted the 4th line, the quality string, when you pasted the example).

TAAGGC is the index that the sample was labeled with, so this read came from that sample. If you don't know which sample an index corresponds to, you can split the reads based on these indexes and simply call the results sample 1 .. sample N. If you find more than 6 indexes, there could be an issue.

Your colleague will need to figure out what each is. This will still allow you to align and work on the data.
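
Splitting on the index could be sketched roughly as below. This is a minimal illustration, not a replacement for a dedicated demultiplexing tool. It assumes plain 4-line FASTQ records and headers shaped like the example above, where the index is the text after the last colon. The function names (`index_of`, `read_fastq`, `split_by_index`) and the output naming scheme are made up for this example.

```python
import gzip
from collections import Counter
from pathlib import Path


def index_of(header: str) -> str:
    """Return the index from a header such as
    '@A00126:...:1398:1000 1:N:0:TAAGGC' (text after the last colon)."""
    return header.rstrip().rsplit(":", 1)[-1]


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # empty string means end of file
            return
        yield record


def split_by_index(fastq_path, out_dir):
    """Write one FASTQ file per index into out_dir; return reads per index."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = Counter()
    handles = {}
    # Transparently handle gzip-compressed input.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    try:
        with opener(fastq_path, "rt") as fh:
            for record in read_fastq(fh):
                idx = index_of(record[0])
                if idx not in handles:
                    # Hypothetical naming scheme: <index>.fastq
                    handles[idx] = open(out_dir / f"{idx}.fastq", "w")
                handles[idx].writelines(record)
                counts[idx] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Running `split_by_index` once on the F file and once on the R file (with different output directories) gives per-index files for each, and the returned counts show how many distinct indexes are present. Indexes with only a handful of reads are often sequencing errors in the index read rather than real samples, so it is worth looking at the counts before trusting the split.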


Thanks! This is what I needed.


Is there anything in the structure of the file I can use to break it and separate into multiple samples?

Do you see index sequences in the FASTA headers? They should be there if the files were simply converted from FASTQ to FASTA, even if the headers contain no sample names of the kind Pierre Lindenbaum asks about.
