more than one sample in the same fasta file
7 days ago

Hello

A colleague asked me to align some miRNA-seq data for him. He handed me only two files, an F and an R fasta file, and said 6 different samples were in them. So far, I have only dealt with RNA-seq data where each sample was in its own file. Is there anything in the structure of the file I can use to break it and separate into multiple samples?

mirnaseq usegalaxy • 205 views

What is the content of those files? Can you see the names of the samples in the headers?


Well, here is an example of a header:

@A00126:312:H35CVDSX5:4:1101:1398:1000 1:N:0:TAAGGC
<<sequence>>
+


However, my colleague told me he annotated the samples as A6-A11, so nothing in this header makes sense to me.


Your example above is NOT a fasta sequence. It is still fastq (assuming you omitted the 4th line when you pasted the example).

TAAGGC is the index (barcode) that the sample was labeled with, so it tells you which sample this read came from. If you don't know which sample it corresponds to, you could split the reads based on these indexes and simply call them sample 1 .. sample N. If you find more than 6 indexes then there could be an issue.

Your colleague will need to figure out which sample each index belongs to. Either way, this will still allow you to align and work on the data.
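In case it helps, here is a minimal sketch of that splitting step in Python. It assumes the files really are fastq with 4 lines per record and that the index is the last `:`-separated field of the header, as in the example posted above. The function name `split_fastq_by_index`, the `sample_<index>.fastq` naming and the output directory are made up for illustration, not taken from any existing tool.

```python
import gzip
import os
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def split_fastq_by_index(fastq_path, out_dir, prefix="sample"):
    """Write each 4-line fastq record to a file named after its header index.

    Returns a Counter mapping index sequence -> number of reads.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = Counter()
    handles = {}
    try:
        with open_text(fastq_path) as fq:
            while True:
                record = [fq.readline() for _ in range(4)]
                if not record[0].strip():
                    break  # end of file (a blank header line is treated as the end)
                if not record[3]:
                    raise ValueError("truncated fastq record: " + record[0].strip())
                # Header looks like '@... 1:N:0:TAAGGC'; take the last ':' field.
                index = record[0].rstrip().rsplit(":", 1)[-1]
                if index not in handles:
                    out_path = os.path.join(out_dir, f"{prefix}_{index}.fastq")
                    handles[index] = open(out_path, "w")
                handles[index].writelines(record)
                counts[index] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Running it once on the F file and once on the R file (with different output directories) should give matching per-index files, provided both files carry the same index in their headers. The returned counts also show how many distinct indexes there are, which is the quick check for "more than 6". Indexes containing sequencing errors will show up as extra small files, so a few low-count entries are not necessarily a problem.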


Thanks! This is what I needed


Is there anything in the structure of the file I can use to break it and separate into multiple samples?

Do you see index sequences in the fasta headers (assuming the headers were simply carried over from fastq to fasta, and there are no sample names of the kind Pierre Lindenbaum asks about)?
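A quick way to check is to tally the last `:`-separated field of every header line. This is only a sketch under the assumption that the headers kept the Illumina layout after conversion (ending in something like `1:N:0:TAAGGC`); `count_header_indexes` is a hypothetical helper name.

```python
from collections import Counter


def count_header_indexes(lines):
    """Count the last ':'-separated field of every fasta header line."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):  # fasta headers only; sequence lines are skipped
            counts[line.rstrip().rsplit(":", 1)[-1]] += 1
    return counts
```

It can be fed an open file handle directly, e.g. `with open(path) as fh: print(count_header_indexes(fh).most_common())`, where `path` is whichever fasta file you were given. Six dominant values would fit the six samples. Note this only works on fasta as written: in fastq, quality lines can also begin with `@`, so the records have to be read four lines at a time instead.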