more than one sample in the same fasta file

14 months ago

demoraesdiogo2017 ▴ 100

Hello

A colleague asked me to align some miRNA-seq data for him. He handed me only two files, a forward (F) and a reverse (R) FASTA file, and said that 6 different samples were in them. So far, I have only dealt with RNA-seq data where each sample was in a separate file. Is there anything in the structure of the file I can use to break it and separate into multiple samples?

mirnaseq usegalaxy • 759 views

ADD COMMENT • link 14 months ago by demoraesdiogo2017 ▴ 100

What is the content of those files? Can you see the names of the samples in the headers?

ADD REPLY • link 14 months ago by Pierre Lindenbaum 161k

Well, here is an example of a header:

@A00126:312:H35CVDSX5:4:1101:1398:1000 1:N:0:TAAGGC
<<sequence>>
+

However, my colleague told me he annotated the samples as A6-A11, so nothing in this header makes sense to me.

ADD REPLY • link 14 months ago by demoraesdiogo2017 ▴ 100

Your example above is NOT a FASTA sequence. It is still FASTQ (assuming you omitted the 4th line, the quality string, when you pasted the example).

TAAGGC is the index (barcode) that the sample was labeled with, so this read came from that sample. If you don't know which sample an index corresponds to, you could split the reads based on these indexes and simply call them sample 1 .. sample N. If you find more than 6 indexes, there could be an issue.

Your colleague will need to figure out which sample each index corresponds to. In the meantime, splitting by index will still allow you to align and work on the data.
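
One way to do the split is sketched below. This is only a minimal illustration, not a tested demultiplexing tool: it assumes uncompressed 4-line FASTQ records, assumes the index is the last colon-separated field of the header (as in the example header above), and requires an exact match on the index. The function names and the `sample_<index>.fastq` output naming are made up for this example.

```python
import collections
import os


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def index_of(header):
    """Return the index from a header such as '@... 1:N:0:TAAGGC'.

    Assumption: the index is the last colon-separated field.
    """
    return header.rsplit(":", 1)[-1]


def split_by_index(fastq_path, out_dir, prefix="sample"):
    """Write one FASTQ file per index and return the read count per index."""
    os.makedirs(out_dir, exist_ok=True)
    counts = collections.Counter()
    outputs = {}  # index -> open output file
    try:
        with open(fastq_path) as handle:
            for header, seq, plus, qual in read_fastq(handle):
                idx = index_of(header)
                if idx not in outputs:
                    path = os.path.join(out_dir, f"{prefix}_{idx}.fastq")
                    outputs[idx] = open(path, "w")
                outputs[idx].write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                counts[idx] += 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

A possible way to use it (the file names here are placeholders):

```python
counts = split_by_index("forward.fastq", "split", prefix="F")
for idx, n in counts.most_common():
    print(idx, n)
```

Looking at the counts first shows whether there really are 6 dominant indexes. Indexes containing sequencing errors would each end up in their own small file, so a long tail of low-count indexes is not surprising. Running the same function on the R file with a different prefix should keep the pairs in sync, provided both files list the same reads in the same order and carry the same index in their headers.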

ADD REPLY • link 14 months ago by GenoMax 141k

Thanks! This is what I needed

ADD REPLY • link 14 months ago by demoraesdiogo2017 ▴ 100

Is there anything in the structure of the file I can use to break it and separate into multiple samples?

Do you see index sequences in the FASTA headers? They should be there if the headers were simply converted from FASTQ to FASTA and do not carry sample names, as Pierre Lindenbaum asks.

ADD REPLY • link 14 months ago by GenoMax 141k
