4 months ago
demoraesdiogo2017 ▴ 90
A colleague asked me to align some miRNA-seq data for him. He handed me only two files, a forward (F) and a reverse (R) FASTA file, and said 6 different samples were in them. So far, I have only dealt with RNA-seq data where each sample was in its own file. Is there anything in the structure of the files I can use to split them into separate samples?
What is the content of those files? Can you see the names of the samples in the headers?
Well, here is an example of a header
However, my colleague told me he annotated them as A6-A11, so there is nothing in this header that makes sense to me
Your example above is NOT a FASTA sequence. It is still FASTQ (assuming you omitted the 4th line when you pasted the example).
TAAGGC is the index that the sample was labeled with (and the read came from it). If you don't know which sample that corresponds to, you could split the reads based on these indexes and simply call them sample 1 .. sample N. If you find more than 6 indexes, there could be an issue.
Your colleague will need to figure out what each is. This will still allow you to align and work on the data.
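If it helps, here is a rough Python sketch of that split. It assumes plain 4-line FASTQ records and that the index is the text after the last `:` on the header line (as in Illumina-style headers); the function names, the demo read names and the second index `CGTACT` are made up for illustration, so adjust them to your real headers.

```python
from collections import defaultdict
from itertools import islice


def index_of(header):
    """Return the index (barcode) from a header line.

    Assumption: the index is whatever follows the last ':' in the header.
    """
    return header.rstrip().rsplit(":", 1)[-1]


def split_by_index(lines):
    """Group 4-line FASTQ records by the index found in their header."""
    groups = defaultdict(list)
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if not record:
            break
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("input does not look like 4-line FASTQ")
        groups[index_of(record[0])].append(record)
    return groups


def write_groups(groups, prefix="sample"):
    """Write one FASTQ per index: sample_1.fastq .. sample_N.fastq."""
    names = {}
    for n, (index, records) in enumerate(sorted(groups.items()), start=1):
        name = f"{prefix}_{n}.fastq"
        with open(name, "w") as out:
            for record in records:
                out.writelines(line.rstrip("\n") + "\n" for line in record)
        names[index] = name
    return names


# Hypothetical in-memory reads, only to show the grouping.
demo = [
    "@read1 1:N:0:TAAGGC\n", "ACGT\n", "+\n", "IIII\n",
    "@read2 1:N:0:CGTACT\n", "TTTT\n", "+\n", "IIII\n",
    "@read3 1:N:0:TAAGGC\n", "GGGG\n", "+\n", "IIII\n",
]
for index, records in sorted(split_by_index(demo).items()):
    print(index, len(records))
```

For real data you would pass an open file handle to `split_by_index` instead of the demo list and then call `write_groups`. The sketch holds all reads in memory, which is fine for a quick look but may not suit very large files. Counting the distinct indexes first is a quick way to check whether you really have 6. For the F and R files, run the same split on each so that mates end up under the same sample number; this only works if both files carry the same index in their headers.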
Thanks! This is what I needed
Do you see index sequences in the FASTA headers (assuming the files were simply converted from FASTQ to FASTA and there are no sample names, as Pierre Lindenbaum asked)?