If it's paired-end, you should have two files (R1.fastq and R2.fastq). The read ID should also contain information about the pair. Here is an example of one pair of reads. The pair information is at the end of the header line, in 1:N:0:28; the leading 1 indicates that the read comes from R1.fastq. FYI, it's from a HiSeq 2000 run.
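A minimal Python sketch of reading that pair information, assuming the Casava 1.8+ header layout "@<read id> <read>:<is filtered>:<control number>:<index>". The function name and the example header are hypothetical, made up for illustration rather than taken from the reads discussed here:

```python
def parse_casava18_header(header: str) -> dict:
    """Split a Casava 1.8+ style FASTQ header into its read ID and pair fields.

    Assumed layout: '@<read id> <read>:<is filtered>:<control number>:<index>'.
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    read_id, _, comment = header[1:].strip().partition(" ")
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError(f"unexpected pair field: {comment!r}")
    read_number, is_filtered, control, index = fields
    return {
        "read_id": read_id,                 # identical for both mates of a pair
        "read_number": int(read_number),    # 1 = first mate (R1), 2 = second mate (R2)
        "is_filtered": is_filtered == "Y",  # 'Y' means the read failed the filter
        "control": int(control),
        "index": index,                     # last field, e.g. the '28' in 1:N:0:28
    }


# Hypothetical header, for illustration only.
info = parse_casava18_header("@HWI-ST1234:8:FC123ABXX:1:1101:1234:5678 1:N:0:28")
print(info["read_id"], info["read_number"])
```

Older Illumina headers put the mate number at the end of the ID instead (e.g. a /1 or /2 suffix), so this parser would reject them.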
As @NicoBxl indicated, there are normally two files, one for each read of the pair, so the best bet is that this is single-end. However, note that the ID in that example (the part before the space, not including the pairing info) is the same for both reads. This ID contains the lane coordinates of the cluster the read belongs to; both reads of a pair originate from the same cluster and therefore have the same coordinates.
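To make the "same cluster, same ID" point concrete, here is a small Python sketch that compares two headers and pulls out the cluster coordinates. It assumes Casava 1.8+ headers, where the pair info sits after the space and the last four ':'-separated fields of the ID are lane, tile, x and y; the function names and headers are hypothetical:

```python
from typing import Tuple


def cluster_id(header: str) -> str:
    """Read ID = header text before the first whitespace, without the leading '@'."""
    return header[1:].split()[0]


def cluster_coordinates(header: str) -> Tuple[int, int, int, int]:
    """(lane, tile, x, y): assumed to be the last four ':'-separated ID fields."""
    lane, tile, x, y = cluster_id(header).split(":")[-4:]
    return int(lane), int(tile), int(x), int(y)


def same_cluster(header_r1: str, header_r2: str) -> bool:
    """Mates come from one cluster, so their IDs (and coordinates) should match."""
    return cluster_id(header_r1) == cluster_id(header_r2)


# Hypothetical mate headers: same ID, different pair field.
r1 = "@HWI-ST1234:8:FC123ABXX:1:1101:1234:5678 1:N:0:28"
r2 = "@HWI-ST1234:8:FC123ABXX:1:1101:1234:5678 2:N:0:28"
print(same_cluster(r1, r2), cluster_coordinates(r1))
```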
So, if you really want to be sure, you could parse the data and check that all IDs are unique; if you have two sequences per ID, your data may be paired-end but interleaved.
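One way to do that uniqueness check, sketched in Python. It assumes unwrapped FASTQ (exactly four lines per record) and takes the ID as the header text before the first space; the function names are hypothetical:

```python
from typing import Iterable, Iterator, Set


def fastq_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID of each record, assuming unwrapped 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of a record
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header")
            yield line[1:].split()[0]  # drop '@' and anything after the first space


def duplicated_ids(lines: Iterable[str]) -> Set[str]:
    """Return the read IDs that occur more than once (empty set: all IDs unique)."""
    seen: Set[str] = set()
    dups: Set[str] = set()
    for read_id in fastq_ids(lines):
        if read_id in seen:
            dups.add(read_id)
        seen.add(read_id)
    return dups
```

An open file object can be passed directly, since iterating over it yields its lines. Note that the set of seen IDs is kept in memory, which is fine for a few million reads but grows with the file.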
Also check for duplicated read IDs, to verify that the reads from the two files were not interleaved into one. This is as simple as grepping for a few read IDs:
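If grep isn't handy, the same spot check could look roughly like this in Python. This is a sketch assuming unwrapped 4-line records held in a list of lines; spot_check is a hypothetical name:

```python
from typing import Dict, List


def spot_check(lines: List[str], n: int = 3) -> Dict[str, int]:
    """For the first n distinct read IDs, count how many records carry that ID.

    Only header lines (every 4th line of an unwrapped FASTQ) are searched, so
    this is a little stricter than a plain grep over the whole file.
    """
    ids = [line[1:].split()[0] for line in lines[0::4]]
    first_ids = list(dict.fromkeys(ids))[:n]  # de-duplicate, keep file order
    return {read_id: ids.count(read_id) for read_id in first_ids}
```

A count of 1 for each ID looks single-end; a count of 2 means both mates sit in the same file.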
Or just cut out the read IDs and count them this way:
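A rough Python equivalent of cutting out the IDs and counting them: count how often each ID occurs, then tally how many IDs share each count. It assumes unwrapped 4-line records; the function name is hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable


def id_multiplicity_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Map 'times a read ID occurs' -> 'number of read IDs occurring that often'."""
    per_id = Counter(
        line[1:].split()[0]  # read ID: header without '@', up to the first space
        for i, line in enumerate(lines)
        if i % 4 == 0  # header line of each unwrapped 4-line record
    )
    return dict(Counter(per_id.values()))
```

A result like {1: N} means every ID is unique; {2: N} means every ID occurs twice, which is what interleaved pairs would produce.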
I'm using the second one and will let you know later. (Why the -k flag on sort? And why the first sort, after head?) Also, if only one file is available, can I assume it is single-end? Thanks
There was a missing |; now edited. The -k flag is perhaps not needed in this case; it's just a habit of mine to restrict sorting to the field I'm interested in rather than all fields.
Having one file seems to indicate single-end data. But if you have identical read IDs, it might be interleaved paired-end data.
Ah OK, I misread your first comment. Perfect, so files can be interleaved. So I have to check whether the seq IDs (the part before #) have duplicates, because the two reads of a pair have the same seq ID, am I correct?
Yes, as Istvan suggests, just grep one ID: you should get one result if it's single-end. If you get two results, it's an interleaved paired-end file.
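The one-hit/two-hit rule can be turned into a rough whole-file check. This Python sketch is hypothetical and assumes unwrapped 4-line records and that an interleaved file stores each mate 1 record immediately followed by its mate 2. A file with all-unique IDs may still be just one mate of a paired run (e.g. only R1.fastq), so "single-end" here really means "not interleaved":

```python
from typing import Iterable, List


def classify_fastq(lines: Iterable[str]) -> str:
    """Rough guess: 'single-end', 'interleaved paired-end' or 'unclear'.

    Assumes unwrapped 4-line records and, for interleaved files, that each
    mate 1 record is immediately followed by its mate 2 record.
    """
    ids: List[str] = [line[1:].split()[0] for i, line in enumerate(lines) if i % 4 == 0]
    if not ids:
        return "unclear"
    if len(set(ids)) == len(ids):
        return "single-end"  # grepping any ID would give one hit
    adjacent_pairs = len(ids) % 2 == 0 and all(
        ids[k] == ids[k + 1] for k in range(0, len(ids), 2)
    )
    if adjacent_pairs and 2 * len(set(ids)) == len(ids):
        return "interleaved paired-end"  # every ID appears exactly twice, side by side
    return "unclear"
```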
Ah OK, I got it. So if the data are paired-end, every read has a match with the same seq ID (conversely, no read has a unique seq ID). Add a new answer, or update the existing one with all the info if you can, so I can accept it for future reference.
Yes, with paired-end data, both reads of a pair have the same ID.