Hi all,
So, I'm new to shotgun metagenomic analysis and I have a question...
I have some raw sequence data, with four files per experimental group, e.g.:
group1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_1.fq.gz - 362M
group1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_2.fq.gz - 401M
group1_EKDL220004419-1a-AK35663_HL222DSX3_L1_1.fq.gz - 1.3G
group1_EKDL220004419-1a-AK35663_HL222DSX3_L1_2.fq.gz - 1.4G
I can see that _1 and _2 are the paired-end reads, and that the _L1_ files are larger than the _L3_ ones... But the question is: which files should I use in my analysis? Only the _L1_ sequences? Both?
Thank you in advance!
I would be hesitant to conclude that, for a couple of reasons. First, the file sizes in lane 3 are much smaller, which would only be possible if the pool loaded there was a different one or had a different concentration. Second, the file names are not strictly in the format that Illumina software produces, so they seem to have been altered after the fact.
vini.drr : Are the read lengths identical in L1/L3 files?
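One way to check this is sketched below, assuming the files are standard four-line FASTQ records compressed with gzip. The function name `read_length_counts` and the `max_reads` cap are illustrative choices, not part of any particular tool.

```python
import gzip
from collections import Counter

def read_length_counts(path, max_reads=None):
    """Tally read lengths in a gzipped FASTQ file (assumes 4 lines per record)."""
    counts = Counter()
    n_reads = 0
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # only the second line of each record holds the sequence
                continue
            counts[len(line.strip())] += 1
            n_reads += 1
            if max_reads is not None and n_reads >= max_reads:
                break
    return counts
```

Running it on one L1 file and one L3 file and comparing the two tallies should show whether both lanes have the same read length.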
Sorry for the long delay.
So, the reads seem to have identical lengths, but there are some strange repetitions in the L3 files.
And not so many in the L1 files:
(tail outputs are similar).
These repetitive sequences are removed as "adapters" when I concatenate the files and process them with kneaddata. However, 80% of the reads end up unaligned when I try to build a taxonomic profile.
So, I think there must be something wrong here...
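For reference, the concatenation step can be sketched roughly as below. This assumes the two lanes really do come from the same library, which has not been confirmed in this thread. It relies on the fact that gzip files appended byte-for-byte form a valid gzip file. The helper name `concat_gz` and the output file names in the usage comment are hypothetical.

```python
import shutil

def concat_gz(parts, dest):
    """Append gzipped files byte-for-byte; the result is itself a valid gzip file."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

# Hypothetical usage: keep the lane order identical for both mates,
# so that read pairs stay in the same order in the _1 and _2 outputs.
# prefix = "group1_EKDL220004419-1a-AK35663_"
# lanes = ["HJVYNDSX3_L3", "HL222DSX3_L1"]
# for mate in ("1", "2"):
#     concat_gz([f"{prefix}{lane}_{mate}.fq.gz" for lane in lanes],
#               f"group1_{mate}.fq.gz")
```

The point of the design is that both mates are built from the same lane order. If the order differed between _1 and _2, paired-end tools would see mismatched pairs.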
I would not worry about a few reads at the top of the file unless you think all reads from L3 are behaving oddly. You should check with whoever gave you this data to see if you can find out more.
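To judge whether the oddity goes beyond the first few reads, a rough sketch like the following could scan a whole file. It assumes gzipped four-line FASTQ and uses "one base makes up most of the read" as a stand-in for "repetitive", which only catches homopolymer-like reads and may not match the repetitions seen here. The function name `low_complexity_fraction` and the 0.9 default threshold are illustrative assumptions.

```python
import gzip
from collections import Counter

def low_complexity_fraction(path, threshold=0.9):
    """Fraction of reads in which a single base accounts for >= threshold of the read."""
    total = 0
    flagged = 0
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # sequence lines only
                continue
            seq = line.strip().upper()
            if not seq:
                continue
            total += 1
            top_count = Counter(seq).most_common(1)[0][1]
            if top_count / len(seq) >= threshold:
                flagged += 1
    return flagged / total if total else 0.0
```

Comparing this fraction between the L3 and L1 files gives a whole-file view, instead of relying on what head or tail happen to show.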
Ok, thank you very much for your kind answer! I will do that. Best!