L1 and L3 tags from shotgun metagenomic raw sequences
18 months ago
vini.drr • 0

Hi all,

So, I'm new to shotgun metagenomic analysis and I have a question.

I have some raw sequence data, with four files per experimental group, e.g.:

group1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_1.fq.gz - 362M

group1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_2.fq.gz - 401M

group1_EKDL220004419-1a-AK35663_HL222DSX3_L1_1.fq.gz - 1.3G

group1_EKDL220004419-1a-AK35663_HL222DSX3_L1_2.fq.gz - 1.4G

I can see that _1 and _2 are the paired-end reads, and that the _L1_ files are larger than the _L3_ ones. But the question is: which files should I use in my analysis? Only the _L1_ sequences? Both?

Thank you in advance!

metagenomics shotgun metagenomic • 769 views
18 months ago

L1 and L3 probably refer to Illumina lanes. Read up on this, and contact your sequencing provider if in doubt.
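If you want to check the lane guess against the data itself, the read headers can help. Assuming the files use the standard Illumina header layout (`@instrument:run:flowcell:lane:tile:x:y ...`), a rough sketch like the one below tallies instrument, flowcell and lane for each file. `lane_summary` is just a name made up for this example, and it assumes gzipped FASTQ with four lines per record.

```shell
# Tally instrument, flowcell and lane from the FASTQ header lines.
# Assumes headers look like @instrument:run:flowcell:lane:tile:x:y ...
lane_summary() {
    gzip -dc "$1" |
        awk -F: 'NR % 4 == 1 { sub(/^@/, "", $1); print $1, $3, $4 }' |
        sort | uniq -c
}

# Usage (uncomment and point at a real file):
# lane_summary group1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_1.fq.gz
```

Each output line is a read count followed by instrument, flowcell and lane. More than one line for a single file would suggest it was already merged from several lanes or runs.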

The trailing _1 and _2 before .fq.gz most likely mean read 1 and read 2.

You should be able to combine the lanes for analysis with cat. Check each file individually with FastQC first to be sure, though.
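A minimal sketch of the cat step, assuming the two lanes really belong to the same library. Gzip files can be joined byte for byte and the result is still valid gzip, so no decompression is needed. The one thing to watch is that the lanes are listed in the same order for _1 and _2, otherwise the pairs go out of sync. `merge_lanes` and the `group1_merged_*` output names are made up for this example.

```shell
# Join per-lane gzipped FASTQ files for one read direction.
# First argument is the output file, the rest are the inputs in order.
merge_lanes() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Usage with the file names from the question (uncomment to run).
# Keep the lane order identical for read 1 and read 2.
# p=group1_EKDL220004419-1a-AK35663
# merge_lanes group1_merged_1.fq.gz "${p}_HJVYNDSX3_L3_1.fq.gz" "${p}_HL222DSX3_L1_1.fq.gz"
# merge_lanes group1_merged_2.fq.gz "${p}_HJVYNDSX3_L3_2.fq.gz" "${p}_HL222DSX3_L1_2.fq.gz"

# Per-file QC before merging (the output directory must already exist):
# mkdir -p fastqc_out && fastqc -o fastqc_out "${p}"_*.fq.gz
```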


I would be hesitant to conclude that, for a couple of reasons. The files from lane 3 are much smaller, which would only be possible if the pool loaded on that lane was a different one or had a different concentration. Also, the file names are not strictly in the format that Illumina software produces, so they seem to have been altered after the fact.

vini.drr : Are the read lengths identical in L1/L3 files?
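One quick way to answer the read-length question is a length histogram per file. The sketch below assumes gzipped FASTQ with four lines per record; `read_length_hist` is a made-up helper name.

```shell
# Print how many reads there are of each length (count, then length).
read_length_hist() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2 { print length($0) }' |
        sort -n | uniq -c
}

# Usage (uncomment and point at real files):
# read_length_hist group1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_1.fq.gz
# read_length_hist group1_EKDL220004419-1a-AK35663_HL222DSX3_L1_1.fq.gz
```

If both lanes report the same single length, the reads are untrimmed and of identical length. A spread of lengths would suggest one of the files was trimmed already.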


Sorry for the long delay.

So, the reads seem to have identical lengths, but there are some strange repetitions in the L3 files.

$ head O_Vh_J22_1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_1.fq

@A00551:418:HJVYNDSX3:3:1101:15284:1063 1:N:0:CACTAGGTAC
CAGCTGTGCTGTTGGCTTGAACACCTTTTTGTGCAGCATCCTGCCGGGCATGGCTGCTGTCTGAATCAAAGCTCCAGGAAATTATCTCTCCGCTCTTTCCGTCAATTTCATATTCATATTCCCAGATCGGAAGAGCACACGTCTGAACTC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF,FFF,FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00551:418:HJVYNDSX3:3:1101:18285:1063 1:N:0:CACTAGGTAC
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCACTAGGTACATCTCGTATGCCGTCTTCTGCTTGAATAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
FF:FFFFFFFFFFFFFF:FF:FFFFF,:FFFFFFFF::FFF:FFFFFF:::FFFF:,FF,,,,,F:F:F,F,,,F,F:F,:::F:FFF::FFFFFF,:FFFFFF,FFFFF:FFFFFF:FFFFF::FFFFFFFFFFFFFFFFFFFFFFFFF
@A00551:418:HJVYNDSX3:3:1101:25283:1063 1:N:0:CACTAGGTAC
GATNGGAAGAGCACACGTCTGAACTCCAGTCACCACTAGGTACATCTCGTATGCCGTCTTCTGCTTGAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

And not so much in the L1 files:

$ head O_Vh_J22_1_EKDL220004419-1a-AK35663_HL222DSX3_L1_1.fq                                     

@A00627:348:HL222DSX3:1:1101:13476:1000 1:N:0:CACTAGGTAC
CAGGTGTTGAACCAGCTGATCGTCTTGTACGAAGAATGTCTGGCGAAATATTTATAAAAGGAAAAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACTAGGTACATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFF:FFF:FF:FFFFFF,,,FFFFFFFFFF
@A00627:348:HL222DSX3:1:1101:14145:1000 1:N:0:CACTAGGTAC
ATAATGCTATTGTCTGGCAGTGCCGCCGTACTGCCGACTTGTGTGAAGCCTTGAAAAAGGATACGGAATTTACCTCGTATATCCAGGAACGGACCGGGCTGCTGGTGGATGCCTATTTAGATCGGAAGAGCACACGTCTGAACTCCAGTC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFF
@A00627:348:HL222DSX3:1:1101:18358:1000 1:N:0:CACTAGGTAC
AAAAAACTTGATAATTATGTAGAAAATGAAACCAATTATTGTTTTTTCAAGTCAGAAGTAAATAGTATAACAACAATCAATAATATGTCTGAATATAGCTATGATGTAGAAACGGAAAGTGATCAGATCGGAAGAGCACACTTCTGAACT

(tail outputs are similar).

These repetitive sequences are removed as "adapters" when I concatenate the files and process them with kneaddata. However, 80% of the reads remain unaligned when I try to generate a taxonomic profile.

So, I think there must be something wrong here...


I would not worry about a few reads at the top of the file unless you think all reads from L3 are behaving oddly. You should follow up with whoever gave you this data to see if you can find out more.
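To see whether the odd reads are limited to the top of the file, it may help to count them over the whole file instead of looking at head. The sketch below is only a rough check: it counts reads that begin with `GATCGGAAGAGC` (the start of the sequence that opens the strange L3 reads posted above) and reads that end in a run of at least 20 G. The 20-base cutoff and the name `adapter_stats` are arbitrary choices for this example, and the input is assumed to be gzipped FASTQ with four lines per record.

```shell
# Count reads that start with the adapter prefix and reads with a poly-G tail.
adapter_stats() {
    gzip -dc "$1" |
        awk -v adapter="GATCGGAAGAGC" -v n=20 '
            BEGIN {
                # Build a regex matching n consecutive G at the end of the read.
                tail = sprintf("%" n "s", "")
                gsub(/ /, "G", tail)
                tail = tail "$"
            }
            NR % 4 == 2 {
                total++
                if (index($0, adapter) == 1) dimer++
                if ($0 ~ tail) polyg++
            }
            END {
                printf "total=%d adapter_start=%d polyG_tail=%d\n", total, dimer, polyg
            }'
}

# Usage (uncomment and point at real files):
# adapter_stats group1_EKDL220004419-1a-AK35663_HJVYNDSX3_L3_1.fq.gz
# adapter_stats group1_EKDL220004419-1a-AK35663_HL222DSX3_L1_1.fq.gz
```

Comparing the two fractions between L3 and L1 gives a number to bring to the sequencing provider.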


Ok, thank you very much for your kind answer! I will do that. Best!
