
Text S1. Methodological details

Genome sequencing

A combination of Illumina and 454 shotgun sequencing was performed on the single-cell re-MDA products for Candidatus Poribacteria sp. WGA-4E and 4G. For Illumina sequencing, 0.3 kbp shotgun libraries were constructed for each SAG. Briefly, 3 µg of MDA product was sheared in 100 µl using a Covaris E210 (10% duty cycle, intensity 5, 200 cycles per burst, 3 min per sample), and the fragmented DNA was purified using QIAquick columns (Qiagen) according to the manufacturer's instructions. The sheared DNA was end-repaired and A-tailed according to the Illumina standard PE protocol and purified using the MinElute PCR Purification Kit (Qiagen), with a final elution in 12 µl of Buffer EB. After quantification on a Bioanalyzer DNA 1000 chip (Agilent), the fragments were ligated to the Illumina adaptors according to the Illumina standard PE protocol, and the ligation product was purified using AMPure SPRI beads. The Illumina libraries were quantified on a Bioanalyzer DNA High Sensitivity chip (Agilent), and 300 ng of DNA (in 6 µl) was then normalized using the Duplex-Specific Nuclease (DSN) Kit (Axxora) (Bogdanova et al 2009). For normalization, the dsDNA was denatured for 3 min at 98°C, followed by hybridization at 68°C for 5 h and DSN treatment at 68°C for 20 min. The normalized libraries were amplified by PCR for 12 cycles, gel-purified, QC-assessed on a Bioanalyzer DNA High Sensitivity chip (Agilent), and sequenced on an Illumina GAIIx sequencer (run mode 2x76 bp).

For 454 pyrosequencing, a 4 kbp paired-end library was constructed and sequenced for each SAG. General aspects of, and detailed protocols for, library construction and sequencing can be found at the JGI website (http://www.jgi.doe.gov/). Sequencing yielded the following raw data sets: SAG 4E, 6.8 Gbp of Illumina sequence and 74.4 Mbp of 454 sequence (276,672 reads); SAG 4G, 5.8 Gbp of Illumina sequence and 97.1 Mbp of 454 sequence (335,757 reads).
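The principle behind DSN normalization can be illustrated with idealized second-order reassociation (C0t) kinetics: after denaturation, abundant fragments find complementary strands faster and re-form duplexes that DSN then digests, whereas rare fragments stay single-stranded and survive. The minimal sketch below assumes ideal kinetics, complete DSN digestion of duplexes, an arbitrary rate constant `k`, and hypothetical abundances; it is not a model of the Axxora kit or of the conditions used here.

```python
def single_stranded_fraction(c0: float, k: float, t: float) -> float:
    """Fraction of a sequence still single-stranded after time t,
    assuming ideal second-order reassociation: C/C0 = 1 / (1 + k*C0*t)."""
    return 1.0 / (1.0 + k * c0 * t)


def after_dsn(abundances: dict[str, float], k: float, t: float) -> dict[str, float]:
    """Amount of each sequence surviving DSN treatment, assuming DSN removes
    every re-annealed (double-stranded) molecule and spares single strands."""
    return {name: c0 * single_stranded_fraction(c0, k, t)
            for name, c0 in abundances.items()}


if __name__ == "__main__":
    # Hypothetical relative abundances spanning three orders of magnitude,
    # mimicking MDA amplification bias between genome regions.
    before = {"rare": 1.0, "medium": 10.0, "abundant": 1000.0}
    after = after_dsn(before, k=0.1, t=5.0)
    for name in before:
        print(f"{name:>9}: {before[name]:8.1f} -> {after[name]:6.2f}")
```

With these made-up values, the 1000-fold difference between the rare and abundant classes shrinks to about 3-fold, because the surviving amount C0/(1 + k·C0·t) saturates at 1/(k·t) for very abundant sequences.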

For SAG 4C, sequencing was conducted at LGC Genomics GmbH, Berlin, Germany, also using a hybrid approach of Illumina and 454 pyrosequencing. A 3 kbp paired-end library and a standard shotgun library were constructed and sequenced using 454 FLX Titanium technology. For Illumina sequencing, a standard shotgun library (1x100 bp) was constructed and sequenced on the Illumina HiSeq 2000 platform. This resulted in 2.3 Gbp of Illumina sequence and 153.6 Mbp of 454 sequence (481,505 reads).

The draft genomes of SAGs 3G and 4CII were generated at the JGI using Illumina technology. A standard Illumina shotgun library was constructed and sequenced on the Illumina HiSeq 2000 platform. Sequencing yielded 1.4 Gbp of raw Illumina sequence for SAG 3G and 0.8 Gbp for SAG 4CII. General aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov.
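The raw yields reported above can be turned into a few derived figures, such as the mean 454 read length and the approximate number of sequenced Illumina fragments. The sketch below uses only numbers stated in the text; it assumes every Illumina read reached full length, and because the yields are rounded, the results are approximate. SAGs 3G and 4CII are omitted because their read layout is not stated.

```python
# Raw 454 yields as reported in the text: (bases, reads).
RAW_454 = {
    "4E": (74.4e6, 276_672),
    "4G": (97.1e6, 335_757),
    "4C": (153.6e6, 481_505),
}

# Raw Illumina yields (bases) and bases sequenced per fragment.
RAW_ILLUMINA = {
    "4E": (6.8e9, 2 * 76),  # GAIIx, 2x76 bp paired-end
    "4G": (5.8e9, 2 * 76),  # GAIIx, 2x76 bp paired-end
    "4C": (2.3e9, 100),     # HiSeq 2000, 1x100 bp single-end
}


def mean_read_length(bases: float, reads: int) -> float:
    """Average read length from total yield and read count."""
    return bases / reads


def approx_fragments(bases: float, bases_per_fragment: int) -> float:
    """Approximate number of fragments (read pairs for PE runs), assuming
    every read was sequenced to full length (an idealization)."""
    return bases / bases_per_fragment


if __name__ == "__main__":
    for sag, (bases, reads) in RAW_454.items():
        print(f"SAG {sag}: mean 454 read length ~{mean_read_length(bases, reads):.0f} bp")
    for sag, (bases, per_frag) in RAW_ILLUMINA.items():
        print(f"SAG {sag}: ~{approx_fragments(bases, per_frag) / 1e6:.1f} M Illumina fragments")
```

This gives mean 454 read lengths of roughly 269, 289, and 319 bp for SAGs 4E, 4G, and 4C, and on the order of 44.7 M and 38.2 M read pairs for 4E and 4G and 23.0 M single-end reads for 4C.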

Genome assembly

All raw Illumina sequence data were passed through DUK, a filtering program developed at the JGI that removes known Illumina sequencing and library preparation artifacts (http://duk.sourceforge.net/), using the parameters -k 22 -s 1 -c 1. Specifically, all reads containing sequencing adapters, low-complexity reads, and reads containing short tandem repeats were removed. The artifact-filtered sequence data were then screened and trimmed according to the k-mers present in the dataset using kmernorm (http://sourceforge.net/projects/kmernorm/). High-depth k-mers, presumably derived from MDA amplification bias, cause problems in the assembly, especially if the k-mer depth varies by orders of magnitude between different regions of the genome. For SAGs 3G and 4CII, reads with high k-mer coverage (>30x average k-mer depth, k=31) were normalized to an average depth of 30x, and reads with an average k-mer depth of less than 2x were removed. For SAGs 4C, 4E, and 4G, we removed reads representing high-abundance k-mers (>32x k-mer coverage, k=31) and trimmed reads that contained unique k-mers. After filtering, 1.7M reads remained for 3G, 0.2M for 4CII, 5.1M for 4C, 3M for 4E, and 1.3M for 4G.
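The idea behind the k-mer depth normalization applied to SAGs 3G and 4CII can be sketched as follows: count all k-mers (k=31), compute each read's average k-mer depth, drop reads below 2x, and randomly thin reads above 30x so that over-amplified regions end up near 30x. This is a hypothetical illustration of the principle, not kmernorm's actual algorithm or options; the canonical k-mer counting and the random-thinning rule are assumptions.

```python
import random
from collections import Counter
from statistics import mean

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def read_kmers(read: str, k: int) -> list[str]:
    """All canonical k-mers of a read (empty if the read is shorter than k)."""
    return [canonical(read[i:i + k]) for i in range(len(read) - k + 1)]


def depth_normalize(reads: list[str], k: int = 31, target: float = 30.0,
                    min_depth: float = 2.0, seed: int = 0) -> list[str]:
    """Hypothetical k-mer depth normalization (illustration only).

    Reads with an average k-mer depth below `min_depth` are dropped; reads
    above `target` are kept with probability target/depth, which thins
    over-amplified regions to roughly `target`-fold depth.
    """
    counts = Counter(km for r in reads for km in read_kmers(r, k))
    rng = random.Random(seed)
    kept = []
    for r in reads:
        kms = read_kmers(r, k)
        if not kms:
            continue  # read shorter than k: nothing to score
        depth = mean(counts[km] for km in kms)
        if depth < min_depth:
            continue  # likely sequencing errors or background
        if depth > target and rng.random() >= target / depth:
            continue  # randomly discard a share of over-represented reads
        kept.append(r)
    return kept
```

The sketch does not cover the mode used for SAGs 4C, 4E, and 4G (removing reads from >32x k-mers and trimming reads at unique k-mers), and holding all k-mer counts in an in-memory `Counter` only suits toy inputs.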

For SAGs 4E, 4G, and 4C, assemblies were performed in the following steps: (1) Filtered Illumina reads were assembled using Velvet version 1.1.02 (Zerbino and Birney 2008); the VelvetOptimiser script (version 2.1.7) was used with the default optimization functions (N50 for k-mer choice, total number of base pairs in large contigs for cov_cutoff optimization). (2) The Velvet contigs were used to simulate reads from long-insert libraries, which were used together with the filtered reads as input for an Allpaths-LG (Gnerre et al 2011) assembly. (3) Allpaths contigs larger than 1 kbp were shredded into 1 kbp pieces with 200 bp overlaps. (4) Finally, the Allpaths shreds and the raw 454 pyrosequencing reads were assembled using the 454 Newbler assembler version 2.5 (Roche/454 Life Sciences, Branford, CT, USA).
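Step (3) can be illustrated with a small shredding function that cuts a contig into 1 kbp pieces overlapping by 200 bp, i.e. advancing 800 bp per piece. How the original pipeline treated contigs of 1 kbp or less and the contig end is not stated; in this sketch, short contigs pass through unchanged (an assumption) and the last piece is anchored to the contig end, so its overlap with the previous piece may exceed 200 bp.

```python
def shred(seq: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a contig into `size`-bp pieces that overlap by at least `overlap` bp.

    Contigs no longer than `size` are returned unchanged (assumption).
    The final piece is anchored to the contig end so no bases are lost.
    """
    if len(seq) <= size:
        return [seq]
    step = size - overlap
    starts = list(range(0, len(seq) - size + 1, step))
    if starts[-1] + size < len(seq):
        starts.append(len(seq) - size)  # cover the remaining tail
    return [seq[s:s + size] for s in starts]
```

For a 2,500 bp contig this yields three pieces starting at 0, 800, and 1,500 bp.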

The following steps were performed for the assembly of 3G and 4CII: (1) Normalized Illumina reads were assembled using Velvet version 1.1.04 (Zerbino and Birney 2008). (2) Simulated paired-end reads with 1-3 kbp inserts were created from the Velvet contigs using wgsim (https://github.com/lh3/wgsim). (3) Normalized Illumina reads were assembled together with the simulated read pairs using Allpaths-LG (version r39750) (Gnerre et al 2011). Parameters for the assembly steps were: (1) Velvet (velveth: 71 -shortPaired; velvetg: -very_clean yes -exportFiltered yes -min_contig_lgth 500 -scaffolding no -cov_cutoff 10); (2) wgsim (-e 0 -1 100 -2 100 -r 0 -R 0 -X 0); (3) Allpaths-LG (PrepareAllpathsInputs: PHRED_64=1 PLOIDY=1 FRAG_COVERAGE=125 JUMP_COVERAGE=25 LONG_JUMP_COV=50; RunAllpathsLG: THREADS=8 RUN=std_shredpairs TARGETS=standard VAPI_WARN_ONLY=True OVERWRITE=True).
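Step (2) replaces real long-insert libraries with read pairs drawn from the Velvet contigs. The sketch below is not wgsim; it only mimics the error-free settings listed above (no sequencing errors, mutations, or indels; 100 bp per read) in plain Python. The uniform 1-3 kbp insert distribution and the forward/reverse-complement read orientation are assumptions chosen for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(contig: str, n_pairs: int, read_len: int = 100,
                   min_insert: int = 1000, max_insert: int = 3000,
                   seed: int = 0) -> list[tuple[str, str]]:
    """Draw error-free read pairs from a contig (illustration, not wgsim).

    Each fragment length is drawn uniformly from [min_insert, max_insert]
    (capped at the contig length); read 1 is the fragment's first `read_len`
    bases and read 2 the reverse complement of its last `read_len` bases.
    """
    rng = random.Random(seed)
    pairs: list[tuple[str, str]] = []
    max_insert = min(max_insert, len(contig))
    if max_insert < min_insert:
        return pairs  # contig too short for the requested insert sizes
    for _ in range(n_pairs):
        insert = rng.randint(min_insert, max_insert)
        start = rng.randint(0, len(contig) - insert)
        fragment = contig[start:start + insert]
        pairs.append((fragment[:read_len], revcomp(fragment[-read_len:])))
    return pairs
```

The intent of such simulated pairs is presumably to supply long-range linkage information that the short-insert Illumina data alone lack.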

These approaches resulted in the following draft assemblies: SAG 3G, total assembly size of 5,627,474 bp (304 contigs); SAG 4CII, total assembly size of 596,887 bp (64 contigs); SAG 4C, total assembly size of 1,713,200 bp (302 contigs); SAG 4E, total assembly size of 3,679,266 bp (540 contigs); and SAG 4G, total assembly size of 1,443,813 bp (296 contigs).
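As a rough indicator of fragmentation, the draft assembly statistics above can be reduced to a mean contig length per SAG. The mean is a crude summary (it is not the N50, and the contig length distributions are not given here), so the sketch below is illustrative only.

```python
# Draft (pre-decontamination) assemblies as reported: (total bp, number of contigs).
ASSEMBLIES = {
    "3G": (5_627_474, 304),
    "4CII": (596_887, 64),
    "4C": (1_713_200, 302),
    "4E": (3_679_266, 540),
    "4G": (1_443_813, 296),
}


def mean_contig_length(total_bp: int, n_contigs: int) -> float:
    """Average contig length; ignores how lengths are distributed."""
    return total_bp / n_contigs


if __name__ == "__main__":
    for sag, (total, n) in ASSEMBLIES.items():
        print(f"SAG {sag}: mean contig length {mean_contig_length(total, n):,.0f} bp")
```

This gives roughly 18.5 kbp for 3G, 9.3 kbp for 4CII, 6.8 kbp for 4E, 5.7 kbp for 4C, and 4.9 kbp for 4G.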

Genome annotation and SAG whole genome sequencing quality control

The sequence assemblies of the five poribacterial SAGs were complemented by an additional poribacterial SAG, Candidatus Poribacteria WGA A3 (hereafter 3A), which was previously sequenced and analyzed by Siegl et al (2011). All following steps were conducted with the five newly sequenced SAGs and the assembly of SAG 3A, which can be accessed under GenBank accession number ADFK00000000.

Genes were identified using Prodigal (Hyatt et al 2010). The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database (nr), UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAscan-SE tool (Hacker and Kaper 2000) was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA (Pruesse et al 2007). Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genomes for the corresponding Rfam profiles using INFERNAL (Makarova et al 1999). Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform (Markowitz et al 2008), particularly IMG/mer, developed by the Joint Genome Institute, Walnut Creek, CA, USA (http://img.jgi.doe.gov).

All genome sequences were quality-checked automatically by mapping against known contaminants, and manually using several tools in the IMG/mer system, such as tetranucleotide frequency analysis, phylogenetic distribution of genes, and GC content distribution. We generally followed a conservative approach and removed all contigs that appeared as contamination in any of the screenings. A detailed description of the contamination screening process in IMG/mer can be found at http://img.jgi.doe.gov/mer/doc/SingleCellDataDecontamination.pdf. An additional quality screen independent of the IMG system was conducted by phylogenetic assignment of all genes using blastx against the NCBI nonredundant database and MEGAN (Huson et al 2007). This additional approach enabled us to detect contaminating sequences from sources not included in the IMG system at the time, such as mitochondrial DNA from the sponge host.
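Two of the manual screens mentioned above, tetranucleotide frequency (TNF) and GC content, can be sketched as a composition-outlier test: each contig's 256-dimensional TNF profile and its GC content are compared with those of the pooled SAG, and a contig failing either test is flagged, in line with the conservative rule described above. The Euclidean distance and the cutoffs are assumptions for illustration; the procedure actually used in IMG/mer is documented in the decontamination guide linked above and may differ.

```python
from itertools import product
from math import sqrt

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}


def tetra_counts(seq: str) -> list[int]:
    """Counts of the 256 tetranucleotides; windows with non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    return counts


def to_frequencies(counts: list[int]) -> list[float]:
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def gc_content(seq: str) -> float:
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def flag_contigs(contigs: dict[str, str], max_tnf_dist: float,
                 max_gc_diff: float) -> set[str]:
    """Flag contigs whose TNF profile or GC content deviates from the pooled SAG.

    A contig failing either screen is flagged. The pooled TNF is built from
    summed per-contig counts, so no artificial tetramers arise at junctions.
    """
    per_contig = {name: tetra_counts(seq) for name, seq in contigs.items()}
    pooled = to_frequencies([sum(col) for col in zip(*per_contig.values())])
    ref_gc = gc_content("".join(contigs.values()))
    flagged = set()
    for name, seq in contigs.items():
        freqs = to_frequencies(per_contig[name])
        tnf_dist = sqrt(sum((a - b) ** 2 for a, b in zip(freqs, pooled)))
        if tnf_dist > max_tnf_dist or abs(gc_content(seq) - ref_gc) > max_gc_diff:
            flagged.add(name)
    return flagged
```

Composition screens of this kind are noisy on very short contigs, which is one reason to combine them with gene-based screens such as the blastx/MEGAN assignment.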

Contamination originated largely from previously identified contaminants of the WGA reaction kit (Blainey and Quake 2011, Woyke et al 2011) and from mitochondrial DNA of the sponge host (Table S1). However, in SAGs 3A and 4G we detected contamination from additional sources, and the amount of non-poribacterial DNA in these two datasets was larger than in the other SAGs. Thus, we excluded all reads that lacked genes with significant homology (≥60% identity) to any gene of the other cleaned poribacterial SAGs. Since a larger proportion of the previously published dataset 3A was contaminated (Siegl et al 2011), we updated the original genome sequence and deposited the updated version at DDBJ/EMBL/GenBank under the accession number ADFK00000000. The version described in this paper is ADFK02000000. After contamination removal, the final assembly sizes were 0.41 Mbp, 5.44 Mbp, 1.63 Mbp, 0.54 Mbp, 3.65 Mbp, and 0.19 Mbp for SAGs 3A, 3G, 4C, 4CII, 4E, and 4G, respectively.
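Comparing the draft assembly sizes with the post-decontamination sizes gives the fraction of each assembly that was retained. The sketch below uses only figures from this section; because the final sizes are rounded to 0.01 Mbp, the percentages are approximate, and SAG 3A is omitted since its size before cleanup is not given here.

```python
# Draft assembly sizes (bp) and final sizes after decontamination (Mbp, rounded).
BEFORE_BP = {"3G": 5_627_474, "4C": 1_713_200, "4CII": 596_887,
             "4E": 3_679_266, "4G": 1_443_813}
AFTER_MBP = {"3G": 5.44, "4C": 1.63, "4CII": 0.54, "4E": 3.65, "4G": 0.19}


def retained_fraction(sag: str) -> float:
    """Fraction of the draft assembly remaining after contamination removal."""
    return AFTER_MBP[sag] * 1e6 / BEFORE_BP[sag]


if __name__ == "__main__":
    for sag in BEFORE_BP:
        print(f"SAG {sag}: {100 * retained_fraction(sag):.1f}% retained")
```

This yields roughly 96.7% (3G), 95.1% (4C), 90.5% (4CII), 99.2% (4E), and 13.2% (4G) retained; the low value for 4G is consistent with the larger amount of non-poribacterial DNA reported for that SAG.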

Gene predictions for all cleaned SAGs were evaluated and, where necessary, corrected using the GenePRIMP software (Pati et al 2010). The updated versions were resubmitted to IMG/mer for functional analysis, replacing the previous submissions. Unless stated otherwise, all functional analyses were conducted with tools in the IMG/mer software system.

Blainey PC, Quake SR (2011). Digital MDA for enumeration of total nucleic acid contamination. Nucleic Acids Res 39: e19.

Bogdanova E, Shagina I, Mudrik E, Ivanov I, Amon P, Vagner L et al (2009). DSN Depletion is a simple method to remove selected transcripts from cDNA populations. Mol Biotechnol 41: 247-253.

Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ et al (2011). High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A 108: 1513-1518.

Hacker J, Kaper JB (2000). Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol 54: 641-679.

Huson D, Auch A, Qi J, Schuster S (2007). MEGAN analysis of metagenomic data. Genome Res 17: 377-386.

Hyatt D, Chen G-L, LoCascio P, Land M, Larimer F, Hauser L (2010). Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.

Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI et al (1999). Comparative genomics of the Archaea (Euryarchaeota): Evolution of conserved protein families, the stable core, and the variable shell. Genome Res 9: 608-628.

Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D et al (2008). IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res 36: D534-538.

Mingkun L (2011). kmernorm.

Mingkun L, Copeland A, Han J (2011). DUK.

Pati A, Ivanova NN, Mikhailova N, Ovchinnikova G, Hooper SD, Lykidis A et al (2010). GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 7: 455-457.

Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J et al (2007). SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188-7196.

Siegl A, Kamke J, Hochmuth T, Piel J, Richter M, Liang C et al (2011). Single-cell genomics reveals the lifestyle of Poribacteria, a candidate phylum symbiotically associated with marine sponges. ISME J 5: 61-70.

Woyke T, Sczyrba A, Lee J, Rinke C, Tighe D, Clingenpeel S et al (2011). Decontamination of MDA Reagents for Single Cell Whole Genome Amplification. PLoS One 6: e26161.

Zerbino DR, Birney E (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18: 821-829.


Share in:

Related:

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconDo your experiments (Illumina Sequencing) and obtain the fastq files

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconAbstract Circulating tumor cells (ctcs) enter peripheral blood from...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconCpg island methylation detection by the sequencing of bisulfite converted...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconAbstract Customers are often better off if they can use a combination...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconEndogenous Retinal Progenitor Cell Regeneration and Reported Retinitis...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconCrude Cell Pellets: 10 cell factories of gmp qualified hek293 cells...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconPrice for a single license of "scad office" software products version 21
«Design combinations of forces»+ «Reinforcement selection for reinforced concrete structural elements»

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for icon5. Indoor units combination 1 Indoor unit combination for M2oc-18hrdn1-M

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconMissile defense agency (mda) small business innovation research program (sbir)

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for icon112 ILL. 2d 223, 492 N. E. 2d 1327, 97 ILL. Dec. 454, 56 A. L. R. 4th 1191

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconSession of the Committee on Economic, Social and Cultural Rights...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for icon1. Contractors use this transaction set to send a single Shipment...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for icon1. Contractors use this transaction set to send a single Shipment...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for icon1. Contractors use this transaction set to send a single Shipment...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconGenome sequencing, assembly and annotation methods

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconUltra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconModified GenSolve dna extraction from fta cards for Illumina iSelect BeadChip Genotyping

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconProtocol for: Next generation sequencing of custom amplicons to improve...

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconTesting Activities Performed

A combination of Illumina and 454 shotgun sequencing was performed on the single cell re-mda products for iconGes-n-series Operators Manual ges-701N – ges-302n on Line: 700VA,...




manual


When copying material provide a link © 2017
contacts
manual-guide.com
search