Complete genome sequence of Sphingobium yanoikuyae strain CC4533

Mautusi Mitra¹ and Ana Stanescu²

¹ University of West Georgia, School of Field Investigations and Experimental Sciences
² University of West Georgia, School of Computing, Analytics, and Modeling, 1601 Maple Street, Carrollton, GA 30118, USA

Abstract

We have isolated a new strain of Sphingobium yanoikuyae, which belongs to the class Alphaproteobacteria, order Sphingomonadales, and family Sphingomonadaceae. This carotenoid-producing strain is capable of degrading xenobiotics and is tolerant to toxic levels of six heavy metals. We have designated the newly isolated strain of S. yanoikuyae as S. yanoikuyae strain CC4533 (hereafter called strain CC4533) because it was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of a green micro-alga Chlamydomonas reinhardtii wild type strain CC4533. We sequenced the whole genome of strain CC4533 using the PacBio Sequel II Continuous Long Read technology and have submitted it to NCBI along with the SRA and PacBio methylation motif data. Additionally, we have submitted the PacBio methylome to REBASE, Ref#35996. We present the whole genome sequence of S. yanoikuyae strain CC4533 that offers insights into its coding and non-coding genes and its nearest taxonomic neighbors.

Keywords

Sphingobium yanoikuyae, Alphaproteobacterium, carotenoids, xenobiotics-degrader, heavy metal-tolerant

Introduction

A novel strain of Sphingobium yanoikuyae, designated strain CC4533 (hereafter CC4533), was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the wild-type strain CC4533 of Chlamydomonas reinhardtii at the University of West Georgia (geolocation: Carrollton, Georgia; elevation 1,102 ft (336 m); 33.5730 N, 85.1037 W) (1). CC4533 has one circular chromosome and four putative plasmids, with a total genome size of 5.8 Mb. We report here the complete genome sequence of CC4533 and offer insights into its genomic coding potential.

External Data Availability

The whole-genome sequence has been deposited in GenBank under the accession number GCF_027627615.1.
The raw PacBio sequencing data have been submitted to the Sequence Read Archive (SRA) under the accession number SRR23176675.
The PacBio methylation motif data have been submitted to NCBI.
The PacBio methylome has been submitted to REBASE, Ref#35996.
Linked publication: (1)

Background and Experimental Methods

Sample Collection

S. yanoikuyae strain CC4533 was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the wild-type strain CC4533 of the green micro-alga Chlamydomonas reinhardtii.

Isolation

Genomic DNA was isolated from the Lysogeny Broth (LB) medium-grown CC4533 strain (colony #28) using the Qiagen Blood and Cell Culture DNA Mini Kit.

Genome Sequencing

After the purity and concentration of the genomic DNA were determined, the DNA sample was shipped to the Georgia Genomics and Bioinformatics Core (GGBC) at the University of Georgia (Athens, GA). At GGBC, the sample was processed to prepare a PacBio Single Molecule Real Time (SMRT) SMRTbell sequencing library according to the protocol in the PacBio technical manual for template preparation and sequencing (see the QC and Assembly section for details). The SMRTbell sequencing library was barcoded and sequenced together with two additional barcoded microbial SMRTbell sequencing libraries in a single SMRT cell, using PacBio SMRT Continuous Long Read sequencing on the PacBio Sequel II instrument. SMRT Link v9 was used as the interface to manage the workflow from sample setup to result analysis.

QC and Assembly

Quantitative and qualitative QC assessment was performed on the DNA sample at GGBC using Qubit, Nanodrop, and Fragment Analyzer.
DNA was sheared using a Covaris® g-TUBE®. After shearing, the approximate size range of the fragments was determined with a Bioanalyzer® 12000 chip, and the DNA was quantified on a Nanodrop system.
Fragments of approximately 12 kb were purified and concentrated using 0.45X AMPure PB beads.
DNA damage in the sheared DNA was repaired with the DNA Damage Repair reagents provided by Pacific Biosciences, and the PacBio Template Prep Kit was used to repair the ends of the fragmented DNA. Following end repair, the DNA was purified with 0.45X AMPure PB beads.
Blunt hairpin adapters were ligated to the DNA fragments, and failed ligation products were removed by exonuclease (ExoII and ExoVII) treatment. Size selection and purification followed, using three consecutive 0.45X AMPure PB bead purification steps at room temperature to remove enzymes (exonucleases, ligases, etc.) and ligation products smaller than 0.4 kb (e.g., adapter dimers).
SMRTbell™ library quality was assessed using a Bioanalyzer® 12000 chip for sizing, and the library was quantified by fluorescence using a Qubit® High Sensitivity kit.
Sequencing primer v4 was annealed to the SMRTbell template. DNA sequencing polymerases were bound to the primer-annealed SMRTbell templates using the Sequel® II Binding Kit 2.0. The polymerase-bound SMRTbell® complexes were then purified with AMPure® PB beads.
A dilution containing 30X DNA Internal Control Complex (SMRTbell templates already bound to polymerase, available from Pacific Biosciences) was added to the SMRTbell template to allow independent detection of any problems that might have occurred during binding and the sequencing run.
Prior to sequencing, the SMRTbell template-polymerase complex was loaded by MagBead loading into a 96-well sample plate at the concentrations and volumes specified by the Pacific Biosciences Binding Calculator.
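
The 0.45X AMPure PB bead ratio used in the purification steps above expresses the bead volume as a multiple of the sample volume. The following is a minimal sketch of that arithmetic, not part of the GGBC protocol; the function name and the 100 µL sample volume are hypothetical and chosen only for illustration.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float = 0.45) -> float:
    """Return the bead volume (in microliters) for a bead:sample volume ratio.

    A "0.45X" cleanup means 0.45 volumes of bead suspension per volume of sample.
    """
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_volume_ul * ratio


# Hypothetical example: a 100 µL sample (the actual volumes are not given in the text).
example_sample_ul = 100.0
print(f"{bead_volume_ul(example_sample_ul):.1f} µL of beads for {example_sample_ul:.0f} µL of sample")
```

Lower ratios such as 0.45X are generally used to retain longer fragments while discarding short ones, which is consistent with the removal of ligation products smaller than 0.4 kb described above.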

Import and Annotation

The complete GenBank full record (version 30-MAR-2025) of the Sphingobium yanoikuyae strain CC4533 genome was imported into KBase from the Staging Area application as follows. The chromosome NZ_CP115456.1 was imported under the Genome Type Finished Isolate with NCBI Tax ID 13690 (Sphingobium yanoikuyae). The four plasmids were imported under the Genome Type Plasmid: NZ_CP115458.1, NZ_CP115457.1, NZ_CP115460.1, and NZ_CP115459.1.
The genome (chromosome and plasmids) was annotated in KBase using the Annotate Microbial Genome application, which is based on RASTtk v1.073, with default parameters.
The chromosome was visualized using the KBase Circular Genome Visualization Tool with default parameters; the Linear option was left unchecked so that the chromosome was drawn as circular.
The quality of the genome was assessed using the KBase Assess Genome Quality with CheckM-v1.0.18 application.
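
The replicon imports described at the start of this section can be summarized as a small lookup table. The sketch below is illustrative only and does not call KBase; the dictionary layout, the regular expression, and the helper name `invalid_accessions` are assumptions introduced here, while the accessions, plasmid names, and Genome Type values come from this narrative.

```python
import re

# Replicon name -> (RefSeq accession, KBase Genome Type used at import).
REPLICONS = {
    "chromosome": ("NZ_CP115456.1", "Finished Isolate"),
    "pAS1": ("NZ_CP115458.1", "Plasmid"),
    "pKN1": ("NZ_CP115457.1", "Plasmid"),
    "pMM1": ("NZ_CP115460.1", "Plasmid"),
    "pTB1": ("NZ_CP115459.1", "Plasmid"),
}

# Assumed pattern for these records: "NZ_" + two letters + six digits + ".version".
ACCESSION_PATTERN = re.compile(r"NZ_[A-Z]{2}\d{6}\.\d+")


def invalid_accessions(replicons):
    """Return the names of replicons whose accession does not match the pattern."""
    return [
        name
        for name, (accession, _genome_type) in replicons.items()
        if not ACCESSION_PATTERN.fullmatch(accession)
    ]


print(invalid_accessions(REPLICONS))
```

A check like this is only a guard against transcription errors in the accession list before import; it does not confirm that a record exists at NCBI.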

Genome Assembly

Reads were first assembled using the SMRT Link v9 software tools, which include HGAP v4.0. The pipeline was run with default settings and an estimated genome size of approximately 5.5 Mb (based on the complete genome sizes of various Sphingobium yanoikuyae strains available on NCBI). After assembly, the assembly metrics for each sample, along with HMM-predicted genes, were determined by running QUAST v5.0.2. The genome was also assembled with Flye v2.9.1 for statistical confidence.
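
N50 is one of the standard contiguity metrics reported by assembly assessment tools. The sketch below shows how the statistic is defined, applied to the five replicon sizes listed in the Genome Statistics table of this narrative; it is an illustration of the definition, not QUAST's implementation, and the function name `n50` is an assumption.

```python
def n50(lengths):
    """Return the N50: the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Replicon sizes (bp) from the Genome Statistics table.
replicon_sizes = [4_747_868, 110_920, 647_509, 79_081, 218_419]
print(n50(replicon_sizes))
```

Because the chromosome alone accounts for more than half of the assembled bases, the N50 of this finished assembly equals the chromosome length.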

Genome Statistics

The CC4533 genome consists of one circularized chromosome and four circularized plasmids, comprising a total of 5,803,797 bp with an overall G+C content of approximately 64%. Genome coverage was calculated using the formula: Number of Subread Bases (mapped) / Genome Size = 24,679,499,303 / 5,803,797 ≈ 4,252X. Genome coverage based on hgap.depth_coverage_mean in the PacBio coverage report is 4,123X.
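
The coverage formula above can be reproduced with a few lines of Python. This is a minimal sketch of the stated arithmetic only; the function name `fold_coverage` is an assumption, and the two input values are the ones given in the text.

```python
def fold_coverage(mapped_subread_bases: int, genome_size_bp: int) -> float:
    """Return fold coverage as mapped subread bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return mapped_subread_bases / genome_size_bp


coverage = fold_coverage(24_679_499_303, 5_803_797)
print(f"{coverage:,.0f}X")  # prints "4,252X"
```

This value is a simple ratio over the whole genome, whereas the hgap.depth_coverage_mean figure comes from the PacBio coverage report, so the two numbers are not expected to match exactly.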

Name	RefSeq	Topology	Size (bp)	GC Content
Chromosome CP115456.1	NZ_CP115456.1	circular	4,747,868	64.812%
Plasmid pAS1	NZ_CP115458.1	circular	110,920	60.659%
Plasmid pKN1	NZ_CP115457.1	circular	647,509	59.691%
Plasmid pMM1	NZ_CP115460.1	circular	79,081	62.193%
Plasmid pTB1	NZ_CP115459.1	circular	218,419	63.372%
		Total	5,803,797
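
As a consistency check on the table above, the total size and the overall G+C content can be recomputed from the per-replicon rows. The sketch below assumes that the overall G+C content is the length-weighted mean of the per-replicon values; the variable and function names are illustrative.

```python
# (name, size in bp, G+C content in percent), copied from the table above.
REPLICON_STATS = [
    ("Chromosome", 4_747_868, 64.812),
    ("pAS1", 110_920, 60.659),
    ("pKN1", 647_509, 59.691),
    ("pMM1", 79_081, 62.193),
    ("pTB1", 218_419, 63.372),
]


def total_size(stats):
    """Return the summed length of all replicons in bp."""
    return sum(size for _name, size, _gc in stats)


def weighted_gc(stats):
    """Return the length-weighted mean G+C content in percent."""
    return sum(size * gc for _name, size, gc in stats) / total_size(stats)


print(f"Total size: {total_size(REPLICON_STATS):,} bp")
print(f"Overall G+C: {weighted_gc(REPLICON_STATS):.2f}%")
```

The recomputed total matches the 5,803,797 bp in the table, and the weighted mean is consistent with the overall G+C content of approximately 64% reported above.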

Taxonomic Classification

Taxonomic identification was performed using the KBase Classify Microbes with GTDB-Tk-v2.3.2 application on a GenomeSet generated with the Build GenomeSet-v1.7.6 application.
A phylogenetic tree was constructed using the KBase Insert Genome Into Species Tree-v2.2.0 application with parameters: Neighbor Public Genome Count = 200.
A second phylogenetic tree, based on the 16S rRNA gene, was constructed using the KBase Build Phylogenetic Tree from MSA using FastTree2-v2.1.11 application. The input alignment comprised the top 50 sequences obtained from NCBI using BLASTN, aligned using CLUSTAL in MEGA 11.

References

1. Mitra M, Nguyen KMAK, Box TW, et al. Isolation and characterization of a novel Sphingobium yanoikuyae strain variant that uses biohazardous saturated hydrocarbons and aromatic compounds as sole carbon sources [version 1; peer review: 2 approved]. F1000Research. 2020, 9:767.

Created Object Name	Type	Description
CC4533_chromosome_NZ_CP115456.1.gb	Genome	Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pAS1_NZ_CP115458.1.gb	Genome	Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pKN1_NZ_CP115457.1.gb	Genome	Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pTB1_NZ_CP115459.1.gb	Genome	Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pMM1_NZ_CP115460.1.gb	Genome	Taxonomy and taxon_assignment updated with GTDB
CC4533_GenomeSet	GenomeSet	Taxonomy and taxon_assignment updated with GTDB
GTDB_SP_REPS.CC4533_chromosome_NZ_CP115456.1.gb.GenomeSet	GenomeSet	Proximal GTDB species reps for CC4533_chromosome_NZ_CP115456.1.gb
GTDB_SP_REPS.CC4533_plasmid_pAS1_NZ_CP115458.1.gb.GenomeSet	GenomeSet	Proximal GTDB species reps for CC4533_plasmid_pAS1_NZ_CP115458.1.gb
GTDB_SP_REPS.CC4533_plasmid_pKN1_NZ_CP115457.1.gb.GenomeSet	GenomeSet	Proximal GTDB species reps for CC4533_plasmid_pKN1_NZ_CP115457.1.gb
GTDB_SP_REPS.CC4533_plasmid_pMM1_NZ_CP115460.1.gb.GenomeSet	GenomeSet	Proximal GTDB species reps for CC4533_plasmid_pMM1_NZ_CP115460.1.gb
GTDB_SP_REPS.CC4533_plasmid_pTB1_NZ_CP115459.1.gb.GenomeSet	GenomeSet	Proximal GTDB species reps for CC4533_plasmid_pTB1_NZ_CP115459.1.gb
GTDB_SP_REPS-ALL.CC4533_GenomeSet.GenomeSet	GenomeSet	Proximal GTDB species reps for ALL query genomes in CC4533_GenomeSet
GTDB_Dendogram.gtdbtk.backbone.bac120.classify.tree-proximals.tree	Tree	with proximal GTDB species reps
GTDB_Dendogram.gtdbtk.backbone.bac120.classify.tree-trimmed.tree	Tree	trimmed with sister context
GTDB_Dendogram.gtdbtk.bac120.classify.tree.4.tree-proximals.tree	Tree	with proximal GTDB species reps
GTDB_Dendogram.gtdbtk.bac120.classify.tree.4.tree-trimmed.tree	Tree	trimmed with sister context