Generated October 6, 2025

Complete genome sequence of Sphingobium yanoikuyae strain CC4533

Mautusi Mitra¹ and Ana Stanescu²

1 University of West Georgia, School of Field Investigations and Experimental Sciences
2 University of West Georgia, School of Computing, Analytics, and Modeling, 1601 Maple Street, Carrollton, GA 30118, USA

Abstract

We have isolated a new strain of Sphingobium yanoikuyae, a member of the class Alphaproteobacteria, order Sphingomonadales, family Sphingomonadaceae. This carotenoid-producing strain degrades xenobiotics and tolerates toxic levels of six heavy metals. We designated it S. yanoikuyae strain CC4533 (hereafter strain CC4533) because it was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the wild-type strain CC4533 of the green microalga Chlamydomonas reinhardtii. We sequenced the whole genome of strain CC4533 using PacBio Sequel II Continuous Long Read technology and deposited it at NCBI together with the SRA reads and the PacBio methylation motif data. We also submitted the PacBio methylome to REBASE (Ref#35996). Here we present the complete genome sequence of S. yanoikuyae strain CC4533, which provides insights into its coding and non-coding genes and identifies its nearest taxonomic neighbors.

Keywords

Sphingobium yanoikuyae, Alphaproteobacterium, carotenoids, xenobiotic degrader, heavy-metal tolerant

Introduction

A novel strain of Sphingobium yanoikuyae, strain CC4533 (hereafter CC4533), was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the wild-type strain CC4533 of Chlamydomonas reinhardtii at the University of West Georgia (Carrollton, Georgia; elevation 1,102 ft / 336 m; 33.5730 N, 85.1037 W) (1). The CC4533 genome consists of one circular chromosome and four putative plasmids, with a total size of 5.8 Mb. We report here the complete genome sequence of CC4533 and offer insights into its genomic coding potential.

External Data Availability

  • The whole-genome sequence has been deposited at NCBI under the RefSeq assembly accession GCF_027627615.1.
  • The raw PacBio sequencing data have been submitted to the Sequence Read Archive (SRA) under accession SRR23176675.
  • The PacBio methylation motif data have been submitted to NCBI.
  • The PacBio methylome has been submitted to REBASE, Ref#35996.
  • Linked publication: (1)

Table of Contents

  1. Background and Experimental Methods
  2. QC and Assembly
  3. Import and Annotation
  4. Taxonomic Classification
  5. References

Background and Experimental Methods

Sample Collection

S. yanoikuyae strain CC4533 was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the wild-type strain CC4533 of the green microalga Chlamydomonas reinhardtii.

Isolation

Genomic DNA was isolated from strain CC4533 (colony #28) grown in Lysogeny Broth (LB) medium, using the Qiagen Blood and Cell Culture DNA Mini Kit.

Genome Sequencing

After genomic DNA purity and concentration had been determined, the DNA sample was shipped to the Georgia Genomics and Bioinformatics Core (GGBC) at the University of Georgia (Athens, GA). At GGBC, a PacBio SMRTbell (Single Molecule, Real-Time) sequencing library was prepared following the PacBio technical manual for template preparation and sequencing (see QC and Assembly for details). The library was barcoded, multiplexed with two other barcoded microbial SMRTbell libraries on a single SMRT Cell, and sequenced in Continuous Long Read mode on a PacBio Sequel II instrument. SMRT Link v9 was used to manage the workflow from sample setup to analysis of the results.


QC and Assembly

  1. DNA quantity and quality were assessed at GGBC using Qubit, NanoDrop, and Fragment Analyzer.
  2. DNA was sheared using a Covaris® g-TUBE®. After shearing, the approximate fragment size range was determined with a Bioanalyzer® 12000 chip, and the DNA was quantified on a NanoDrop system.
  3. Fragments of approximately 12 kb were purified and concentrated using 0.45X AMPure PB beads.
  4. Damage in the sheared DNA was repaired with DNA Damage Repair reagents from Pacific Biosciences, and the fragment ends were repaired with the PacBio Template Prep Kit. After end repair, the DNA was purified with 0.45X AMPure PB beads.
  5. Blunt hairpin adapters were ligated to the DNA fragments, and failed ligation products were removed by exonuclease treatment (ExoIII and ExoVII). The library was then size-selected and purified with three consecutive 0.45X AMPure PB bead purifications at room temperature to remove enzymes (exonucleases, ligases, etc.) and ligation products smaller than 0.4 kb (e.g., adapter dimers).
  6. SMRTbell™ library quality was assessed by sizing on a Bioanalyzer® 12000 chip, and the library was quantified by fluorescence using a Qubit® High Sensitivity kit.
  7. Sequencing primer v4 was annealed to the SMRTbell templates, and DNA sequencing polymerase was bound to the primer-annealed templates using the Sequel® II Binding Kit 2.0. The polymerase-bound SMRTbell® complexes were then purified with AMPure® PB beads.
  8. A dilution of the 30X DNA Internal Control Complex (SMRTbell templates pre-bound to polymerase, available from Pacific Biosciences) was added to the SMRTbell sample so that any problems arising during binding or the sequencing run could be diagnosed independently.
  9. Before sequencing, the SMRTbell template-polymerase complex was loaded by MagBead loading into a 96-well sample plate at the concentrations and volumes specified by the Pacific Biosciences Binding Calculator.
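The bead cleanups and library quantification in the steps above rest on two small calculations: the bead volume for a given bead:sample ratio, and the conversion of a mass concentration (such as a Qubit reading) into molarity for loading. The sketch below illustrates both. It assumes the conventional average mass of 660 g/mol per base pair of double-stranded DNA, and the sample volume and concentration in the example are hypothetical values, not measurements from this study. The function names are illustrative only; the PacBio Binding Calculator remains the authoritative source for actual loading parameters.

```python
# Minimal sketch of SMRTbell library-prep arithmetic (hypothetical inputs).
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair


def bead_volume_ul(sample_volume_ul: float, ratio: float = 0.45) -> float:
    """Volume of AMPure PB beads for a bead:sample ratio such as 0.45X."""
    return sample_volume_ul * ratio


def dsdna_nanomolar(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    nM = (ng/uL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_length_bp)


if __name__ == "__main__":
    # Hypothetical example: 100 uL of sheared DNA at 20 ng/uL, ~12 kb fragments.
    print(f"Beads for a 0.45X cleanup: {bead_volume_ul(100.0):.1f} uL")
    print(f"Library molarity: {dsdna_nanomolar(20.0, 12_000):.2f} nM")
```

A 0.45X ratio means 0.45 µL of bead suspension per µL of sample; the low ratio is what preferentially retains long fragments and discards short products such as adapter dimers.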

Import and Annotation

  1. The complete GenBank record (version 30-MAR-2025) of the Sphingobium yanoikuyae strain CC4533 genome was imported into KBase via the Staging Area, as follows: the chromosome (NZ_CP115456.1) was imported with Genome Type "Finished Isolate" and NCBI Taxonomy ID 13690 (Sphingobium yanoikuyae), and the four plasmids (NZ_CP115458.1, NZ_CP115457.1, NZ_CP115460.1, and NZ_CP115459.1) were imported with Genome Type "Plasmid".
  2. The genome (chromosome and plasmids) was annotated in KBase with the Annotate Microbial Genome application, which is based on RASTtk v1.073, using default parameters.
  3. The chromosome was visualized with the KBase Circular Genome Visualization Tool using default parameters; the Linear option was left unchecked so that the chromosome was drawn as a circle.
  4. Genome quality was assessed with the KBase Assess Genome Quality with CheckM application (CheckM v1.0.18).
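Before or after import, the length and G+C content of each GenBank replicon record can be cross-checked against the values in the Genome Statistics table. The sketch below is a minimal, standard-library-only reader for the sequence (ORIGIN) block of a GenBank flat file. The function names are hypothetical, the code is not part of KBase or RASTtk, and a production workflow would more likely use an established parser such as Biopython's SeqIO.

```python
import re


def read_genbank_sequences(text: str) -> dict[str, str]:
    """Return {LOCUS name: sequence} for every record in a GenBank flat file."""
    records: dict[str, str] = {}
    name, chunks, in_origin = None, [], False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            name = line.split()[1]  # the second field of the LOCUS line is the name
            chunks, in_origin = [], False
        elif line.startswith("ORIGIN"):
            in_origin = True  # sequence lines follow until "//"
        elif line.startswith("//"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, in_origin = None, False
        elif in_origin:
            # Sequence lines look like "        1 gcgatcgcta ...": keep letters only.
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
    return records


def gc_percent(seq: str) -> float:
    """G+C content as a percentage of all characters in the sequence."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # A tiny made-up record standing in for a real replicon file.
    demo = (
        "LOCUS       toy_replicon  12 bp    DNA     circular BCT 30-MAR-2025\n"
        "ORIGIN\n"
        "        1 ggccatgcgc at\n"
        "//\n"
    )
    for locus, seq in read_genbank_sequences(demo).items():
        print(locus, len(seq), f"{gc_percent(seq):.2f}% GC")
```

Applied to the five CC4533 records, the reported lengths should match the Size (bp) column of the Genome Statistics table.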

Genome Assembly

Reads were first assembled with SMRT Link v9, which includes HGAP v4.0. The pipeline was run with default parameters and an estimated genome size of 5.5 Mb, based on the sizes of complete Sphingobium yanoikuyae genomes available at NCBI. After assembly, assembly metrics and HMM-predicted genes were determined for each sample with QUAST v5.0.2. The genome was also assembled independently with Flye v2.9.1 to increase confidence in the assembly.
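QUAST reports contiguity metrics such as N50 for each assembly. To show how that statistic is defined, the sketch below computes N50 and the total assembly size from a list of sequence lengths. Feeding it the five replicon lengths from the Genome Statistics table is our own illustration, not QUAST output from this study; the last line simply compares the assembled size with the 5.5 Mb estimate given to HGAP.

```python
def n50(lengths: list[int]) -> int:
    """Smallest length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


# Replicon lengths (bp) taken from the Genome Statistics table.
replicons = {
    "chromosome": 4_747_868,
    "pAS1": 110_920,
    "pKN1": 647_509,
    "pMM1": 79_081,
    "pTB1": 218_419,
}

if __name__ == "__main__":
    lengths = list(replicons.values())
    print("Total size (bp):", sum(lengths))
    print("N50 (bp):", n50(lengths))
    # Ratio of assembled size to the 5.5 Mb genome size estimate.
    print(f"Assembled / estimated size: {sum(lengths) / 5.5e6:.3f}")
```

For a complete genome like this one, N50 is simply the chromosome length, because the chromosome alone holds more than half of the assembled bases.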


Genome Statistics

The CC4533 genome consists of one circularized chromosome and four circularized plasmids totaling 5,803,797 bp, with an overall G+C content of approximately 64%. Genome coverage was calculated as the number of mapped subread bases divided by the genome size: 24,679,499,303 / 5,803,797 ≈ 4,252X. The mean depth of coverage reported by HGAP (hgap.depth_coverage_mean in the PacBio coverage report) is 4,123X.
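The coverage arithmetic above can be reproduced in a few lines. This sketch only divides the mapped subread bases by the assembled genome size, using the numbers quoted in this section; it does not recompute anything from the raw reads. The gap between this figure and the 4,123X mean depth from HGAP presumably reflects the different way each is computed (total mapped subread bases versus mean per-base depth).

```python
MAPPED_SUBREAD_BASES = 24_679_499_303  # "Number of Subread Bases (mapped)"
GENOME_SIZE_BP = 5_803_797             # chromosome plus four plasmids


def fold_coverage(mapped_bases: int, genome_size_bp: int) -> float:
    """Average fold coverage = total mapped bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return mapped_bases / genome_size_bp


if __name__ == "__main__":
    cov = fold_coverage(MAPPED_SUBREAD_BASES, GENOME_SIZE_BP)
    print(f"Coverage: {cov:,.0f}X")  # rounded to the nearest whole X
```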

Name                             RefSeq          Topology   Size (bp)   GC content (%)
Chromosome (GenBank CP115456.1)  NZ_CP115456.1   circular   4,747,868   64.812
Plasmid pAS1                     NZ_CP115458.1   circular     110,920   60.659
Plasmid pKN1                     NZ_CP115457.1   circular     647,509   59.691
Plasmid pMM1                     NZ_CP115460.1   circular      79,081   62.193
Plasmid pTB1                     NZ_CP115459.1   circular     218,419   63.372
Total                                                       5,803,797
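Because the replicons differ greatly in length, the overall G+C content is a length-weighted average of the per-replicon values, not their simple mean. The sketch below recomputes it from the table. The per-replicon percentages are rounded to three decimals, so the result is approximate, and the dictionary name is only illustrative.

```python
# (length in bp, G+C content in %) per replicon, from the table above.
REPLICONS = {
    "chromosome": (4_747_868, 64.812),
    "pAS1": (110_920, 60.659),
    "pKN1": (647_509, 59.691),
    "pMM1": (79_081, 62.193),
    "pTB1": (218_419, 63.372),
}


def weighted_gc(replicons: dict[str, tuple[int, float]]) -> float:
    """Length-weighted G+C content (%) across all replicons."""
    total_bp = sum(length for length, _ in replicons.values())
    gc_bases = sum(length * gc / 100.0 for length, gc in replicons.values())
    return 100.0 * gc_bases / total_bp


if __name__ == "__main__":
    total = sum(length for length, _ in REPLICONS.values())
    simple_mean = sum(gc for _, gc in REPLICONS.values()) / len(REPLICONS)
    print(f"Total size: {total:,} bp")
    print(f"Length-weighted G+C: {weighted_gc(REPLICONS):.2f}%")
    print(f"Unweighted mean of replicon G+C: {simple_mean:.2f}%")
```

The weighted value (about 64.07%) agrees with the overall G+C content of approximately 64% given above, whereas the unweighted mean (about 62.15%) would underestimate it because the lower-G+C plasmids are small.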

Taxonomic Classification

  1. Taxonomic identification was performed with the KBase Classify Microbes with GTDB-Tk application (GTDB-Tk v2.3.2) on a GenomeSet generated with the Build GenomeSet application (v1.7.6).
  2. A phylogenetic tree was constructed with the KBase Insert Genome Into SpeciesTree application (v2.2.0), with Neighbor Public Genome Count set to 200.
  3. A second phylogenetic tree, based on the 16S rRNA gene, was constructed with the KBase Build Phylogenetic Tree from MSA using FastTree2 application (v2.1.11). The input alignment consisted of the top 50 sequences retrieved from NCBI by BLASTN, aligned with CLUSTAL in MEGA 11.

References

  1. Mitra M, Nguyen KMAK, Box TW, et al. Isolation and characterization of a novel Sphingobium yanoikuyae strain variant that uses biohazardous saturated hydrocarbons and aromatic compounds as sole carbon sources [version 1; peer review: 2 approved]. F1000Research. 2020;9:767.

Output from Annotate Microbial Genome with RASTtk - v1.073
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/219679
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 3m 22s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/219679
  • KBase_derived_CC4533_chromosome_NZ_CP115456.1.gb.png
  • KBase_derived_CC4533_chromosome_NZ_CP115456.1.gb.jpg
  • KBase_derived_CC4533_chromosome_NZ_CP115456.1.gb.svg
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 7m 30s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/219679
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB)
This app completed without errors in 28m 5s.
Objects
Created Object Name Type Description
CC4533_chromosome_NZ_CP115456.1.gb Genome Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pAS1_NZ_CP115458.1.gb Genome Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pKN1_NZ_CP115457.1.gb Genome Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pTB1_NZ_CP115459.1.gb Genome Taxonomy and taxon_assignment updated with GTDB
CC4533_plasmid_pMM1_NZ_CP115460.1.gb Genome Taxonomy and taxon_assignment updated with GTDB
CC4533_GenomeSet GenomeSet Taxonomy and taxon_assignment updated with GTDB
GTDB_SP_REPS.CC4533_chromosome_NZ_CP115456.1.gb.GenomeSet GenomeSet Proximal GTDB species reps for CC4533_chromosome_NZ_CP115456.1.gb
GTDB_SP_REPS.CC4533_plasmid_pAS1_NZ_CP115458.1.gb.GenomeSet GenomeSet Proximal GTDB species reps for CC4533_plasmid_pAS1_NZ_CP115458.1.gb
GTDB_SP_REPS.CC4533_plasmid_pKN1_NZ_CP115457.1.gb.GenomeSet GenomeSet Proximal GTDB species reps for CC4533_plasmid_pKN1_NZ_CP115457.1.gb
GTDB_SP_REPS.CC4533_plasmid_pMM1_NZ_CP115460.1.gb.GenomeSet GenomeSet Proximal GTDB species reps for CC4533_plasmid_pMM1_NZ_CP115460.1.gb
GTDB_SP_REPS.CC4533_plasmid_pTB1_NZ_CP115459.1.gb.GenomeSet GenomeSet Proximal GTDB species reps for CC4533_plasmid_pTB1_NZ_CP115459.1.gb
GTDB_SP_REPS-ALL.CC4533_GenomeSet.GenomeSet GenomeSet Proximal GTDB species reps for ALL query genomes in CC4533_GenomeSet
GTDB_Dendogram.gtdbtk.backbone.bac120.classify.tree-proximals.tree Tree with proximal GTDB species reps
GTDB_Dendogram.gtdbtk.backbone.bac120.classify.tree-trimmed.tree Tree trimmed with sister context
GTDB_Dendogram.gtdbtk.bac120.classify.tree.4.tree-proximals.tree Tree with proximal GTDB species reps
GTDB_Dendogram.gtdbtk.bac120.classify.tree.4.tree-trimmed.tree Tree trimmed with sister context
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/219679
  • gtdbtk.backbone.bac120.classify.tree - gtdbtk.backbone.bac120.classify.tree - whole tree GTDB formatted Newick
  • gtdbtk.backbone.bac120.classify-ITOL.tree - gtdbtk.backbone.bac120.classify-ITOL.tree - whole tree ITOL formatted Newick
  • gtdbtk.bac120.classify.tree.4.tree - gtdbtk.bac120.classify.tree.4.tree - whole tree GTDB formatted Newick
  • gtdbtk.bac120.classify.tree.4-ITOL.tree - gtdbtk.bac120.classify.tree.4-ITOL.tree - whole tree ITOL formatted Newick
  • gtdbtk.backbone.bac120.classify-proximals.tree - gtdbtk.backbone.bac120.classify-proximals.tree - Newick
  • gtdbtk.backbone.bac120.classify-trimmed.tree - gtdbtk.backbone.bac120.classify-trimmed.tree - Newick
  • gtdbtk.backbone.bac120.classify-lineages.map - gtdbtk.backbone.bac120.classify-lineages.map - GTDB lineage
  • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PNG - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PDF - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PNG - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PDF - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PNG - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PDF - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.4-proximals.tree - gtdbtk.bac120.classify.tree.4-proximals.tree - Newick
  • gtdbtk.bac120.classify.tree.4-trimmed.tree - gtdbtk.bac120.classify.tree.4-trimmed.tree - Newick
  • gtdbtk.bac120.classify.tree.4-lineages.map - gtdbtk.bac120.classify.tree.4-lineages.map - GTDB lineage
  • gtdbtk.bac120.classify.tree.4-trimmed.tree-rectangle.PNG - gtdbtk.bac120.classify.tree.4-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.4-trimmed.tree-rectangle.PDF - gtdbtk.bac120.classify.tree.4-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.4-trimmed.tree-circle.PNG - gtdbtk.bac120.classify.tree.4-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.4-trimmed.tree-circle.PDF - gtdbtk.bac120.classify.tree.4-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.4-trimmed.tree-circle-ultrametric.PNG - gtdbtk.bac120.classify.tree.4-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.4-trimmed.tree-circle-ultrametric.PDF - gtdbtk.bac120.classify.tree.4-trimmed.tree - Image
  • GTDB-Tk_classify_wf.zip - GTDB-Tk Classify WF output
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 6m 10s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/219679
  • CC4533_chromosome_NeighborPublicGenomeCount200.newick
  • CC4533_chromosome_NeighborPublicGenomeCount200-labels.newick
  • CC4533_chromosome_NeighborPublicGenomeCount200.png
  • CC4533_chromosome_NeighborPublicGenomeCount200.pdf
Build a phylogenetic reconstruction from a Multiple Sequence Alignment (MSA) using FastTree2.
This app completed without errors in 1m 44s.
Objects
Created Object Name Type Description
PhylogeneticTree_CLUSTAL_MAS_top50_16SrRNA Tree PhylogeneticTree_CLUSTAL_MAS_top50_16SrRNA Tree
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/219679
  • PhylogeneticTree_CLUSTAL_MAS_top50_16SrRNA.newick
  • PhylogeneticTree_CLUSTAL_MAS_top50_16SrRNA-labels.newick
  • PhylogeneticTree_CLUSTAL_MAS_top50_16SrRNA.png
  • PhylogeneticTree_CLUSTAL_MAS_top50_16SrRNA.pdf

Apps

  1. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043-1055. doi:10.1101/gr.186072.114
  2. Build Phylogenetic Tree from MSA using FastTree2 - v2.1.11
    • Price MN, Dehal PS, Arkin AP. FastTree 2--Approximately Maximum-Likelihood Trees for Large Alignments. PLOS ONE. 2010;5:e9490. doi:10.1371/journal.pone.0009490
    • Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26:1641-1650. doi:10.1093/molbev/msp077
    • Huerta-Cepas J, Serra F, Bork P. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol. 2016;33:1635-1638. doi:10.1093/molbev/msw046
  3. Circular Genome Visualization Tool
    no citations
  4. Classify Microbes with GTDB-Tk - v2.3.2
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, Volume 38, Issue 23, 1 December 2022, Pages 5315-5316. DOI: https://doi.org/10.1093/bioinformatics/btac672
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, Philip Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D785-D794. DOI: https://doi.org/10.1093/nar/gkab776
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
    • Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132. DOI: 10.1186/s13059-016-0997-x
  5. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2--Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5:e9490. doi:10.1371/journal.pone.0009490