Thursday, June 26, 2025

Oracle Clusterware and RAC Startup Sequence

 


The Four-Phased Startup Process: A Methodical Orchestration

Phase 1: Initial System Boot and Root Agent Trigger

This phase marks the very beginning of the Clusterware startup, which the operating system's boot process normally initiates. No Clusterware daemon is running yet.

  • Operating System Boot: The server powers on, and the operating system (OS) initializes its core components.
  • `init.ohasd` script (or equivalent systemd service): The OS executes the primary Clusterware startup script. On Linux with Grid Infrastructure 11.2 and later, this is /etc/init.d/init.ohasd (earlier releases used /etc/init.d/init.crs) or, in modern Linux distributions, a systemd unit such as oracle-ohasd.service. This script, running with root privileges, is responsible for launching the Oracle High Availability Services Daemon (OHASD).
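
As a rough illustration of this phase, the sketch below looks for whichever launcher file is present on a host. The candidate paths (including the systemd unit location) and the function name `find_ohasd_launcher` are assumptions made for this example; check them against your own installation.

```python
from pathlib import Path
from typing import Optional

# Candidate launcher locations, relative to the filesystem root.
# These paths are assumptions for illustration only.
CANDIDATES = [
    "etc/systemd/system/oracle-ohasd.service",  # systemd unit (assumed location)
    "etc/init.d/init.ohasd",                    # init script, 11.2 and later
    "etc/init.d/init.crs",                      # init script, older releases
]


def find_ohasd_launcher(root: Path = Path("/")) -> Optional[str]:
    """Return the first launcher file found under root, or None."""
    for rel in CANDIDATES:
        candidate = root / rel
        if candidate.is_file():
            return str(candidate)
    return None


print(find_ohasd_launcher() or "no Clusterware launcher found")
```

The `root` parameter exists only so the lookup can be pointed at a test directory instead of the live filesystem.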

Phase 2: Core Cluster Services Bootstrap

This is where the true core of the Clusterware comes to life. The essential daemons responsible for high availability and cluster synchronization are initiated, forming the foundational layer without which no other cluster component can operate.

  • `OHASD` (Oracle High Availability Services Daemon): This is the first Oracle Clusterware daemon to start. It runs as the root user and acts as the parent process for the other critical Clusterware components.
      • Purpose: OHASD orchestrates the startup and shutdown of the other daemons, monitors their health, and attempts an automatic restart when one fails, which provides high availability at the Clusterware level.
      • Associated Processes (launched by OHASD, directly or through its agents):
          • `CSSD` (Cluster Synchronization Services Daemon): Manages node membership, processes heartbeats (signals exchanged between nodes to confirm their availability), and evicts (fences) a node when communication with it is lost, which prevents data corruption. CSSD communicates primarily over the private interconnect and also uses the voting disks.
          • `Diskmon` (Disk Monitoring Service): Monitors the health and accessibility of shared storage, particularly the voting disks.
          • `CRSD Root Agent`: A child process of OHASD tasked with managing root-owned cluster resources. This agent later starts the main CRSD process.
          • `GPNPD Root Agent`: Another child process of OHASD, responsible for starting the GPNPD daemon.
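
The heartbeat and eviction idea described for CSSD can be sketched as a simple timeout check. This is a toy model, not how CSSD is implemented: the 30-second threshold, the node names, and the function `nodes_to_evict` are assumptions for illustration, and the real daemon also takes the voting disks into account.

```python
from typing import Dict, List

MISSCOUNT_SECONDS = 30.0  # assumed timeout for this sketch only


def nodes_to_evict(last_heartbeat: Dict[str, float], now: float,
                   misscount: float = MISSCOUNT_SECONDS) -> List[str]:
    """Return the nodes whose last heartbeat is older than misscount seconds."""
    return sorted(node for node, seen in last_heartbeat.items()
                  if now - seen > misscount)


# Hypothetical timestamps (seconds) of the last heartbeat seen from each node.
heartbeats = {"node1": 100.0, "node2": 95.0, "node3": 60.0}
print(nodes_to_evict(heartbeats, now=100.0))  # ['node3']
```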

Phase 3: Grid Infrastructure Daemons and Resource Management Core

Once the foundational high-availability and synchronization services are online, the cluster proceeds to launch the daemons responsible for dynamic configuration and the central management of all cluster resources.

  • `GPNPD` (Grid Plug and Play Daemon): This daemon is launched by the GPNPD Root Agent, which is itself managed by OHASD.
      • Purpose: Manages the cluster's dynamic configuration. It simplifies adding or removing nodes and manages network configuration within the cluster, giving grid components "plug and play" behavior.
  • `CRSD` (Cluster Ready Services Daemon): This central daemon is launched by the CRSD Root Agent, under OHASD's management.
      • Purpose: CRSD is the primary resource manager for the entire cluster. It reads the Oracle Cluster Registry (OCR) to determine which resources (e.g., databases, listeners, VIPs, services) it must manage, their dependencies, and their desired states. CRSD then uses agents to bring these resources online and keep them in the desired state. It is essentially the cluster's orchestration engine.
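
One way to picture how dependencies translate into a start order is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the resource names and the dependencies between them are hypothetical and are not read from a real OCR.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical resources mapped to the resources they depend on.
dependencies: Dict[str, Set[str]] = {
    "asm": set(),
    "vip": set(),
    "listener": {"vip"},
    "database": {"asm"},
    "service": {"database"},
}


def start_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = start_order(dependencies)
print(order)
```

Several valid orders exist for this graph, so the only guarantee is that each resource appears after everything it depends on.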

Phase 4: All Cluster Services Online (RAC Database and Application Startup)

In this final phase, with the foundational daemons and resource management in place, the cluster proceeds to bring up all remaining services, including the Oracle RAC database instances, listeners, and any user-defined applications or services, making the environment fully operational.

Primary Agents and Processes Launched by CRSD:

  • `Oracle Agent` (oraagent): A child process of CRSD.
      • Purpose: Starts and manages the Oracle-specific resources defined in the OCR. These include:
          • Oracle Net Listeners: Processes that await incoming database connection requests.
          • Database Instances (`smon`, `pmon`, `dbw0`, `lgwr`, etc.): The Oracle database background processes for each instance on the node.
          • ASM Instances (`asmb`, `rbal`, etc.): If Oracle Automatic Storage Management (ASM) is used for storage, these instances are brought online.
          • Database Services: Any application services defined within the database that Clusterware must manage.
          • ACFS (ASM Cluster File System) Resources: If ACFS is deployed and managed by Clusterware.
  • `System Agent` (orarootagent): Another critical child process of CRSD.
      • Purpose: Starts and manages root-owned cluster resources. These include:
          • Virtual IP Addresses (VIPs): Logical IP addresses that can transparently fail over to another node if the current node fails, ensuring continuous client connectivity.
          • Any other OS-level or network resources that Clusterware manages and that require root privileges.
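
The split between the two agents comes down to whether a resource needs root privileges. A minimal sketch of that routing rule follows; the `Resource` class, its `needs_root` flag, and the sample resources are hypothetical and do not reflect Clusterware's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    needs_root: bool  # illustrative flag, not an actual Clusterware attribute


def agent_for(resource: Resource) -> str:
    """Pick the agent that would manage the resource in this simplified model."""
    return "orarootagent" if resource.needs_root else "oraagent"


resources = [
    Resource("listener", needs_root=False),
    Resource("database instance", needs_root=False),
    Resource("VIP", needs_root=True),
]
assignments = {r.name: agent_for(r) for r in resources}
print(assignments)
```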

The Startup Flow: A Hierarchical View

To summarize the intricate flow, here's a hierarchical representation of the processes and their dependencies:

Operating System Boot

  1. Triggers the init.ohasd script (init.crs in releases before 11.2) or the equivalent systemd service.

init.ohasd (or systemd service)

  1. Starts OHASD (running as root).

OHASD (Oracle High Availability Services Daemon)

  1. Launches CSSD (Cluster Synchronization Services Daemon)
  2. Launches Diskmon (Disk Monitoring Service)
  3. Launches CRSD Root Agent
  4. Launches GPNPD Root Agent

CRSD Root Agent

  1. Starts CRSD (Cluster Ready Services Daemon - running as root).

GPNPD Root Agent

  1. Starts GPNPD (Grid Plug and Play Daemon - running as the Oracle Grid Infrastructure owner).

CRSD (Cluster Ready Services Daemon)

  1. Launches Oracle Agent (oraagent)
  2. Launches System Agent (orarootagent)

Oracle Agent (oraagent)

  1. Starts Oracle Net Listeners
  2. Starts Oracle Database Instances (e.g., smon, pmon, dbw0)
  3. Starts ASM Instances (if applicable)
  4. Starts Database Services
  5. Starts ACFS Resources (if applicable)

System Agent (orarootagent)

  1. Starts Virtual IP Addresses (VIPs)
  2. Starts other root-managed network resources

 


