
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development Environment

This repository uses Nix for managing development tools. Enter the development shell:

nix-shell

The shell automatically configures:

  • TALOSCONFIG=testing1/.talosconfig
  • KUBECONFIG=testing1/kubeconfig
  • NIX_PROJECT_SHELL=kubernetes-management

Available tools in the Nix shell:

  • talosctl - Talos Linux cluster management
  • kubectl - Kubernetes cluster management
  • flux - FluxCD GitOps toolkit
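Once inside the shell, a quick sanity check confirms the exports took effect. A minimal sketch; the fallback values below are only so the snippet also runs standalone outside the Nix shell:

```shell
# Check the variables exported by shell.nix. The := defaults are illustrative
# fallbacks so this snippet runs even when the Nix shell is not active.
: "${TALOSCONFIG:=testing1/.talosconfig}"
: "${KUBECONFIG:=testing1/kubeconfig}"
echo "TALOSCONFIG=$TALOSCONFIG"
echo "KUBECONFIG=$KUBECONFIG"
```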

Cluster Bootstrap

To bootstrap a new Talos cluster from scratch, use the provided bootstrap script:

# Enter the Nix shell first
nix-shell

# Run the bootstrap script
./bootstrap-cluster.sh

The bootstrap script (bootstrap-cluster.sh) will:

  1. Generate new Talos secrets and machine configurations
  2. Apply configurations to all nodes (10.0.1.3, 10.0.1.4, 10.0.1.5)
  3. Bootstrap etcd on the first control plane
  4. Retrieve kubeconfig
  5. Verify cluster health

All generated files are saved to the testing1/ directory:

  • testing1/.talosconfig - Talos client configuration
  • testing1/kubeconfig - Kubernetes client configuration
  • testing1/secrets.yaml - Cluster secrets (keep secure!)
  • testing1/controlplane-*.yaml - Per-node configurations
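For reference, the generation step plausibly follows the standard talosctl flow. This is a hedged sketch — the cluster name, endpoint, and flags are assumptions (read bootstrap-cluster.sh for the real commands) — guarded so it is a no-op where talosctl is absent:

```shell
# Assumed shape of the secrets/config generation (not the script's exact code).
if command -v talosctl >/dev/null 2>&1; then
  mkdir -p testing1
  talosctl gen secrets -o testing1/secrets.yaml
  # Cluster name and endpoint below are placeholders; flag names can vary
  # slightly between talosctl versions.
  talosctl gen config testing1 https://10.0.1.3:6443 \
    --with-secrets testing1/secrets.yaml \
    --output testing1
else
  echo "talosctl not found; sketch only"
fi
```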

Troubleshooting Bootstrap

If nodes remain in maintenance mode or bootstrap fails:

  1. Check cluster status:

    ./check-cluster-status.sh
    
  2. Manual bootstrap: if the automated script fails, run the steps by hand:

    # Step 1: Check if nodes are accessible
    talosctl --nodes 10.0.1.3 version
    
    # Step 2: Apply config to each node if in maintenance mode
    talosctl apply-config --insecure --nodes 10.0.1.3 --file testing1/controlplane-10.0.1.3.yaml
    talosctl apply-config --insecure --nodes 10.0.1.4 --file testing1/controlplane-10.0.1.4.yaml
    talosctl apply-config --insecure --nodes 10.0.1.5 --file testing1/controlplane-10.0.1.5.yaml
    
    # Step 3: Wait for nodes to reboot (2-5 minutes)
    # Check with: talosctl --nodes 10.0.1.3 get services
    
    # Step 4: Bootstrap etcd on first node
    talosctl bootstrap --nodes 10.0.1.3
    
    # Step 5: Wait for Kubernetes (1-2 minutes)
    # Check with: talosctl --nodes 10.0.1.3 service etcd status
    
    # Step 6: Get kubeconfig
    talosctl kubeconfig --nodes 10.0.1.3 testing1/kubeconfig --force
    
    # Step 7: Verify cluster
    kubectl get nodes
    
  3. Common issues:

    • Nodes in maintenance mode: Config not applied or nodes didn't reboot
    • Bootstrap fails: Node not ready, check with talosctl get services
    • etcd won't start: May need to reset nodes and start over
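If etcd is beyond repair, the usual recovery is to reset every node back to maintenance mode and re-run the bootstrap script. A sketch of that flow (destructive — it wipes all cluster state; guarded so it only prints where talosctl is unavailable):

```shell
# DESTRUCTIVE: wipes etcd and all cluster state on each node.
for node in 10.0.1.3 10.0.1.4 10.0.1.5; do
  if command -v talosctl >/dev/null 2>&1; then
    talosctl reset --nodes "$node" --graceful=false --reboot
  else
    echo "would reset $node (talosctl not installed here)"
  fi
done
```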

Storage Setup

Talos Linux does not include a default storage provisioner. You must install one before deploying applications that require persistent storage.

# Enter nix-shell
nix-shell

# Install local-path-provisioner
./install-local-path-storage.sh

This installs Rancher's local-path-provisioner, which:

  • Dynamically provisions PersistentVolumes on local node storage
  • Sets itself as the default storage class
  • Is simple and works well for single-node or testing clusters

Important: Local-path storage is NOT replicated. If a node fails, data stored on that node is lost.

Verify Storage

# Check storage class
kubectl get storageclass

# Check provisioner is running
kubectl get pods -n local-path-storage
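To exercise dynamic provisioning end-to-end, a throwaway PVC works well. The sketch below only writes the manifest (the storage class name local-path is assumed); the apply/cleanup commands are left as comments since they need a live cluster:

```shell
# Write a 1Gi test PVC manifest; local-path is the assumed storage class name.
cat > /tmp/test-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF
# Against the cluster:
#   kubectl apply -f /tmp/test-pvc.yaml
#   kubectl get pvc test-pvc      # stays Pending until a pod mounts it
#   kubectl delete -f /tmp/test-pvc.yaml
```

Note that local-path-provisioner's storage class defaults to WaitForFirstConsumer binding, so the PVC only reaches Bound once a pod consumes it.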

Alternative Storage Options

For production HA setups, consider:

  • OpenEBS: Distributed block storage with replication
  • Rook-Ceph: Full-featured distributed storage system
  • Longhorn: Cloud-native distributed storage

Common Commands

Talos Cluster Management

# Check cluster health
talosctl health

# Get cluster nodes
talosctl get members

# Apply configuration changes to controlplane
talosctl apply-config --file testing1/controlplane.yaml --nodes <node-ip>

# Apply configuration changes to worker
talosctl apply-config --file testing1/worker.yaml --nodes <node-ip>

# Get Talos version
talosctl version

# Access Talos dashboard
talosctl dashboard

Kubernetes Management

# Get cluster info
kubectl cluster-info

# Get all resources in all namespaces
kubectl get all -A

# Get nodes
kubectl get nodes

# Apply manifests from first-cluster
kubectl apply -f testing1/first-cluster/cluster/base/
kubectl apply -f testing1/first-cluster/apps/demo/

# Deploy applications using kustomize
kubectl apply -k testing1/first-cluster/apps/gitlab/
kubectl apply -k testing1/first-cluster/apps/<app-name>/

GitLab Management

Prerequisites: Storage provisioner must be installed first (see Storage Setup section)

# Deploy GitLab with Container Registry and Runner
kubectl apply -k testing1/first-cluster/apps/gitlab/

# Check GitLab status
kubectl get pods -n gitlab -w

# Check PVC status (should be Bound)
kubectl get pvc -n gitlab

# Get initial root password
kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password

# Access GitLab services
# - GitLab UI: http://<node-ip>:30080
# - SSH: <node-ip>:30022
# - Container Registry: http://<node-ip>:30500

# Restart GitLab Runner after updating registration token
kubectl rollout restart deployment/gitlab-runner -n gitlab

# Check runner logs
kubectl logs -n gitlab deployment/gitlab-runner -f

GitLab Troubleshooting

If GitLab pods are stuck in Pending:

# Check storage issues
./diagnose-storage.sh

# If no storage provisioner, install it
./install-local-path-storage.sh

# Redeploy GitLab with storage
./redeploy-gitlab.sh

Architecture

Repository Structure

This is a Talos Kubernetes cluster management repository with the following structure:

  • testing1/ - Active testing cluster configuration

    • controlplane.yaml - Talos config for control plane nodes (Kubernetes 1.33.0)
    • worker.yaml - Talos config for worker nodes
    • .talosconfig - Talos client configuration
    • kubeconfig - Kubernetes client configuration
    • first-cluster/ - Kubernetes manifests in GitOps structure
      • cluster/base/ - Cluster-level resources (namespaces, etc.)
      • apps/demo/ - Application deployments (nginx demo)
      • apps/gitlab/ - GitLab CE with Container Registry and CI/CD Runner
  • prod1/ - Production cluster placeholder (currently empty)
  • shell.nix - Nix development environment definition
  • bootstrap-cluster.sh - Automated cluster bootstrap script
  • check-cluster-status.sh - Cluster status diagnostic tool
  • install-local-path-storage.sh - Install storage provisioner
  • diagnose-storage.sh - Storage diagnostic tool
  • redeploy-gitlab.sh - GitLab cleanup and redeployment
  • APP_DEPLOYMENT.md - Comprehensive guide for deploying applications

Cluster Configuration

The Talos cluster uses:

  • Kubernetes version: 1.33.0 (kubelet image: ghcr.io/siderolabs/kubelet:v1.33.0)
  • Machine token: dhmkxg.kgt4nn0mw72kd3yb (shared between control plane and workers)
  • Security: Seccomp profiles enabled by default
  • Manifests directory: Disabled (kubelet doesn't read from /etc/kubernetes/manifests)

GitOps Structure

Kubernetes manifests in testing1/first-cluster/ follow a GitOps-friendly layout:

  • cluster/ - Cluster infrastructure and base resources
  • apps/ - Application workloads organized by app name

Each app in apps/ contains its own deployment and service definitions.

Configuration Files

When modifying Talos configurations:

  1. Edit testing1/controlplane.yaml for control plane changes
  2. Edit testing1/worker.yaml for worker node changes
  3. Apply changes using talosctl apply-config with the appropriate node IPs
  4. Always specify --nodes flag to target specific nodes

When adding Kubernetes workloads:

  1. Place cluster-level resources in testing1/first-cluster/cluster/base/
  2. Place application manifests in testing1/first-cluster/apps/<app-name>/
  3. Create a kustomization.yaml file to organize resources
  4. Apply using kubectl apply -k testing1/first-cluster/apps/<app-name>/
  5. See APP_DEPLOYMENT.md for detailed guide on adding new applications
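The kustomization.yaml in step 3 is usually just a resource list. A minimal sketch scaffolding a hypothetical app (the app name and file names are placeholders):

```shell
# Scaffold a hypothetical app directory with a minimal kustomization.yaml.
app=my-app   # placeholder name
mkdir -p "testing1/first-cluster/apps/$app"
cat > "testing1/first-cluster/apps/$app/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
EOF
```

After adding the referenced manifests, deploy with kubectl apply -k as in step 4.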

Deployed Applications

GitLab (testing1/first-cluster/apps/gitlab/)

GitLab CE deployment with integrated Container Registry and CI/CD runner.

Components:

  • GitLab CE 16.11.1: Main GitLab instance
  • Container Registry: Docker image registry (port 5005/30500)
  • GitLab Runner: CI/CD runner with Docker-in-Docker support

Access:

  • UI: http://<node-ip>:30080
  • SSH: <node-ip>:30022
  • Registry: http://<node-ip>:30500

Storage:

  • gitlab-data: 50Gi - Git repositories, artifacts, uploads
  • gitlab-config: 5Gi - Configuration files
  • gitlab-logs: 5Gi - Application logs

Initial Setup:

  1. Deploy: kubectl apply -k testing1/first-cluster/apps/gitlab/
  2. Wait for pods to be ready (5-10 minutes)
  3. Get root password: kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password
  4. Access UI and configure runner registration token
  5. Update testing1/first-cluster/apps/gitlab/runner-secret.yaml with token
  6. Restart runner: kubectl rollout restart deployment/gitlab-runner -n gitlab
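Step 5 typically amounts to writing the new token into the secret manifest. A hedged sketch — the secret name and key below are guesses, so mirror the fields actually used in testing1/first-cluster/apps/gitlab/runner-secret.yaml:

```shell
# Illustrative only: field names below are assumptions, not taken from the
# repo's runner-secret.yaml.
token="glrt-EXAMPLE"   # placeholder; paste the real registration token
cat > /tmp/runner-secret-example.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret
  namespace: gitlab
stringData:
  REGISTRATION_TOKEN: "$token"
EOF
# Merge into testing1/first-cluster/apps/gitlab/runner-secret.yaml, then:
#   kubectl apply -k testing1/first-cluster/apps/gitlab/
#   kubectl rollout restart deployment/gitlab-runner -n gitlab
```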

CI/CD Configuration:

The runner is configured for building Docker images with:

  • Executor: Docker
  • Privileged mode enabled
  • Access to host Docker socket
  • Tags: docker, kubernetes, dind

Example .gitlab-ci.yml for building container images:

stages:
  - build

build-image:
  stage: build
  image: docker:24-dind
  tags:
    - docker
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG