# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Development Environment
This repository uses Nix for managing development tools. Enter the development shell:

```bash
nix-shell
```

The shell automatically configures:

- `TALOSCONFIG` → `testing1/.talosconfig`
- `KUBECONFIG` → `testing1/kubeconfig`
- `NIX_PROJECT_SHELL` → `kubernetes-management`

Available tools in the Nix shell:

- `talosctl` - Talos Linux cluster management
- `kubectl` - Kubernetes cluster management
- `flux` - FluxCD GitOps toolkit
## Cluster Bootstrap
To bootstrap a new Talos cluster from scratch, use the provided bootstrap script:

```bash
# Enter the Nix shell first
nix-shell

# Run the bootstrap script
./bootstrap-cluster.sh
```
The bootstrap script (`bootstrap-cluster.sh`) will:
- Generate new Talos secrets and machine configurations
- Apply configurations to all nodes (10.0.1.3, 10.0.1.4, 10.0.1.5)
- Bootstrap etcd on the first control plane
- Retrieve kubeconfig
- Verify cluster health
All generated files are saved to the `testing1/` directory:

- `testing1/.talosconfig` - Talos client configuration
- `testing1/kubeconfig` - Kubernetes client configuration
- `testing1/secrets.yaml` - Cluster secrets (keep secure!)
- `testing1/controlplane-*.yaml` - Per-node configurations
### Troubleshooting Bootstrap
If nodes remain in maintenance mode or bootstrap fails:
1. Check cluster status:

   ```bash
   ./check-cluster-status.sh
   ```

2. Manual bootstrap process. If the automated script fails, bootstrap manually:

   ```bash
   # Step 1: Check if nodes are accessible
   talosctl --nodes 10.0.1.3 version

   # Step 2: Apply config to each node if in maintenance mode
   talosctl apply-config --insecure --nodes 10.0.1.3 --file testing1/controlplane-10.0.1.3.yaml
   talosctl apply-config --insecure --nodes 10.0.1.4 --file testing1/controlplane-10.0.1.4.yaml
   talosctl apply-config --insecure --nodes 10.0.1.5 --file testing1/controlplane-10.0.1.5.yaml

   # Step 3: Wait for nodes to reboot (2-5 minutes)
   # Check with: talosctl --nodes 10.0.1.3 get services

   # Step 4: Bootstrap etcd on the first node
   talosctl bootstrap --nodes 10.0.1.3

   # Step 5: Wait for Kubernetes (1-2 minutes)
   # Check with: talosctl --nodes 10.0.1.3 service etcd status

   # Step 6: Get kubeconfig
   talosctl kubeconfig --nodes 10.0.1.3 testing1/kubeconfig --force

   # Step 7: Verify cluster
   kubectl get nodes
   ```

3. Common issues:

   - Nodes in maintenance mode: config was not applied or the nodes didn't reboot
   - Bootstrap fails: node not ready; check with `talosctl get services`
   - etcd won't start: the nodes may need to be reset before starting over
## Storage Setup
Talos Linux does not include a default storage provisioner. You must install one before deploying applications that require persistent storage.
### Install Local Path Provisioner (Recommended)
```bash
# Enter nix-shell
nix-shell

# Install local-path-provisioner
./install-local-path-storage.sh
```
This installs Rancher's local-path-provisioner, which:

- Dynamically provisions PersistentVolumes on local node storage
- Sets itself as the default storage class
- Is simple and works well for single-node or testing clusters
**Important**: Local-path storage is NOT replicated. If a node fails, its data is lost.
### Verify Storage
```bash
# Check storage class
kubectl get storageclass

# Check that the provisioner is running
kubectl get pods -n local-path-storage
```
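To confirm dynamic provisioning works end to end, you can apply a small throwaway claim (the file and claim names here are hypothetical). Note that local-path-provisioner defaults to `WaitForFirstConsumer` volume binding, so the PVC will stay `Pending` until a pod actually mounts it:

```yaml
# test-pvc.yaml - throwaway claim for verifying the provisioner
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Apply with `kubectl apply -f test-pvc.yaml`, mount it from a test pod, check `kubectl get pvc test-claim` shows `Bound`, then delete both.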
### Alternative Storage Options
For production HA setups, consider:
- OpenEBS: Distributed block storage with replication
- Rook-Ceph: Full-featured distributed storage system
- Longhorn: Cloud-native distributed storage
## Common Commands

### Talos Cluster Management
```bash
# Check cluster health
talosctl health

# Get cluster nodes
talosctl get members

# Apply configuration changes to a control plane node
talosctl apply-config --file testing1/controlplane.yaml --nodes <node-ip>

# Apply configuration changes to a worker node
talosctl apply-config --file testing1/worker.yaml --nodes <node-ip>

# Get Talos version
talosctl version

# Access the Talos dashboard
talosctl dashboard
```
### Kubernetes Management
```bash
# Get cluster info
kubectl cluster-info

# Get all resources in all namespaces
kubectl get all -A

# Get nodes
kubectl get nodes

# Apply manifests from first-cluster
kubectl apply -f testing1/first-cluster/cluster/base/
kubectl apply -f testing1/first-cluster/apps/demo/

# Deploy applications using kustomize
kubectl apply -k testing1/first-cluster/apps/gitlab/
kubectl apply -k testing1/first-cluster/apps/<app-name>/
```
### GitLab Management

**Prerequisites**: A storage provisioner must be installed first (see the Storage Setup section).
```bash
# Deploy GitLab with Container Registry and Runner
kubectl apply -k testing1/first-cluster/apps/gitlab/

# Check GitLab status
kubectl get pods -n gitlab -w

# Check PVC status (should be Bound)
kubectl get pvc -n gitlab

# Get the initial root password
kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password

# Access GitLab services
# - GitLab UI: http://<node-ip>:30080
# - SSH: <node-ip>:30022
# - Container Registry: http://<node-ip>:30500

# Restart the GitLab Runner after updating the registration token
kubectl rollout restart deployment/gitlab-runner -n gitlab

# Check runner logs
kubectl logs -n gitlab deployment/gitlab-runner -f
```
### GitLab Troubleshooting
If GitLab pods are stuck in Pending:
```bash
# Check storage issues
./diagnose-storage.sh

# If no storage provisioner is installed, install it
./install-local-path-storage.sh

# Redeploy GitLab with storage
./redeploy-gitlab.sh
```
## Architecture

### Repository Structure
This is a Talos Kubernetes cluster management repository with the following structure:
- `testing1/` - Active testing cluster configuration
  - `controlplane.yaml` - Talos config for control plane nodes (Kubernetes 1.33.0)
  - `worker.yaml` - Talos config for worker nodes
  - `.talosconfig` - Talos client configuration
  - `kubeconfig` - Kubernetes client configuration
  - `first-cluster/` - Kubernetes manifests in GitOps structure
    - `cluster/base/` - Cluster-level resources (namespaces, etc.)
    - `apps/demo/` - Application deployments (nginx demo)
    - `apps/gitlab/` - GitLab CE with Container Registry and CI/CD Runner
- `prod1/` - Production cluster placeholder (currently empty)
- `shell.nix` - Nix development environment definition
- `bootstrap-cluster.sh` - Automated cluster bootstrap script
- `check-cluster-status.sh` - Cluster status diagnostic tool
- `install-local-path-storage.sh` - Install storage provisioner
- `diagnose-storage.sh` - Storage diagnostic tool
- `redeploy-gitlab.sh` - GitLab cleanup and redeployment script
- `APP_DEPLOYMENT.md` - Comprehensive guide for deploying applications
### Cluster Configuration

The Talos cluster uses:

- Kubernetes version: 1.33.0 (kubelet image: `ghcr.io/siderolabs/kubelet:v1.33.0`)
- Machine token: `dhmkxg.kgt4nn0mw72kd3yb` (shared between control plane and workers)
- Security: seccomp profiles enabled by default
- Manifests directory: disabled (the kubelet doesn't read from `/etc/kubernetes/manifests`)
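The settings above correspond roughly to this excerpt of the Talos machine configuration; field names follow the Talos v1.x schema, so check the actual `testing1/controlplane.yaml` for the exact layout:

```yaml
machine:
  token: dhmkxg.kgt4nn0mw72kd3yb          # shared machine token
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.33.0
    # Apply the runtime default seccomp profile to workloads
    defaultRuntimeSeccompProfileEnabled: true
    # Kubelet does not read static pods from /etc/kubernetes/manifests
    disableManifestsDirectory: true
```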
### GitOps Structure

Kubernetes manifests in `testing1/first-cluster/` follow a GitOps-friendly layout:

- `cluster/` - Cluster infrastructure and base resources
- `apps/` - Application workloads organized by app name

Each app in `apps/` contains its own deployment and service definitions.
### Configuration Files

When modifying Talos configurations:

- Edit `testing1/controlplane.yaml` for control plane changes
- Edit `testing1/worker.yaml` for worker node changes
- Apply changes using `talosctl apply-config` with the appropriate node IPs
- Always specify the `--nodes` flag to target specific nodes
When adding Kubernetes workloads:

- Place cluster-level resources in `testing1/first-cluster/cluster/base/`
- Place application manifests in `testing1/first-cluster/apps/<app-name>/`
- Create a `kustomization.yaml` file to organize resources
- Apply using `kubectl apply -k testing1/first-cluster/apps/<app-name>/`
- See `APP_DEPLOYMENT.md` for a detailed guide on adding new applications
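A minimal `kustomization.yaml` for a new app might look like this (the resource file names and namespace are placeholders; match them to the manifests you actually create):

```yaml
# testing1/first-cluster/apps/<app-name>/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: <app-name>   # assumption: one namespace per app
resources:
  - deployment.yaml
  - service.yaml
```

With this in place, `kubectl apply -k` picks up every listed resource and applies the shared namespace in one shot.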
## Deployed Applications

### GitLab (`testing1/first-cluster/apps/gitlab/`)
GitLab CE deployment with integrated Container Registry and CI/CD runner.
Components:
- GitLab CE 16.11.1: Main GitLab instance
- Container Registry: Docker image registry (port 5005/30500)
- GitLab Runner: CI/CD runner with Docker-in-Docker support
Access:

- UI: `http://<node-ip>:30080`
- SSH: `<node-ip>:30022`
- Registry: `http://<node-ip>:30500`
Storage:

- `gitlab-data`: 50Gi - Git repositories, artifacts, uploads
- `gitlab-config`: 5Gi - Configuration files
- `gitlab-logs`: 5Gi - Application logs
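Assuming the claims follow the usual PersistentVolumeClaim shape (the authoritative manifests live in `apps/gitlab/`), `gitlab-data` would look roughly like:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-data
  namespace: gitlab
spec:
  accessModes:
    - ReadWriteOnce      # local-path volumes are single-node
  resources:
    requests:
      storage: 50Gi
```

With local-path-provisioner this binds to a directory on one node, so the GitLab pod is effectively pinned to that node after first scheduling.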
Initial Setup:

1. Deploy: `kubectl apply -k testing1/first-cluster/apps/gitlab/`
2. Wait for the pods to be ready (5-10 minutes)
3. Get the root password: `kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password`
4. Access the UI and configure the runner registration token
5. Update `testing1/first-cluster/apps/gitlab/runner-secret.yaml` with the token
6. Restart the runner: `kubectl rollout restart deployment/gitlab-runner -n gitlab`
CI/CD Configuration:

The runner is configured for building Docker images with:

- Executor: Docker
- Privileged mode enabled
- Access to the host Docker socket
- Tags: `docker`, `kubernetes`, `dind`
Example `.gitlab-ci.yml` for building container images:

```yaml
stages:
  - build

build-image:
  stage: build
  image: docker:24-dind
  tags:
    - docker
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
```
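If you also want a floating `latest` tag on the default branch, a second job can retag the commit image. This is a sketch using GitLab's predefined CI variables; the `needs` keyword orders it after the build job within the same stage:

```yaml
tag-latest:
  stage: build
  needs: ["build-image"]
  image: docker:24-dind
  tags:
    - docker
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker pull $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG $CI_REGISTRY_IMAGE:latest
    - docker push $CI_REGISTRY_IMAGE:latest
```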