
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development Environment

This repository uses Nix for managing development tools. Enter the development shell:

nix-shell

The shell automatically configures:

  • TALOSCONFIG=testing1/.talosconfig
  • KUBECONFIG=testing1/kubeconfig
  • NIX_PROJECT_SHELL=kubernetes-management

Available tools in the Nix shell:

  • talosctl - Talos Linux cluster management
  • kubectl - Kubernetes cluster management
  • flux - FluxCD GitOps toolkit
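Once inside the shell, a quick sanity check confirms the exports took effect. A minimal sketch; the fallback values below are only so the snippet also runs standalone outside the Nix shell:

```shell
# Check the variables exported by shell.nix. The := defaults are illustrative
# fallbacks so this snippet runs even when the Nix shell is not active.
: "${TALOSCONFIG:=testing1/.talosconfig}"
: "${KUBECONFIG:=testing1/kubeconfig}"
echo "TALOSCONFIG=$TALOSCONFIG"
echo "KUBECONFIG=$KUBECONFIG"
```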

Cluster Bootstrap

To bootstrap a new Talos cluster from scratch, use the provided bootstrap script:

# Enter the Nix shell first
nix-shell

# Run the bootstrap script
./bootstrap-cluster.sh

The bootstrap script (bootstrap-cluster.sh) will:

  1. Generate new Talos secrets and machine configurations
  2. Apply configurations to all nodes (10.0.1.3, 10.0.1.4, 10.0.1.5)
  3. Bootstrap etcd on the first control plane
  4. Retrieve kubeconfig
  5. Verify cluster health

All generated files are saved to the testing1/ directory:

  • testing1/.talosconfig - Talos client configuration
  • testing1/kubeconfig - Kubernetes client configuration
  • testing1/secrets.yaml - Cluster secrets (keep secure!)
  • testing1/controlplane-*.yaml - Per-node configurations
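For reference, the generation step plausibly follows the standard talosctl flow. This is a hedged sketch — the cluster name, endpoint, and flags are assumptions (read bootstrap-cluster.sh for the real commands) — guarded so it is a no-op where talosctl is absent:

```shell
# Assumed shape of the secrets/config generation (not the script's exact code).
if command -v talosctl >/dev/null 2>&1; then
  mkdir -p testing1
  talosctl gen secrets -o testing1/secrets.yaml
  # Cluster name and endpoint below are placeholders; flag names can vary
  # slightly between talosctl versions.
  talosctl gen config testing1 https://10.0.1.3:6443 \
    --with-secrets testing1/secrets.yaml \
    --output testing1
else
  echo "talosctl not found; sketch only"
fi
```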

Troubleshooting Bootstrap

If nodes remain in maintenance mode or bootstrap fails:

  1. Check cluster status:

    ./check-cluster-status.sh
    
  2. Manual bootstrap: if the automated script fails, run the steps by hand:

    # Step 1: Check if nodes are accessible
    talosctl --nodes 10.0.1.3 version
    
    # Step 2: Apply config to each node if in maintenance mode
    talosctl apply-config --insecure --nodes 10.0.1.3 --file testing1/controlplane-10.0.1.3.yaml
    talosctl apply-config --insecure --nodes 10.0.1.4 --file testing1/controlplane-10.0.1.4.yaml
    talosctl apply-config --insecure --nodes 10.0.1.5 --file testing1/controlplane-10.0.1.5.yaml
    
    # Step 3: Wait for nodes to reboot (2-5 minutes)
    # Check with: talosctl --nodes 10.0.1.3 get services
    
    # Step 4: Bootstrap etcd on first node
    talosctl bootstrap --nodes 10.0.1.3
    
    # Step 5: Wait for Kubernetes (1-2 minutes)
    # Check with: talosctl --nodes 10.0.1.3 service etcd status
    
    # Step 6: Get kubeconfig
    talosctl kubeconfig --nodes 10.0.1.3 testing1/kubeconfig --force
    
    # Step 7: Verify cluster
    kubectl get nodes
    
  3. Common issues:

    • Nodes in maintenance mode: Config not applied or nodes didn't reboot
    • Bootstrap fails: Node not ready, check with talosctl get services
    • etcd won't start: May need to reset nodes and start over
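If etcd is beyond repair, the usual recovery is to reset every node back to maintenance mode and re-run the bootstrap script. A sketch of that flow (destructive — it wipes all cluster state; guarded so it only prints where talosctl is unavailable):

```shell
# DESTRUCTIVE: wipes etcd and all cluster state on each node.
for node in 10.0.1.3 10.0.1.4 10.0.1.5; do
  if command -v talosctl >/dev/null 2>&1; then
    talosctl reset --nodes "$node" --graceful=false --reboot
  else
    echo "would reset $node (talosctl not installed here)"
  fi
done
```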

Storage Setup

Talos Linux does not include a default storage provisioner. You must install one before deploying applications that require persistent storage.

# Enter nix-shell
nix-shell

# Install local-path-provisioner
./install-local-path-storage.sh

This installs Rancher's local-path-provisioner, which:

  • Dynamically provisions PersistentVolumes on local node storage
  • Sets itself as the default storage class
  • Is simple and works well for single-node or testing clusters

Important: Local-path storage is NOT replicated. If a node fails, data stored on that node is lost.

Verify Storage

# Check storage class
kubectl get storageclass

# Check provisioner is running
kubectl get pods -n local-path-storage
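To exercise dynamic provisioning end-to-end, a throwaway PVC works well. The sketch below only writes the manifest (the storage class name local-path is assumed); the apply/cleanup commands are left as comments since they need a live cluster:

```shell
# Write a 1Gi test PVC manifest; local-path is the assumed storage class name.
cat > /tmp/test-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF
# Against the cluster:
#   kubectl apply -f /tmp/test-pvc.yaml
#   kubectl get pvc test-pvc      # stays Pending until a pod mounts it
#   kubectl delete -f /tmp/test-pvc.yaml
```

Note that local-path-provisioner's storage class defaults to WaitForFirstConsumer binding, so the PVC only reaches Bound once a pod consumes it.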

Alternative Storage Options

For production HA setups, consider:

  • OpenEBS: Distributed block storage with replication
  • Rook-Ceph: Full-featured distributed storage system
  • Longhorn: Cloud-native distributed storage

Common Commands

Talos Cluster Management

# Check cluster health
talosctl health

# Get cluster nodes
talosctl get members

# Apply configuration changes to controlplane
talosctl apply-config --file testing1/controlplane.yaml --nodes <node-ip>

# Apply configuration changes to worker
talosctl apply-config --file testing1/worker.yaml --nodes <node-ip>

# Get Talos version
talosctl version

# Access Talos dashboard
talosctl dashboard

Kubernetes Management

# Get cluster info
kubectl cluster-info

# Get all resources in all namespaces
kubectl get all -A

# Get nodes
kubectl get nodes

# Apply manifests from first-cluster
kubectl apply -f testing1/first-cluster/cluster/base/
kubectl apply -f testing1/first-cluster/apps/demo/

# Deploy applications using kustomize
kubectl apply -k testing1/first-cluster/apps/gitlab/
kubectl apply -k testing1/first-cluster/apps/<app-name>/

GitLab Management

Prerequisites: Storage provisioner must be installed first (see Storage Setup section)

# Deploy GitLab with Container Registry and Runner
kubectl apply -k testing1/first-cluster/apps/gitlab/

# Check GitLab status
kubectl get pods -n gitlab -w

# Check PVC status (should be Bound)
kubectl get pvc -n gitlab

# Get initial root password
kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password

# Access GitLab services
# - GitLab UI: http://<node-ip>:30080
# - SSH: <node-ip>:30022
# - Container Registry: http://<node-ip>:30500

# Restart GitLab Runner after updating registration token
kubectl rollout restart deployment/gitlab-runner -n gitlab

# Check runner logs
kubectl logs -n gitlab deployment/gitlab-runner -f

GitLab Troubleshooting

If GitLab pods are stuck in Pending:

# Check storage issues
./diagnose-storage.sh

# If no storage provisioner, install it
./install-local-path-storage.sh

# Redeploy GitLab with storage
./redeploy-gitlab.sh

Architecture

Repository Structure

This is a Talos Kubernetes cluster management repository with the following structure:

  • testing1/ - Active testing cluster configuration

    • controlplane.yaml - Talos config for control plane nodes (Kubernetes 1.33.0)
    • worker.yaml - Talos config for worker nodes
    • .talosconfig - Talos client configuration
    • kubeconfig - Kubernetes client configuration
    • first-cluster/ - Kubernetes manifests in GitOps structure
      • cluster/base/ - Cluster-level resources (namespaces, etc.)
      • apps/demo/ - Application deployments (nginx demo)
      • apps/gitlab/ - GitLab CE with Container Registry and CI/CD Runner
  • prod1/ - Production cluster placeholder (currently empty)
  • shell.nix - Nix development environment definition
  • bootstrap-cluster.sh - Automated cluster bootstrap script
  • check-cluster-status.sh - Cluster status diagnostic tool
  • install-local-path-storage.sh - Install storage provisioner
  • diagnose-storage.sh - Storage diagnostic tool
  • redeploy-gitlab.sh - GitLab cleanup and redeployment
  • APP_DEPLOYMENT.md - Comprehensive guide for deploying applications

Cluster Configuration

The Talos cluster uses:

  • Kubernetes version: 1.33.0 (kubelet image: ghcr.io/siderolabs/kubelet:v1.33.0)
  • Machine token: dhmkxg.kgt4nn0mw72kd3yb (shared between control plane and workers)
  • Security: Seccomp profiles enabled by default
  • Manifests directory: Disabled (kubelet doesn't read from /etc/kubernetes/manifests)

GitOps Structure

Kubernetes manifests in testing1/first-cluster/ follow a GitOps-friendly layout:

  • cluster/ - Cluster infrastructure and base resources
  • apps/ - Application workloads organized by app name

Each app in apps/ contains its own deployment and service definitions.

Configuration Files

When modifying Talos configurations:

  1. Edit testing1/controlplane.yaml for control plane changes
  2. Edit testing1/worker.yaml for worker node changes
  3. Apply changes using talosctl apply-config with the appropriate node IPs
  4. Always specify --nodes flag to target specific nodes

When adding Kubernetes workloads:

  1. Place cluster-level resources in testing1/first-cluster/cluster/base/
  2. Place application manifests in testing1/first-cluster/apps/<app-name>/
  3. Create a kustomization.yaml file to organize resources
  4. Apply using kubectl apply -k testing1/first-cluster/apps/<app-name>/
  5. See APP_DEPLOYMENT.md for detailed guide on adding new applications
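The kustomization.yaml in step 3 is usually just a resource list. A minimal sketch scaffolding a hypothetical app (the app name and file names are placeholders):

```shell
# Scaffold a hypothetical app directory with a minimal kustomization.yaml.
app=my-app   # placeholder name
mkdir -p "testing1/first-cluster/apps/$app"
cat > "testing1/first-cluster/apps/$app/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
EOF
```

After adding the referenced manifests, deploy with kubectl apply -k as in step 4.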

Deployed Applications

GitLab (testing1/first-cluster/apps/gitlab/)

GitLab CE deployment with integrated Container Registry and CI/CD runner.

Components:

  • GitLab CE 16.11.1: Main GitLab instance
  • Container Registry: Docker image registry (port 5005/30500)
  • GitLab Runner: CI/CD runner with Docker-in-Docker support

Access:

  • UI: http://<node-ip>:30080
  • SSH: <node-ip>:30022
  • Registry: http://<node-ip>:30500

Storage:

  • gitlab-data: 50Gi - Git repositories, artifacts, uploads
  • gitlab-config: 5Gi - Configuration files
  • gitlab-logs: 5Gi - Application logs

Initial Setup:

  1. Deploy: kubectl apply -k testing1/first-cluster/apps/gitlab/
  2. Wait for pods to be ready (5-10 minutes)
  3. Get root password: kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password
  4. Access UI and configure runner registration token
  5. Update testing1/first-cluster/apps/gitlab/runner-secret.yaml with token
  6. Restart runner: kubectl rollout restart deployment/gitlab-runner -n gitlab
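Step 5 typically amounts to writing the new token into the secret manifest. A hedged sketch — the secret name and key below are guesses, so mirror the fields actually used in testing1/first-cluster/apps/gitlab/runner-secret.yaml:

```shell
# Illustrative only: field names below are assumptions, not taken from the
# repo's runner-secret.yaml.
token="glrt-EXAMPLE"   # placeholder; paste the real registration token
cat > /tmp/runner-secret-example.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret
  namespace: gitlab
stringData:
  REGISTRATION_TOKEN: "$token"
EOF
# Merge into testing1/first-cluster/apps/gitlab/runner-secret.yaml, then:
#   kubectl apply -k testing1/first-cluster/apps/gitlab/
#   kubectl rollout restart deployment/gitlab-runner -n gitlab
```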

CI/CD Configuration:

The runner is configured for building Docker images with:

  • Executor: Docker
  • Privileged mode enabled
  • Access to host Docker socket
  • Tags: docker, kubernetes, dind

Example .gitlab-ci.yml for building container images:

stages:
  - build

build-image:
  stage: build
  image: docker:24-dind
  tags:
    - docker
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG