# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Environment

This repository uses Nix for managing development tools. Enter the development shell:

```bash
nix-shell
```

The shell automatically configures:

- `TALOSCONFIG` → `testing1/.talosconfig`
- `KUBECONFIG` → `testing1/kubeconfig`
- `NIX_PROJECT_SHELL` → `kubernetes-management`

Available tools in the Nix shell:

- `talosctl` - Talos Linux cluster management
- `kubectl` - Kubernetes cluster management
- `flux` - FluxCD GitOps toolkit

## Cluster Bootstrap

To bootstrap a new Talos cluster from scratch, use the provided bootstrap script:

```bash
# Enter the Nix shell first
nix-shell

# Run the bootstrap script
./bootstrap-cluster.sh
```

The bootstrap script (`bootstrap-cluster.sh`) will:

1. Generate new Talos secrets and machine configurations
2. Apply configurations to all nodes (10.0.1.3, 10.0.1.4, 10.0.1.5)
3. Bootstrap etcd on the first control plane node
4. Retrieve the kubeconfig
5. Verify cluster health

All generated files are saved to the `testing1/` directory:

- `testing1/.talosconfig` - Talos client configuration
- `testing1/kubeconfig` - Kubernetes client configuration
- `testing1/secrets.yaml` - Cluster secrets (keep secure!)
- `testing1/controlplane-*.yaml` - Per-node configurations

### Troubleshooting Bootstrap

If nodes remain in maintenance mode or bootstrap fails:

1. **Check cluster status**:

   ```bash
   ./check-cluster-status.sh
   ```
2. **Manual bootstrap process**: If the automated script fails, bootstrap manually:

   ```bash
   # Step 1: Check that the nodes are accessible
   talosctl --nodes 10.0.1.3 version

   # Step 2: Apply config to each node if in maintenance mode
   talosctl apply-config --insecure --nodes 10.0.1.3 --file testing1/controlplane-10.0.1.3.yaml
   talosctl apply-config --insecure --nodes 10.0.1.4 --file testing1/controlplane-10.0.1.4.yaml
   talosctl apply-config --insecure --nodes 10.0.1.5 --file testing1/controlplane-10.0.1.5.yaml

   # Step 3: Wait for the nodes to reboot (2-5 minutes)
   # Check with: talosctl --nodes 10.0.1.3 get services

   # Step 4: Bootstrap etcd on the first node
   talosctl bootstrap --nodes 10.0.1.3

   # Step 5: Wait for Kubernetes (1-2 minutes)
   # Check with: talosctl --nodes 10.0.1.3 service etcd status

   # Step 6: Get the kubeconfig
   talosctl kubeconfig --nodes 10.0.1.3 testing1/kubeconfig --force

   # Step 7: Verify the cluster
   kubectl get nodes
   ```

3. **Common issues**:
   - **Nodes in maintenance mode**: Config not applied, or the nodes didn't reboot
   - **Bootstrap fails**: Node not ready; check with `talosctl get services`
   - **etcd won't start**: May need to reset the nodes and start over

## Storage Setup

Talos Linux does not include a default storage provisioner. You must install one before deploying applications that require persistent storage.

### Install Local Path Provisioner (Recommended)

```bash
# Enter nix-shell
nix-shell

# Install local-path-provisioner
./install-local-path-storage.sh
```

This installs Rancher's local-path-provisioner, which:

- Dynamically provisions PersistentVolumes on local node storage
- Sets itself as the default storage class
- Is simple and works well for single-node or testing clusters

**Important**: Local-path storage is NOT replicated. If a node fails, its data is lost.
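As a quick smoke test for the provisioner, you can create a minimal PVC; the claim name below is hypothetical, while `local-path` is the storage class the provisioner creates. Note that local-path binds volumes lazily (`WaitForFirstConsumer`), so the PVC stays `Pending` until a pod actually mounts it:

```yaml
# Hypothetical PVC for testing the local-path provisioner.
# Stays Pending until a pod consumes it (WaitForFirstConsumer).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
```

Apply it with `kubectl apply -f`, check it with `kubectl get pvc test-pvc`, and delete it afterwards.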
### Verify Storage

```bash
# Check the storage class
kubectl get storageclass

# Check that the provisioner is running
kubectl get pods -n local-path-storage
```

### Alternative Storage Options

For production HA setups, consider:

- **OpenEBS**: Distributed block storage with replication
- **Rook-Ceph**: Full-featured distributed storage system
- **Longhorn**: Cloud-native distributed storage

## Common Commands

### Talos Cluster Management

```bash
# Check cluster health
talosctl health

# Get cluster nodes
talosctl get members

# Apply configuration changes to a control plane node
talosctl apply-config --file testing1/controlplane-<node-ip>.yaml --nodes <node-ip>

# Apply configuration changes to a worker node
talosctl apply-config --file testing1/worker.yaml --nodes <node-ip>

# Get Talos version
talosctl version

# Access the Talos dashboard
talosctl dashboard
```

### Kubernetes Management

```bash
# Get cluster info
kubectl cluster-info

# Get all resources in all namespaces
kubectl get all -A

# Get nodes
kubectl get nodes

# Apply manifests from first-cluster
kubectl apply -f testing1/first-cluster/cluster/base/
kubectl apply -f testing1/first-cluster/apps/demo/

# Deploy applications using kustomize
kubectl apply -k testing1/first-cluster/apps/gitea/
kubectl apply -k testing1/first-cluster/apps/<app-name>/
```

### FluxCD GitOps Management

FluxCD automatically syncs the cluster with the Git repository. Changes pushed to the `main` branch are applied automatically.
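For reference, the objects driving this sync likely resemble the sketch below. The names `talos-gitops` and `cluster-sync` come from the troubleshooting commands in this file, and the URL, intervals, and prune setting follow the behavior described in the Architecture section; treat the exact fields as an approximation of what lives in `cluster/flux/`:

```yaml
# Sketch only - the authoritative definitions live in
# testing1/first-cluster/cluster/flux/.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: talos-gitops
  namespace: flux-system
spec:
  interval: 1m                  # poll Gitea for new commits every minute
  url: ssh://git@10.0.1.10/0xWheatyz/Talos
  ref:
    branch: main
  secretRef:
    name: gitea-ssh             # SSH key secret checked during Flux troubleshooting
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-sync
  namespace: flux-system
spec:
  interval: 5m                  # detected changes are applied within five minutes
  path: ./testing1/first-cluster
  prune: true                   # resources removed from Git are removed from the cluster
  sourceRef:
    kind: GitRepository
    name: talos-gitops
```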
```bash
# Check Flux status
flux get all

# Check GitRepository sync status
flux get sources git

# Check Kustomization status
flux get kustomizations

# Force immediate reconciliation
flux reconcile kustomization cluster-sync --with-source

# View Flux logs
flux logs

# Suspend/resume automatic sync (for maintenance)
flux suspend kustomization cluster-sync
flux resume kustomization cluster-sync
```

### Gitea Management

**Prerequisites**: A storage provisioner must be installed first (see the Storage Setup section).

```bash
# Deploy Gitea with CI/CD runner
kubectl apply -k testing1/first-cluster/apps/gitea/

# Check Gitea status
kubectl get pods -n gitea -w

# Check PVC status (should be Bound)
kubectl get pvc -n gitea

# Access Gitea
# - Gitea UI: http://10.0.1.10 or http://<node-ip>:30300
# - SSH: 10.0.1.10:22 or <node-ip>:30222

# Initial Gitea setup
# 1. Access the UI and complete the installation wizard
# 2. Create an admin account
# 3. Create a repository

# Configure the Gitea Runner
# 1. In the Gitea UI, go to Site Administration > Actions > Runners
# 2. Create a new runner registration token
# 3. Update the runner secret:
kubectl create secret generic runner-secret \
  --from-literal=token='YOUR_RUNNER_TOKEN' \
  -n gitea --dry-run=client -o yaml | kubectl apply -f -
# 4. Restart the runner to register with the new token
kubectl rollout restart deployment/gitea-runner -n gitea

# Check runner logs
kubectl logs -n gitea deployment/gitea-runner -c runner -f

# Check runner status in the Gitea UI
# Go to: Site Administration > Actions > Runners
```

### Gitea Troubleshooting

If Gitea pods are stuck in Pending:

```bash
# Check for storage issues
./diagnose-storage.sh

# If no storage provisioner is installed, install it
./install-local-path-storage.sh
```

### Flux Troubleshooting

If Flux is not syncing changes:

```bash
# Check GitRepository status
flux get sources git

# Check for authentication issues
kubectl describe gitrepository talos-gitops -n flux-system

# Verify the SSH key secret exists
kubectl get secret gitea-ssh -n flux-system

# Check Kustomization status
flux get kustomizations

# View recent reconciliation errors
flux logs --level=error

# Force reconciliation
flux reconcile source git talos-gitops
flux reconcile kustomization cluster-sync
```

## Architecture

### Repository Structure

This is a Talos Kubernetes cluster management repository with the following structure:

- **testing1/** - Active testing cluster configuration
  - **controlplane-\*.yaml** - Per-node Talos configs for control plane nodes
  - **worker.yaml** - Talos config for worker nodes (if applicable)
  - **.talosconfig** - Talos client configuration
  - **kubeconfig** - Kubernetes client configuration
  - **secrets.yaml** - Cluster secrets (keep secure!)
  - **first-cluster/** - Kubernetes manifests in GitOps structure
    - **cluster/base/** - Cluster-level resources (namespaces, etc.)
    - **cluster/flux/** - FluxCD GitOps configuration
    - **cluster/metallb/** - MetalLB load balancer configuration
    - **cluster/nfs-provisioner/** - NFS storage provisioner
    - **apps/demo/** - Application deployments (nginx demo)
    - **apps/gitea/** - Gitea with CI/CD runner
- **prod1/** - Production cluster placeholder (currently empty)
- **shell.nix** - Nix development environment definition
- **bootstrap-cluster.sh** - Automated cluster bootstrap script
- **check-cluster-status.sh** - Cluster status diagnostic tool
- **install-local-path-storage.sh** - Install the storage provisioner
- **diagnose-storage.sh** - Storage diagnostic tool
- **APP_DEPLOYMENT.md** - Comprehensive guide for deploying applications
- **GITEA_CONTAINER_REGISTRY.md** - Guide for using the Gitea container registry and CI/CD
- **CLAUDE.md** - This file: development guidance

### Cluster Configuration

The Talos cluster uses:

- **Kubernetes version**: 1.33.0 (kubelet image: `ghcr.io/siderolabs/kubelet:v1.33.0`)
- **Machine token**: `dhmkxg.kgt4nn0mw72kd3yb` (shared between control plane and workers)
- **Security**: Seccomp profiles enabled by default
- **Manifests directory**: Disabled (the kubelet doesn't read from `/etc/kubernetes/manifests`)

### GitOps Structure

This repository uses FluxCD for GitOps automation. Kubernetes manifests in `testing1/first-cluster/` are automatically synced to the cluster.

**Directory Layout:**

- **cluster/** - Cluster infrastructure and base resources
  - **flux/** - FluxCD configuration (GitRepository, Kustomization)
  - **base/** - Namespaces and cluster-wide resources
  - **metallb/** - Load balancer configuration
  - **nfs-provisioner/** - Storage provisioner
- **apps/** - Application workloads organized by app name

**How FluxCD Works:**

1. FluxCD monitors the Gitea repository at `ssh://git@10.0.1.10/0xWheatyz/Talos`
2. Every minute, it checks for new commits on the `main` branch
3. When changes are detected, Flux applies them to the cluster within 5 minutes
4. Prune is enabled: resources deleted from Git are automatically removed from the cluster

**Making Changes:**

1. Edit manifests in `testing1/first-cluster/`
2. Commit and push to the `main` branch in Gitea
3. Flux automatically applies the changes to the cluster
4. Monitor sync status with `flux get all`

## Configuration Files

### Modifying Talos Configurations

When modifying Talos configurations (NOT managed by Flux):

1. Edit `testing1/controlplane-<node-ip>.yaml` for changes to a specific control plane node
2. Edit `testing1/worker.yaml` for worker node changes
3. Apply changes using `talosctl apply-config` with the appropriate node IPs
4. Always specify the `--nodes` flag to target specific nodes

### Adding/Modifying Kubernetes Workloads (GitOps Workflow)

**Important**: Kubernetes manifests are managed by FluxCD. Changes are automatically synced from Git.

1. Place cluster-level resources in `testing1/first-cluster/cluster/<component>/`
2. Place application manifests in `testing1/first-cluster/apps/<app-name>/`
3. Create a `kustomization.yaml` file to organize the resources
4. **Commit and push to Git** - Flux will automatically apply the changes
5. Monitor the deployment: `flux get kustomizations` and `kubectl get all -n <namespace>`
6. See `APP_DEPLOYMENT.md` for a detailed guide on adding new applications

**Manual Apply (for testing)**: If you need to test changes before committing:

```bash
kubectl apply -k testing1/first-cluster/apps/<app-name>/
```

**GitOps Best Practices**:

- Always commit working configurations to Git
- Use feature branches for major changes
- Test changes before pushing to `main`
- Monitor Flux logs when deploying: `flux logs --follow`

## Deployed Applications

### Gitea (testing1/first-cluster/apps/gitea/)

Gitea deployment with an integrated CI/CD runner using Gitea Actions (compatible with GitHub Actions).
**Components:**

- **Gitea**: Lightweight self-hosted Git service
- **Gitea Act Runner**: CI/CD runner with Docker-in-Docker support for running Actions workflows

**Access:**

- UI: `http://10.0.1.10` (via MetalLB) or `http://<node-ip>:30300`
- SSH: `10.0.1.10:22` (via MetalLB) or `<node-ip>:30222`

**Storage:**

- `gitea-data`: 50Gi - Git repositories, attachments, LFS objects, Actions artifacts

**Initial Setup:**

1. Deploy: `kubectl apply -k testing1/first-cluster/apps/gitea/` (or push to Git - Flux will deploy it)
2. Wait for the pods to be ready (2-5 minutes)
3. Access the UI and complete the installation wizard
4. Create an admin account
5. Enable Actions: Site Administration > Configuration > Actions > Enable

**Configuring the Gitea Actions Runner:**

1. **Generate a Runner Registration Token:**
   - Go to the Gitea UI: Site Administration > Actions > Runners
   - Click "Create new Runner"
   - Copy the registration token

2. **Update the Runner Secret:**

   ```bash
   # Update the token in the secret
   kubectl create secret generic runner-secret \
     --from-literal=token='YOUR_REGISTRATION_TOKEN' \
     -n gitea --dry-run=client -o yaml | kubectl apply -f -

   # Or edit the file and commit to Git (recommended for GitOps):
   # Edit testing1/first-cluster/apps/gitea/runner-secret.yaml
   # Replace REPLACE_WITH_GITEA_RUNNER_TOKEN with your token
   # git add, commit, and push - Flux will update the secret
   ```

3. **Restart the Runner:**

   ```bash
   kubectl rollout restart deployment/gitea-runner -n gitea
   ```
4. **Verify Runner Registration:**
   - Check the logs: `kubectl logs -n gitea deployment/gitea-runner -c runner -f`
   - Check the Gitea UI: Site Administration > Actions > Runners
   - You should see "kubernetes-runner" with status "Idle"

**CI/CD Configuration (Gitea Actions):**

The runner is configured for building Docker images with:

- Docker-in-Docker (DinD) support
- Labels: `ubuntu-latest`, `ubuntu-22.04`
- Base image: `node:20-bullseye`

Example `.gitea/workflows/build.yaml` for building container images:

```yaml
name: Build Docker Image

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build Docker image
        run: |
          docker build -t myapp:${{ gitea.sha }} .
          docker tag myapp:${{ gitea.sha }} myapp:latest

      - name: Push to registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login -u "${{ secrets.REGISTRY_USER }}" --password-stdin registry.example.com
          docker push myapp:${{ gitea.sha }}
          docker push myapp:latest
```

**Runner Labels Configured:**

- `ubuntu-latest` → `docker://node:20-bullseye`
- `ubuntu-22.04` → `docker://node:20-bullseye`

You can customize these labels in `testing1/first-cluster/apps/gitea/runner-deployment.yaml` under `GITEA_RUNNER_LABELS`.

**Container Registry:**

Gitea includes a built-in container registry (via the Packages feature) for storing Docker images. This enables complete CI/CD pipelines similar to GitLab.

Using the container registry:

```bash
# Log in to the Gitea registry
docker login 10.0.1.10 -u your-username

# Push images
docker tag myapp:latest 10.0.1.10/username/repo:latest
docker push 10.0.1.10/username/repo:latest

# Pull images
docker pull 10.0.1.10/username/repo:latest
```

For complete container registry setup, CI/CD workflows, and Kubernetes integration, see **[GITEA_CONTAINER_REGISTRY.md](GITEA_CONTAINER_REGISTRY.md)**.
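To run images from the Gitea registry inside the cluster itself, pods need an image pull secret. A minimal sketch, assuming a secret named `gitea-registry` (hypothetical; see GITEA_CONTAINER_REGISTRY.md for the authoritative setup):

```yaml
# Hypothetical Deployment pulling from the in-cluster Gitea registry.
# The pull secret "gitea-registry" would first be created with, e.g.:
#   kubectl create secret docker-registry gitea-registry \
#     --docker-server=10.0.1.10 \
#     --docker-username=<user> --docker-password=<token>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      imagePullSecrets:
        - name: gitea-registry      # grants the kubelet registry access
      containers:
        - name: myapp
          image: 10.0.1.10/username/repo:latest
```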