docs: add Claude Code project instructions

Add CLAUDE.md with comprehensive guidance for Claude Code when working
with this Talos Kubernetes cluster repository.

Includes:
- Development environment setup (Nix shell)
- Cluster bootstrap procedures
- Storage provisioner installation
- Common commands for Talos and Kubernetes
- GitLab and Gitea deployment instructions
- Troubleshooting guides

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Commit: ea415ba584 (parent af0403d330)
Author: 0xWheatyz, 2026-03-04 01:52:49 +00:00

CLAUDE.md (new file, 325 lines)
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Development Environment
This repository uses Nix for managing development tools. Enter the development shell:
```bash
nix-shell
```
The shell automatically configures:
- `TALOSCONFIG` → `testing1/.talosconfig`
- `KUBECONFIG` → `testing1/kubeconfig`
- `NIX_PROJECT_SHELL` → `kubernetes-management`
Available tools in the Nix shell:
- `talosctl` - Talos Linux cluster management
- `kubectl` - Kubernetes cluster management
- `flux` - FluxCD GitOps toolkit
## Cluster Bootstrap
To bootstrap a new Talos cluster from scratch, use the provided bootstrap script:
```bash
# Enter the Nix shell first
nix-shell
# Run the bootstrap script
./bootstrap-cluster.sh
```
The bootstrap script (`bootstrap-cluster.sh`) will:
1. Generate new Talos secrets and machine configurations
2. Apply configurations to all nodes (10.0.1.3, 10.0.1.4, 10.0.1.5)
3. Bootstrap etcd on the first control plane
4. Retrieve kubeconfig
5. Verify cluster health
All generated files are saved to the `testing1/` directory:
- `testing1/.talosconfig` - Talos client configuration
- `testing1/kubeconfig` - Kubernetes client configuration
- `testing1/secrets.yaml` - Cluster secrets (keep secure!)
- `testing1/controlplane-*.yaml` - Per-node configurations
### Troubleshooting Bootstrap
If nodes remain in maintenance mode or bootstrap fails:
1. **Check cluster status**:
```bash
./check-cluster-status.sh
```
2. **Manual bootstrap process**:
If the automated script fails, bootstrap manually:
```bash
# Step 1: Check if nodes are accessible
talosctl --nodes 10.0.1.3 version
# Step 2: Apply config to each node if in maintenance mode
talosctl apply-config --insecure --nodes 10.0.1.3 --file testing1/controlplane-10.0.1.3.yaml
talosctl apply-config --insecure --nodes 10.0.1.4 --file testing1/controlplane-10.0.1.4.yaml
talosctl apply-config --insecure --nodes 10.0.1.5 --file testing1/controlplane-10.0.1.5.yaml
# Step 3: Wait for nodes to reboot (2-5 minutes)
# Check with: talosctl --nodes 10.0.1.3 get services
# Step 4: Bootstrap etcd on first node
talosctl bootstrap --nodes 10.0.1.3
# Step 5: Wait for Kubernetes (1-2 minutes)
# Check with: talosctl --nodes 10.0.1.3 service etcd status
# Step 6: Get kubeconfig
talosctl kubeconfig --nodes 10.0.1.3 testing1/kubeconfig --force
# Step 7: Verify cluster
kubectl get nodes
```
3. **Common issues**:
- **Nodes in maintenance mode**: the config was not applied, or the nodes have not rebooted yet
- **Bootstrap fails**: the node is not ready; check with `talosctl get services`
- **etcd won't start**: the node may need to be wiped with `talosctl reset` before starting over
## Storage Setup
Talos Linux does not include a default storage provisioner. You must install one before deploying applications that require persistent storage.
### Install Local Path Provisioner (Recommended)
```bash
# Enter nix-shell
nix-shell
# Install local-path-provisioner
./install-local-path-storage.sh
```
This installs Rancher's local-path-provisioner, which:
- Dynamically provisions PersistentVolumes on local node storage
- Sets itself as the default storage class
- Is simple and works well for single-node or testing clusters
**Important**: Local-path storage is NOT replicated. If a node fails, data is lost.
### Verify Storage
```bash
# Check storage class
kubectl get storageclass
# Check provisioner is running
kubectl get pods -n local-path-storage
```
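Once the default storage class is in place, workloads request storage through an ordinary PersistentVolumeClaim. A minimal sketch (the claim name and namespace are illustrative, not files in this repo):

```yaml
# Illustrative PVC; omitting storageClassName uses the default
# (local-path) class installed above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data          # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce        # local-path volumes live on a single node
  resources:
    requests:
      storage: 1Gi
```

Note that local-path-provisioner uses `WaitForFirstConsumer` binding by default, so the PVC stays `Pending` until a pod actually mounts it.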
### Alternative Storage Options
For production HA setups, consider:
- **OpenEBS**: Distributed block storage with replication
- **Rook-Ceph**: Full-featured distributed storage system
- **Longhorn**: Cloud-native distributed storage
## Common Commands
### Talos Cluster Management
```bash
# Check cluster health
talosctl health
# Get cluster nodes
talosctl get members
# Apply configuration changes to controlplane
talosctl apply-config --file testing1/controlplane.yaml --nodes <node-ip>
# Apply configuration changes to worker
talosctl apply-config --file testing1/worker.yaml --nodes <node-ip>
# Get Talos version
talosctl version
# Access Talos dashboard
talosctl dashboard
```
### Kubernetes Management
```bash
# Get cluster info
kubectl cluster-info
# Get all resources in all namespaces
kubectl get all -A
# Get nodes
kubectl get nodes
# Apply manifests from first-cluster
kubectl apply -f testing1/first-cluster/cluster/base/
kubectl apply -f testing1/first-cluster/apps/demo/
# Deploy applications using kustomize
kubectl apply -k testing1/first-cluster/apps/gitlab/
kubectl apply -k testing1/first-cluster/apps/<app-name>/
```
### GitLab Management
**Prerequisites**: Storage provisioner must be installed first (see Storage Setup section)
```bash
# Deploy GitLab with Container Registry and Runner
kubectl apply -k testing1/first-cluster/apps/gitlab/
# Check GitLab status
kubectl get pods -n gitlab -w
# Check PVC status (should be Bound)
kubectl get pvc -n gitlab
# Get initial root password
kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password
# Access GitLab services
# - GitLab UI: http://<node-ip>:30080
# - SSH: <node-ip>:30022
# - Container Registry: http://<node-ip>:30500
# Restart GitLab Runner after updating registration token
kubectl rollout restart deployment/gitlab-runner -n gitlab
# Check runner logs
kubectl logs -n gitlab deployment/gitlab-runner -f
```
### GitLab Troubleshooting
If GitLab pods are stuck in Pending:
```bash
# Check storage issues
./diagnose-storage.sh
# If no storage provisioner, install it
./install-local-path-storage.sh
# Redeploy GitLab with storage
./redeploy-gitlab.sh
```
## Architecture
### Repository Structure
This is a Talos Kubernetes cluster management repository with the following structure:
- **testing1/** - Active testing cluster configuration
  - **controlplane.yaml** - Talos config for control plane nodes (Kubernetes 1.33.0)
  - **worker.yaml** - Talos config for worker nodes
  - **.talosconfig** - Talos client configuration
  - **kubeconfig** - Kubernetes client configuration
  - **first-cluster/** - Kubernetes manifests in GitOps structure
    - **cluster/base/** - Cluster-level resources (namespaces, etc.)
    - **apps/demo/** - Application deployments (nginx demo)
    - **apps/gitlab/** - GitLab CE with Container Registry and CI/CD Runner
- **prod1/** - Production cluster placeholder (currently empty)
- **shell.nix** - Nix development environment definition
- **bootstrap-cluster.sh** - Automated cluster bootstrap script
- **check-cluster-status.sh** - Cluster status diagnostic tool
- **install-local-path-storage.sh** - Install storage provisioner
- **diagnose-storage.sh** - Storage diagnostic tool
- **redeploy-gitlab.sh** - GitLab cleanup and redeployment
- **APP_DEPLOYMENT.md** - Comprehensive guide for deploying applications
### Cluster Configuration
The Talos cluster uses:
- **Kubernetes version**: 1.33.0 (kubelet image: `ghcr.io/siderolabs/kubelet:v1.33.0`)
- **Machine token**: `dhmkxg.kgt4nn0mw72kd3yb` (shared between control plane and workers)
- **Security**: Seccomp profiles enabled by default
- **Manifests directory**: Disabled (kubelet doesn't read from `/etc/kubernetes/manifests`)
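As an illustration, the token and kubelet settings above correspond to fields like these in `testing1/controlplane.yaml` (shape of an excerpt only; the real file contains much more):

```yaml
machine:
  token: dhmkxg.kgt4nn0mw72kd3yb             # shared machine token (keep secure)
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.33.0
```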
### GitOps Structure
Kubernetes manifests in `testing1/first-cluster/` follow a GitOps-friendly layout:
- **cluster/** - Cluster infrastructure and base resources
- **apps/** - Application workloads organized by app name
Each app in `apps/` contains its own deployment and service definitions.
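The per-app manifests typically follow the standard Deployment-plus-Service shape. A minimal sketch (names and image tag are hypothetical, not copied from this repo):

```yaml
# Illustrative shape of an app under apps/<app-name>/
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo
  namespace: default
spec:
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 80
```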
## Configuration Files
When modifying Talos configurations:
1. Edit `testing1/controlplane.yaml` for control plane changes
2. Edit `testing1/worker.yaml` for worker node changes
3. Apply changes using `talosctl apply-config` with the appropriate node IPs
4. Always specify `--nodes` flag to target specific nodes
When adding Kubernetes workloads:
1. Place cluster-level resources in `testing1/first-cluster/cluster/base/`
2. Place application manifests in `testing1/first-cluster/apps/<app-name>/`
3. Create a `kustomization.yaml` file to organize resources
4. Apply using `kubectl apply -k testing1/first-cluster/apps/<app-name>/`
5. See `APP_DEPLOYMENT.md` for detailed guide on adding new applications
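A minimal `kustomization.yaml` for step 3 might look like this (the resource file names are placeholders for whatever manifests the app ships):

```yaml
# testing1/first-cluster/apps/<app-name>/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
```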
## Deployed Applications
### GitLab (testing1/first-cluster/apps/gitlab/)
GitLab CE deployment with integrated Container Registry and CI/CD runner.
**Components:**
- **GitLab CE 16.11.1**: Main GitLab instance
- **Container Registry**: Docker image registry (port 5005/30500)
- **GitLab Runner**: CI/CD runner with Docker-in-Docker support
**Access:**
- UI: `http://<node-ip>:30080`
- SSH: `<node-ip>:30022`
- Registry: `http://<node-ip>:30500`
**Storage:**
- `gitlab-data`: 50Gi - Git repositories, artifacts, uploads
- `gitlab-config`: 5Gi - Configuration files
- `gitlab-logs`: 5Gi - Application logs
**Initial Setup:**
1. Deploy: `kubectl apply -k testing1/first-cluster/apps/gitlab/`
2. Wait for pods to be ready (5-10 minutes)
3. Get root password: `kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password`
4. Access UI and configure runner registration token
5. Update `testing1/first-cluster/apps/gitlab/runner-secret.yaml` with token
6. Restart runner: `kubectl rollout restart deployment/gitlab-runner -n gitlab`
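The exact contents of `runner-secret.yaml` depend on how the runner deployment consumes it; as a sketch, a token-bearing Secret typically looks like this (the Secret name and key are assumptions, not taken from this repo):

```yaml
# Sketch only: the real metadata.name and stringData key may differ.
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret   # assumed name
  namespace: gitlab
type: Opaque
stringData:
  registration-token: "REPLACE_WITH_TOKEN_FROM_GITLAB_UI"
```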
**CI/CD Configuration:**
The runner is configured for building Docker images with:
- Executor: Docker
- Privileged mode enabled
- Access to host Docker socket
- Tags: `docker`, `kubernetes`, `dind`
Example `.gitlab-ci.yml` for building container images:
```yaml
stages:
  - build

build-image:
  stage: build
  image: docker:24-dind
  tags:
    - docker
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
```