docs: add Claude Code project instructions
Add CLAUDE.md with comprehensive guidance for Claude Code when working with this Talos Kubernetes cluster repository. Includes:

- Development environment setup (Nix shell)
- Cluster bootstrap procedures
- Storage provisioner installation
- Common commands for Talos and Kubernetes
- GitLab and Gitea deployment instructions
- Troubleshooting guides

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
parent af0403d330, commit ea415ba584

CLAUDE.md (new file, 325 lines)
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Environment

This repository uses Nix for managing development tools. Enter the development shell:

```bash
nix-shell
```

The shell automatically configures:
- `TALOSCONFIG` → `testing1/.talosconfig`
- `KUBECONFIG` → `testing1/kubeconfig`
- `NIX_PROJECT_SHELL` → `kubernetes-management`
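These variables come from the hook in `shell.nix`; a minimal sketch of the equivalent exports, relative to the repository root (the actual shellHook may differ):

```shell
# Sketch of what the shell.nix hook sets up (paths relative to the repo root)
export TALOSCONFIG="$PWD/testing1/.talosconfig"
export KUBECONFIG="$PWD/testing1/kubeconfig"
export NIX_PROJECT_SHELL="kubernetes-management"
```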

Available tools in the Nix shell:
- `talosctl` - Talos Linux cluster management
- `kubectl` - Kubernetes cluster management
- `flux` - FluxCD GitOps toolkit

## Cluster Bootstrap

To bootstrap a new Talos cluster from scratch, use the provided bootstrap script:

```bash
# Enter the Nix shell first
nix-shell

# Run the bootstrap script
./bootstrap-cluster.sh
```

The bootstrap script (`bootstrap-cluster.sh`) will:
1. Generate new Talos secrets and machine configurations
2. Apply configurations to all nodes (10.0.1.3, 10.0.1.4, 10.0.1.5)
3. Bootstrap etcd on the first control plane
4. Retrieve kubeconfig
5. Verify cluster health
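The per-node apply in step 2 can be sketched as a loop over the node IPs (a sketch; it assumes the `testing1/controlplane-<ip>.yaml` naming the script uses):

```shell
# Sketch of the per-node apply step; --insecure is needed while nodes are in maintenance mode
for node in 10.0.1.3 10.0.1.4 10.0.1.5; do
  talosctl apply-config --insecure --nodes "$node" --file "testing1/controlplane-${node}.yaml"
done
```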

All generated files are saved to the `testing1/` directory:
- `testing1/.talosconfig` - Talos client configuration
- `testing1/kubeconfig` - Kubernetes client configuration
- `testing1/secrets.yaml` - Cluster secrets (keep secure!)
- `testing1/controlplane-*.yaml` - Per-node configurations

### Troubleshooting Bootstrap

If nodes remain in maintenance mode or bootstrap fails:

1. **Check cluster status**:
   ```bash
   ./check-cluster-status.sh
   ```

2. **Manual bootstrap process**:
   If the automated script fails, bootstrap manually:

   ```bash
   # Step 1: Check if nodes are accessible
   talosctl --nodes 10.0.1.3 version

   # Step 2: Apply config to each node if in maintenance mode
   talosctl apply-config --insecure --nodes 10.0.1.3 --file testing1/controlplane-10.0.1.3.yaml
   talosctl apply-config --insecure --nodes 10.0.1.4 --file testing1/controlplane-10.0.1.4.yaml
   talosctl apply-config --insecure --nodes 10.0.1.5 --file testing1/controlplane-10.0.1.5.yaml

   # Step 3: Wait for nodes to reboot (2-5 minutes)
   # Check with: talosctl --nodes 10.0.1.3 get services

   # Step 4: Bootstrap etcd on first node
   talosctl bootstrap --nodes 10.0.1.3

   # Step 5: Wait for Kubernetes (1-2 minutes)
   # Check with: talosctl --nodes 10.0.1.3 service etcd status

   # Step 6: Get kubeconfig
   talosctl kubeconfig --nodes 10.0.1.3 testing1/kubeconfig --force

   # Step 7: Verify cluster
   kubectl get nodes
   ```

3. **Common issues**:
   - **Nodes in maintenance mode**: Config not applied or nodes didn't reboot
   - **Bootstrap fails**: Node not ready; check with `talosctl get services`
   - **etcd won't start**: May need to reset nodes and start over
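When etcd is wedged, a node can be wiped back to maintenance mode and reconfigured from scratch. A hedged sketch (flags shown are the common ones for a full wipe; check `talosctl reset --help` before running, since this destroys the node's state):

```shell
# DESTRUCTIVE: wipes the node's state and reboots it into maintenance mode
talosctl reset --nodes 10.0.1.3 --graceful=false --reboot
# Afterwards, re-apply the node's config and bootstrap again as in the manual process
```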
## Storage Setup

Talos Linux does not include a default storage provisioner. You must install one before deploying applications that require persistent storage.

### Install Local Path Provisioner (Recommended)

```bash
# Enter nix-shell
nix-shell

# Install local-path-provisioner
./install-local-path-storage.sh
```

This installs Rancher's local-path-provisioner, which:
- Dynamically provisions PersistentVolumes on local node storage
- Sets itself as the default storage class
- Is simple and works well for single-node or testing clusters

**Important**: Local-path storage is NOT replicated. If a node fails, the data on it is lost.

### Verify Storage

```bash
# Check storage class
kubectl get storageclass

# Check provisioner is running
kubectl get pods -n local-path-storage
```
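To confirm dynamic provisioning end to end, a throwaway PVC plus a pod that mounts it can be applied (a sketch; object names are illustrative, and note that local-path typically uses `WaitForFirstConsumer` binding, so the PVC stays Pending until the pod is scheduled):

```yaml
# Hypothetical smoke test: a PVC and a pod that mounts it
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storage-smoke-test
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: storage-smoke-test
spec:
  containers:
    - name: test
      image: busybox:1.36
      command: ["sh", "-c", "echo ok > /data/ok && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: storage-smoke-test
```

Apply with `kubectl apply -f`, confirm `kubectl get pvc` shows `Bound`, then delete both objects.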

### Alternative Storage Options

For production HA setups, consider:
- **OpenEBS**: Distributed block storage with replication
- **Rook-Ceph**: Full-featured distributed storage system
- **Longhorn**: Cloud-native distributed storage

## Common Commands

### Talos Cluster Management

```bash
# Check cluster health
talosctl health

# Get cluster nodes
talosctl get members

# Apply configuration changes to a control plane node
talosctl apply-config --file testing1/controlplane.yaml --nodes <node-ip>

# Apply configuration changes to a worker node
talosctl apply-config --file testing1/worker.yaml --nodes <node-ip>

# Get Talos version
talosctl version

# Access Talos dashboard
talosctl dashboard
```
### Kubernetes Management

```bash
# Get cluster info
kubectl cluster-info

# Get all resources in all namespaces
kubectl get all -A

# Get nodes
kubectl get nodes

# Apply manifests from first-cluster
kubectl apply -f testing1/first-cluster/cluster/base/
kubectl apply -f testing1/first-cluster/apps/demo/

# Deploy applications using kustomize
kubectl apply -k testing1/first-cluster/apps/gitlab/
kubectl apply -k testing1/first-cluster/apps/<app-name>/
```
### GitLab Management

**Prerequisites**: A storage provisioner must be installed first (see the Storage Setup section).

```bash
# Deploy GitLab with Container Registry and Runner
kubectl apply -k testing1/first-cluster/apps/gitlab/

# Check GitLab status
kubectl get pods -n gitlab -w

# Check PVC status (should be Bound)
kubectl get pvc -n gitlab

# Get initial root password
kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password

# Access GitLab services
# - GitLab UI: http://<node-ip>:30080
# - SSH: <node-ip>:30022
# - Container Registry: http://<node-ip>:30500

# Restart GitLab Runner after updating registration token
kubectl rollout restart deployment/gitlab-runner -n gitlab

# Check runner logs
kubectl logs -n gitlab deployment/gitlab-runner -f
```
### GitLab Troubleshooting

If GitLab pods are stuck in Pending:

```bash
# Check storage issues
./diagnose-storage.sh

# If no storage provisioner, install it
./install-local-path-storage.sh

# Redeploy GitLab with storage
./redeploy-gitlab.sh
```
## Architecture

### Repository Structure

This is a Talos Kubernetes cluster management repository with the following structure:

- **testing1/** - Active testing cluster configuration
  - **controlplane.yaml** - Talos config for control plane nodes (Kubernetes 1.33.0)
  - **worker.yaml** - Talos config for worker nodes
  - **.talosconfig** - Talos client configuration
  - **kubeconfig** - Kubernetes client configuration
  - **first-cluster/** - Kubernetes manifests in GitOps structure
    - **cluster/base/** - Cluster-level resources (namespaces, etc.)
    - **apps/demo/** - Application deployments (nginx demo)
    - **apps/gitlab/** - GitLab CE with Container Registry and CI/CD Runner

- **prod1/** - Production cluster placeholder (currently empty)

- **shell.nix** - Nix development environment definition
- **bootstrap-cluster.sh** - Automated cluster bootstrap script
- **check-cluster-status.sh** - Cluster status diagnostic tool
- **install-local-path-storage.sh** - Install storage provisioner
- **diagnose-storage.sh** - Storage diagnostic tool
- **redeploy-gitlab.sh** - GitLab cleanup and redeployment
- **APP_DEPLOYMENT.md** - Comprehensive guide for deploying applications

### Cluster Configuration

The Talos cluster uses:
- **Kubernetes version**: 1.33.0 (kubelet image: `ghcr.io/siderolabs/kubelet:v1.33.0`)
- **Machine token**: `dhmkxg.kgt4nn0mw72kd3yb` (shared between control plane and workers)
- **Security**: Seccomp profiles enabled by default
- **Manifests directory**: Disabled (kubelet doesn't read from `/etc/kubernetes/manifests`)

### GitOps Structure

Kubernetes manifests in `testing1/first-cluster/` follow a GitOps-friendly layout:
- **cluster/** - Cluster infrastructure and base resources
- **apps/** - Application workloads organized by app name

Each app in `apps/` contains its own deployment and service definitions.
## Configuration Files

When modifying Talos configurations:
1. Edit `testing1/controlplane.yaml` for control plane changes
2. Edit `testing1/worker.yaml` for worker node changes
3. Apply changes using `talosctl apply-config` with the appropriate node IPs
4. Always specify the `--nodes` flag to target specific nodes

When adding Kubernetes workloads:
1. Place cluster-level resources in `testing1/first-cluster/cluster/base/`
2. Place application manifests in `testing1/first-cluster/apps/<app-name>/`
3. Create a `kustomization.yaml` file to organize resources
4. Apply using `kubectl apply -k testing1/first-cluster/apps/<app-name>/`
5. See `APP_DEPLOYMENT.md` for a detailed guide on adding new applications
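The `kustomization.yaml` in step 3 can be as small as a resource list (a sketch; the file names are illustrative):

```yaml
# Hypothetical testing1/first-cluster/apps/<app-name>/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
```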

## Deployed Applications

### GitLab (testing1/first-cluster/apps/gitlab/)

GitLab CE deployment with integrated Container Registry and CI/CD runner.

**Components:**
- **GitLab CE 16.11.1**: Main GitLab instance
- **Container Registry**: Docker image registry (port 5005/30500)
- **GitLab Runner**: CI/CD runner with Docker-in-Docker support

**Access:**
- UI: `http://<node-ip>:30080`
- SSH: `<node-ip>:30022`
- Registry: `http://<node-ip>:30500`

**Storage:**
- `gitlab-data`: 50Gi - Git repositories, artifacts, uploads
- `gitlab-config`: 5Gi - Configuration files
- `gitlab-logs`: 5Gi - Application logs

**Initial Setup:**
1. Deploy: `kubectl apply -k testing1/first-cluster/apps/gitlab/`
2. Wait for pods to be ready (5-10 minutes)
3. Get root password: `kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password`
4. Access the UI and configure the runner registration token
5. Update `testing1/first-cluster/apps/gitlab/runner-secret.yaml` with the token
6. Restart runner: `kubectl rollout restart deployment/gitlab-runner -n gitlab`
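The `runner-secret.yaml` updated in step 5 is roughly a namespaced Secret (a sketch; the object and key names are illustrative and may differ from what the runner deployment actually references):

```yaml
# Hypothetical shape of runner-secret.yaml; name and key are illustrative
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret
  namespace: gitlab
stringData:
  registration-token: "<token from the GitLab UI>"
```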

**CI/CD Configuration:**

The runner is configured for building Docker images with:
- Executor: Docker
- Privileged mode enabled
- Access to host Docker socket
- Tags: `docker`, `kubernetes`, `dind`

Example `.gitlab-ci.yml` for building container images:
```yaml
stages:
  - build

build-image:
  stage: build
  image: docker:24-dind
  tags:
    - docker
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
```