Compare commits: af0403d330...3ae15ffdfe (5 commits: 3ae15ffdfe, f6a3b57bcc, 6c292da5f1, 2ed1e82953, ea415ba584)

APP_DEPLOYMENT.md (new file, 445 lines)

# Application Deployment Guide

This guide explains how to deploy applications to your Talos Kubernetes cluster following the GitOps structure used in this repository.

## Directory Structure

Applications are organized in the `testing1/first-cluster/apps/` directory:

```
testing1/first-cluster/
├── cluster/
│   └── base/                  # Cluster-level resources (namespaces, RBAC, etc.)
└── apps/
    ├── demo/                  # Example nginx app
    │   ├── nginx-deployment.yaml
    │   └── nginx-service.yaml
    └── gitlab/                # GitLab with Container Registry
        ├── namespace.yaml
        ├── pvc.yaml
        ├── configmap.yaml
        ├── deployment.yaml
        ├── service.yaml
        ├── runner-secret.yaml
        ├── runner-configmap.yaml
        ├── runner-deployment.yaml
        └── kustomization.yaml
```

## Deploying Applications

### Method 1: Direct kubectl apply

Apply individual app manifests:

```bash
# Deploy a specific app
kubectl apply -f testing1/first-cluster/apps/gitlab/

# Or use kustomize
kubectl apply -k testing1/first-cluster/apps/gitlab/
```

### Method 2: Using kustomize (Recommended)

Each app directory can contain a `kustomization.yaml` file that lists all resources:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
```

Deploy with:

```bash
kubectl apply -k testing1/first-cluster/apps/<app-name>/
```

## Adding a New Application

Follow these steps to add a new application to your cluster:

### 1. Create App Directory

```bash
mkdir -p testing1/first-cluster/apps/<app-name>
cd testing1/first-cluster/apps/<app-name>
```

### 2. Create Namespace (Optional but Recommended)

Create `namespace.yaml`:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: <app-name>
```

### 3. Create Application Resources

Create the necessary Kubernetes resources. Common resources include:

#### Deployment

Create `deployment.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <app-name>
  namespace: <app-name>
  labels:
    app: <app-name>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <app-name>
  template:
    metadata:
      labels:
        app: <app-name>
    spec:
      containers:
        - name: <container-name>
          image: <image:tag>
          ports:
            - containerPort: <port>
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```

#### Service

Create `service.yaml`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: <app-name>
  namespace: <app-name>
spec:
  type: NodePort  # or ClusterIP, LoadBalancer
  selector:
    app: <app-name>
  ports:
    - port: 80
      targetPort: <container-port>
      nodePort: 30XXX  # if using NodePort (30000-32767)
```

#### PersistentVolumeClaim (if needed)

Create `pvc.yaml`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <app-name>-data
  namespace: <app-name>
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

#### ConfigMap (if needed)

Create `configmap.yaml`:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: <app-name>-config
  namespace: <app-name>
data:
  config.yml: |
    # Your configuration here
```

#### Secret (if needed)

Create `secret.yaml`:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: <app-name>-secret
  namespace: <app-name>
type: Opaque
stringData:
  password: "change-me"
  api-key: "your-api-key"
```

### 4. Create Kustomization File

Create `kustomization.yaml` to organize all resources:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - namespace.yaml
  - pvc.yaml
  - configmap.yaml
  - secret.yaml
  - deployment.yaml
  - service.yaml
```

### 5. Deploy the Application

```bash
# From the repository root
kubectl apply -k testing1/first-cluster/apps/<app-name>/

# Verify deployment
kubectl get all -n <app-name>
```

## GitLab Deployment Example

### Prerequisites

1. Ensure your cluster is running and healthy:
   ```bash
   kubectl get nodes
   talosctl health
   ```

2. **IMPORTANT**: Install a storage provisioner first:
   ```bash
   # Check if storage class exists
   kubectl get storageclass

   # If no storage class found, install local-path-provisioner
   ./install-local-path-storage.sh
   ```

Without a storage provisioner, GitLab's PersistentVolumeClaims will remain in the Pending state and the pods won't start.

### Deploy GitLab

1. **Update the runner registration token** in `testing1/first-cluster/apps/gitlab/runner-secret.yaml`:

   After GitLab is running, get the registration token from:
   - GitLab UI: `Admin Area > CI/CD > Runners > Register an instance runner`
   - Or for project runners: `Settings > CI/CD > Runners > New project runner`

2. **Deploy GitLab and Runner**:
   ```bash
   kubectl apply -k testing1/first-cluster/apps/gitlab/
   ```

3. **Wait for GitLab to be ready** (this can take 5-10 minutes):
   ```bash
   kubectl get pods -n gitlab -w
   ```

4. **Access GitLab**:
   - GitLab UI: `http://<any-node-ip>:30080`
   - SSH: `<any-node-ip>:30022`
   - Container Registry: `http://<any-node-ip>:30500`

5. **Get initial root password**:
   ```bash
   kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password
   ```

6. **Configure GitLab Runner**:
   - Log in to GitLab
   - Get the runner registration token
   - Update `runner-secret.yaml` with the token
   - Re-apply the secret:
     ```bash
     kubectl apply -f testing1/first-cluster/apps/gitlab/runner-secret.yaml
     ```
   - Restart the runner:
     ```bash
     kubectl rollout restart deployment/gitlab-runner -n gitlab
     ```

### Using the Container Registry

1. **Log in to the registry**:
   ```bash
   docker login <node-ip>:30500
   ```

2. **Tag and push images**:
   ```bash
   docker tag myapp:latest <node-ip>:30500/mygroup/myapp:latest
   docker push <node-ip>:30500/mygroup/myapp:latest
   ```

3. **Example `.gitlab-ci.yml` for building Docker images**:
   ```yaml
   stages:
     - build

   variables:
     DOCKER_DRIVER: overlay2
     DOCKER_TLS_CERTDIR: ""
     IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG

   build:
     stage: build
     image: docker:24-dind
     services:
       - docker:24-dind
     tags:
       - docker
     script:
       - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
       - docker build -t $IMAGE_TAG .
       - docker push $IMAGE_TAG
   ```

## Resource Sizing Guidelines

When adding applications, consider these resource guidelines:

### Small Applications (web frontends, APIs)

```yaml
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

### Medium Applications (databases, caching)

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"
```

### Large Applications (GitLab, monitoring stacks)

```yaml
resources:
  requests:
    cpu: "1000m"
    memory: "4Gi"
  limits:
    cpu: "4000m"
    memory: "8Gi"
```

## Service Types

### ClusterIP (default)
- Only accessible within the cluster
- Use for internal services

### NodePort
- Accessible on every node's IP at a static port (30000-32767)
- Use for services you need to access from outside the cluster
- Example: GitLab on port 30080

### LoadBalancer
- Creates an external load balancer (if the cloud provider supports it)
- On bare metal, requires MetalLB or similar
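
For internal-only services, the Service example earlier in this guide can be reduced to a ClusterIP Service; a minimal sketch (all names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: <app-name>
  namespace: <app-name>
spec:
  type: ClusterIP  # the default; omitting `type` has the same effect
  selector:
    app: <app-name>
  ports:
    - port: 80
      targetPort: <container-port>
```

Other pods can then reach it at `<app-name>.<app-name>.svc.cluster.local:80`, but it is not exposed outside the cluster.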

## Storage Considerations

### Access Modes
- `ReadWriteOnce` (RWO): Single node read/write (most common)
- `ReadOnlyMany` (ROX): Multiple nodes read-only
- `ReadWriteMany` (RWX): Multiple nodes read/write (requires special storage)

### Storage Sizing
- Logs: 1-5 GB
- Application data: 10-50 GB
- Databases: 50-100+ GB
- Container registries: 100+ GB

## Troubleshooting

### Check Pod Status
```bash
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
```

### Check Events
```bash
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```

### Check Resource Usage
```bash
kubectl top nodes
kubectl top pods -n <namespace>
```

### Common Issues

1. **ImagePullBackOff**: Container image cannot be pulled
   - Check the image name and tag
   - Verify registry credentials if using a private registry

2. **CrashLoopBackOff**: Container keeps crashing
   - Check logs: `kubectl logs <pod> -n <namespace>`
   - Check resource limits
   - Verify configuration

3. **Pending Pods**: Pod cannot be scheduled
   - Check node resources: `kubectl describe node`
   - Check PVC status if using storage
   - Verify node selectors/taints

4. **PVC Stuck in Pending**: Storage cannot be provisioned
   - **Most common issue on Talos**: No storage provisioner installed
   - Check if a storage class exists: `kubectl get sc`
   - If there is no storage class, install one:
     ```bash
     ./install-local-path-storage.sh
     ```
   - Check PVC events: `kubectl describe pvc <pvc-name> -n <namespace>`
   - For GitLab specifically, use the redeploy script:
     ```bash
     ./redeploy-gitlab.sh
     ```
   - Verify storage is available on the nodes

5. **Storage Provisioner Issues**
   - Run diagnostics: `./diagnose-storage.sh`
   - Check provisioner pods: `kubectl get pods -n local-path-storage`
   - View provisioner logs: `kubectl logs -n local-path-storage deployment/local-path-provisioner`
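
For the private-registry case under ImagePullBackOff, the kubelet needs pull credentials referenced from the pod spec. A hedged sketch (the secret name `regcred` and the registry address are placeholders, not files from this repo):

```yaml
# First create the secret in the app's namespace, e.g.:
#   kubectl create secret docker-registry regcred \
#     --docker-server=<node-ip>:30500 \
#     --docker-username=<user> --docker-password=<password> \
#     -n <namespace>
#
# Then reference it in the Deployment's pod template:
spec:
  template:
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - name: <container-name>
          image: <node-ip>:30500/mygroup/myapp:latest
```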

## Next Steps

- Set up FluxCD for GitOps automation
- Configure an ingress controller for HTTP/HTTPS routing
- Set up monitoring with Prometheus and Grafana
- Implement backup solutions for persistent data
- Configure network policies for security
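
Once an ingress controller is in place, HTTP routing can be declared per app; a minimal sketch assuming an nginx-class controller (the hostname and ingress class are assumptions, nothing in this repo provides them yet):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: <app-name>
  namespace: <app-name>
spec:
  ingressClassName: nginx  # assumes ingress-nginx is installed
  rules:
    - host: <app-name>.example.com  # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: <app-name>
                port:
                  number: 80
```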

CLAUDE.md (new file, 325 lines)

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Environment

This repository uses Nix for managing development tools. Enter the development shell:

```bash
nix-shell
```

The shell automatically configures:
- `TALOSCONFIG` → `testing1/.talosconfig`
- `KUBECONFIG` → `testing1/kubeconfig`
- `NIX_PROJECT_SHELL` → `kubernetes-management`

Available tools in the Nix shell:
- `talosctl` - Talos Linux cluster management
- `kubectl` - Kubernetes cluster management
- `flux` - FluxCD GitOps toolkit

## Cluster Bootstrap

To bootstrap a new Talos cluster from scratch, use the provided bootstrap script:

```bash
# Enter the Nix shell first
nix-shell

# Run the bootstrap script
./bootstrap-cluster.sh
```

The bootstrap script (`bootstrap-cluster.sh`) will:
1. Generate new Talos secrets and machine configurations
2. Apply configurations to all nodes (10.0.1.3, 10.0.1.4, 10.0.1.5)
3. Bootstrap etcd on the first control plane node
4. Retrieve the kubeconfig
5. Verify cluster health

All generated files are saved to the `testing1/` directory:
- `testing1/.talosconfig` - Talos client configuration
- `testing1/kubeconfig` - Kubernetes client configuration
- `testing1/secrets.yaml` - Cluster secrets (keep secure!)
- `testing1/controlplane-*.yaml` - Per-node configurations

### Troubleshooting Bootstrap

If nodes remain in maintenance mode or the bootstrap fails:

1. **Check cluster status**:
   ```bash
   ./check-cluster-status.sh
   ```

2. **Manual bootstrap process**:
   If the automated script fails, bootstrap manually:

   ```bash
   # Step 1: Check if nodes are accessible
   talosctl --nodes 10.0.1.3 version

   # Step 2: Apply config to each node if in maintenance mode
   talosctl apply-config --insecure --nodes 10.0.1.3 --file testing1/controlplane-10.0.1.3.yaml
   talosctl apply-config --insecure --nodes 10.0.1.4 --file testing1/controlplane-10.0.1.4.yaml
   talosctl apply-config --insecure --nodes 10.0.1.5 --file testing1/controlplane-10.0.1.5.yaml

   # Step 3: Wait for nodes to reboot (2-5 minutes)
   # Check with: talosctl --nodes 10.0.1.3 get services

   # Step 4: Bootstrap etcd on the first node
   talosctl bootstrap --nodes 10.0.1.3

   # Step 5: Wait for Kubernetes (1-2 minutes)
   # Check with: talosctl --nodes 10.0.1.3 service etcd status

   # Step 6: Get kubeconfig
   talosctl kubeconfig --nodes 10.0.1.3 testing1/kubeconfig --force

   # Step 7: Verify cluster
   kubectl get nodes
   ```

3. **Common issues**:
   - **Nodes in maintenance mode**: Config not applied or nodes didn't reboot
   - **Bootstrap fails**: Node not ready; check with `talosctl get services`
   - **etcd won't start**: May need to reset the nodes and start over

## Storage Setup

Talos Linux does not include a default storage provisioner. You must install one before deploying applications that require persistent storage.

### Install Local Path Provisioner (Recommended)

```bash
# Enter nix-shell
nix-shell

# Install local-path-provisioner
./install-local-path-storage.sh
```

This installs Rancher's local-path-provisioner, which:
- Dynamically provisions PersistentVolumes on local node storage
- Sets itself as the default storage class
- Is simple and works well for single-node or testing clusters

**Important**: Local-path storage is NOT replicated. If a node fails, its data is lost.

### Verify Storage

```bash
# Check storage class
kubectl get storageclass

# Check provisioner is running
kubectl get pods -n local-path-storage
```

### Alternative Storage Options

For production HA setups, consider:
- **OpenEBS**: Distributed block storage with replication
- **Rook-Ceph**: Full-featured distributed storage system
- **Longhorn**: Cloud-native distributed storage

## Common Commands

### Talos Cluster Management

```bash
# Check cluster health
talosctl health

# Get cluster members
talosctl get members

# Apply configuration changes to a control plane node
talosctl apply-config --file testing1/controlplane.yaml --nodes <node-ip>

# Apply configuration changes to a worker node
talosctl apply-config --file testing1/worker.yaml --nodes <node-ip>

# Get Talos version
talosctl version

# Access the Talos dashboard
talosctl dashboard
```

### Kubernetes Management

```bash
# Get cluster info
kubectl cluster-info

# Get all resources in all namespaces
kubectl get all -A

# Get nodes
kubectl get nodes

# Apply manifests from first-cluster
kubectl apply -f testing1/first-cluster/cluster/base/
kubectl apply -f testing1/first-cluster/apps/demo/

# Deploy applications using kustomize
kubectl apply -k testing1/first-cluster/apps/gitlab/
kubectl apply -k testing1/first-cluster/apps/<app-name>/
```

### GitLab Management

**Prerequisites**: A storage provisioner must be installed first (see the Storage Setup section).

```bash
# Deploy GitLab with Container Registry and Runner
kubectl apply -k testing1/first-cluster/apps/gitlab/

# Check GitLab status
kubectl get pods -n gitlab -w

# Check PVC status (should be Bound)
kubectl get pvc -n gitlab

# Get initial root password
kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password

# Access GitLab services
# - GitLab UI: http://<node-ip>:30080
# - SSH: <node-ip>:30022
# - Container Registry: http://<node-ip>:30500

# Restart GitLab Runner after updating the registration token
kubectl rollout restart deployment/gitlab-runner -n gitlab

# Check runner logs
kubectl logs -n gitlab deployment/gitlab-runner -f
```

### GitLab Troubleshooting

If GitLab pods are stuck in Pending:

```bash
# Check storage issues
./diagnose-storage.sh

# If there is no storage provisioner, install it
./install-local-path-storage.sh

# Redeploy GitLab with storage
./redeploy-gitlab.sh
```

## Architecture

### Repository Structure

This is a Talos Kubernetes cluster management repository with the following structure:

- **testing1/** - Active testing cluster configuration
  - **controlplane.yaml** - Talos config for control plane nodes (Kubernetes 1.33.0)
  - **worker.yaml** - Talos config for worker nodes
  - **.talosconfig** - Talos client configuration
  - **kubeconfig** - Kubernetes client configuration
  - **first-cluster/** - Kubernetes manifests in GitOps structure
    - **cluster/base/** - Cluster-level resources (namespaces, etc.)
    - **apps/demo/** - Application deployments (nginx demo)
    - **apps/gitlab/** - GitLab CE with Container Registry and CI/CD Runner

- **prod1/** - Production cluster placeholder (currently empty)

- **shell.nix** - Nix development environment definition
- **bootstrap-cluster.sh** - Automated cluster bootstrap script
- **check-cluster-status.sh** - Cluster status diagnostic tool
- **install-local-path-storage.sh** - Install the storage provisioner
- **diagnose-storage.sh** - Storage diagnostic tool
- **redeploy-gitlab.sh** - GitLab cleanup and redeployment
- **APP_DEPLOYMENT.md** - Comprehensive guide for deploying applications

### Cluster Configuration

The Talos cluster uses:
- **Kubernetes version**: 1.33.0 (kubelet image: `ghcr.io/siderolabs/kubelet:v1.33.0`)
- **Machine token**: `dhmkxg.kgt4nn0mw72kd3yb` (shared between control plane and workers)
- **Security**: Seccomp profiles enabled by default
- **Manifests directory**: Disabled (the kubelet doesn't read from `/etc/kubernetes/manifests`)
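
The kubelet settings above correspond to a fragment of the machine config roughly like the following (field names are from the Talos machine-config schema as I understand it; verify against the `controlplane.yaml` in this repo and your Talos version):

```yaml
machine:
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.33.0
    # Kubelet ignores /etc/kubernetes/manifests (no static pods from disk)
    disableManifestsDirectory: true
  # Shared join token for control plane and workers (keep secure)
  token: dhmkxg.kgt4nn0mw72kd3yb
```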

### GitOps Structure

Kubernetes manifests in `testing1/first-cluster/` follow a GitOps-friendly layout:
- **cluster/** - Cluster infrastructure and base resources
- **apps/** - Application workloads organized by app name

Each app in `apps/` contains its own deployment and service definitions.
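
This layout also composes: a hypothetical top-level `kustomization.yaml` (not currently in the repo) could aggregate everything for a single apply, provided each referenced directory has its own `kustomization.yaml`:

```yaml
# testing1/first-cluster/kustomization.yaml (hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - cluster/base
  - apps/demo
  - apps/gitlab
```

With that in place, `kubectl apply -k testing1/first-cluster/` would deploy the whole tree at once.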

## Configuration Files

When modifying Talos configurations:
1. Edit `testing1/controlplane.yaml` for control plane changes
2. Edit `testing1/worker.yaml` for worker node changes
3. Apply changes using `talosctl apply-config` with the appropriate node IPs
4. Always specify the `--nodes` flag to target specific nodes

When adding Kubernetes workloads:
1. Place cluster-level resources in `testing1/first-cluster/cluster/base/`
2. Place application manifests in `testing1/first-cluster/apps/<app-name>/`
3. Create a `kustomization.yaml` file to organize the resources
4. Apply using `kubectl apply -k testing1/first-cluster/apps/<app-name>/`
5. See `APP_DEPLOYMENT.md` for a detailed guide on adding new applications

## Deployed Applications

### GitLab (testing1/first-cluster/apps/gitlab/)

GitLab CE deployment with an integrated Container Registry and CI/CD runner.

**Components:**
- **GitLab CE 16.11.1**: Main GitLab instance
- **Container Registry**: Docker image registry (port 5005/30500)
- **GitLab Runner**: CI/CD runner with Docker-in-Docker support

**Access:**
- UI: `http://<node-ip>:30080`
- SSH: `<node-ip>:30022`
- Registry: `http://<node-ip>:30500`

**Storage:**
- `gitlab-data`: 50Gi - Git repositories, artifacts, uploads
- `gitlab-config`: 5Gi - Configuration files
- `gitlab-logs`: 5Gi - Application logs

**Initial Setup:**
1. Deploy: `kubectl apply -k testing1/first-cluster/apps/gitlab/`
2. Wait for pods to be ready (5-10 minutes)
3. Get the root password: `kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password`
4. Access the UI and get the runner registration token
5. Update `testing1/first-cluster/apps/gitlab/runner-secret.yaml` with the token
6. Restart the runner: `kubectl rollout restart deployment/gitlab-runner -n gitlab`

**CI/CD Configuration:**

The runner is configured for building Docker images with:
- Executor: Docker
- Privileged mode enabled
- Access to the host Docker socket
- Tags: `docker`, `kubernetes`, `dind`

Example `.gitlab-ci.yml` for building container images:

```yaml
stages:
  - build

build-image:
  stage: build
  image: docker:24-dind
  tags:
    - docker
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
```

bootstrap-cluster.sh (new executable file, 367 lines)

#!/usr/bin/env bash

set -euo pipefail

# Configuration
CLUSTER_NAME="talos-cluster"
CONTROL_PLANE_NODES=("10.0.1.3" "10.0.1.4" "10.0.1.5")
CLUSTER_ENDPOINT="https://10.0.1.3:6443"
KUBERNETES_VERSION="1.33.0"
OUTPUT_DIR="testing1"

# Colors for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color

log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check prerequisites
check_prerequisites() {
    log_info "Checking prerequisites..."

    if ! command -v talosctl &> /dev/null; then
        log_error "talosctl not found. Please run 'nix-shell' first."
        exit 1
    fi

    if ! command -v kubectl &> /dev/null; then
        log_error "kubectl not found. Please run 'nix-shell' first."
        exit 1
    fi

    log_success "All prerequisites met"
}

# Generate Talos secrets and configurations
generate_configs() {
    log_info "Generating Talos secrets for cluster: ${CLUSTER_NAME}"

    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"

    # Generate secrets
    talosctl gen secrets --force -o "${OUTPUT_DIR}/secrets.yaml"
    log_success "Secrets generated"

    # Generate configs for all 3 control plane nodes
    log_info "Generating machine configurations..."

    for i in "${!CONTROL_PLANE_NODES[@]}"; do
        NODE_IP="${CONTROL_PLANE_NODES[$i]}"
        log_info "Generating config for control plane node: ${NODE_IP}"

        talosctl gen config "${CLUSTER_NAME}" "${CLUSTER_ENDPOINT}" \
            --with-secrets "${OUTPUT_DIR}/secrets.yaml" \
            --kubernetes-version="${KUBERNETES_VERSION}" \
            --output-types controlplane \
            --output "${OUTPUT_DIR}/controlplane-${NODE_IP}.yaml" \
            --force \
            --config-patch @<(cat <<EOF
machine:
  network:
    hostname: cp-${i}
  certSANs:
    - ${NODE_IP}
    - 10.0.1.3
    - 10.0.1.4
    - 10.0.1.5
cluster:
  allowSchedulingOnControlPlanes: true
  controlPlane:
    endpoint: ${CLUSTER_ENDPOINT}
EOF
)
    done

    # Generate talosconfig
    talosctl gen config "${CLUSTER_NAME}" "${CLUSTER_ENDPOINT}" \
        --with-secrets "${OUTPUT_DIR}/secrets.yaml" \
        --output-types talosconfig \
        --force \
        --output "${OUTPUT_DIR}/.talosconfig"

    # Configure talosctl to use the new config
    export TALOSCONFIG="${OUTPUT_DIR}/.talosconfig"

    # Add all endpoints to the talosconfig
    talosctl config endpoint "${CONTROL_PLANE_NODES[@]}"
    talosctl config node "${CONTROL_PLANE_NODES[0]}"

    log_success "All configurations generated in ${OUTPUT_DIR}/"
}

# Apply configurations to nodes
apply_configs() {
    log_info "Applying configurations to nodes..."

    export TALOSCONFIG="${OUTPUT_DIR}/.talosconfig"

    for NODE_IP in "${CONTROL_PLANE_NODES[@]}"; do
        log_info "Applying config to ${NODE_IP}..."

        # Apply config with --insecure flag for initial bootstrap
        if talosctl apply-config \
            --insecure \
            --nodes "${NODE_IP}" \
            --file "${OUTPUT_DIR}/controlplane-${NODE_IP}.yaml"; then
            log_success "Configuration applied to ${NODE_IP}"
        else
            log_error "Failed to apply configuration to ${NODE_IP}"
            exit 1
        fi

        # Brief pause between nodes
        sleep 2
    done

    log_success "Configurations applied to all nodes"
}

# Wait for nodes to be ready
wait_for_nodes() {
    log_info "Waiting for nodes to reboot and be ready..."

    export TALOSCONFIG="${OUTPUT_DIR}/.talosconfig"

    # Wait for each node to be accessible
    for NODE_IP in "${CONTROL_PLANE_NODES[@]}"; do
        log_info "Waiting for node ${NODE_IP} to be accessible..."

        local max_attempts=60
        local attempt=0

        while [ $attempt -lt $max_attempts ]; do
            if talosctl --nodes "${NODE_IP}" version &> /dev/null; then
                log_success "Node ${NODE_IP} is responding"
                break
            fi

            attempt=$((attempt + 1))
            sleep 5
        done

        if [ $attempt -eq $max_attempts ]; then
            log_error "Node ${NODE_IP} did not become accessible in time"
            exit 1
        fi
    done

    # Wait for all nodes to be out of maintenance mode and services ready
    log_info "Checking that all nodes are out of maintenance mode..."

    for NODE_IP in "${CONTROL_PLANE_NODES[@]}"; do
        local max_attempts=60
        local attempt=0

        while [ $attempt -lt $max_attempts ]; do
            log_info "Checking services on ${NODE_IP} (attempt $((attempt + 1))/${max_attempts})..."

            # Get service state - if this succeeds, the node is configured
            if talosctl --nodes "${NODE_IP}" get services 2>&1 | grep -q "apid"; then
                log_success "Node ${NODE_IP} is out of maintenance mode"
                break
            fi

            attempt=$((attempt + 1))
            sleep 5
        done

        if [ $attempt -eq $max_attempts ]; then
            log_error "Node ${NODE_IP} did not exit maintenance mode"
            log_error "Try checking the node console or running: talosctl --nodes ${NODE_IP} get services"
            exit 1
        fi
    done

    # Additional wait to ensure the etcd service is ready for bootstrap
    log_info "Waiting for etcd to be ready for bootstrap on ${CONTROL_PLANE_NODES[0]}..."
    sleep 10

    log_success "All nodes are ready for bootstrapping"
}

# Check if etcd is already bootstrapped
check_etcd_status() {
    export TALOSCONFIG="${OUTPUT_DIR}/.talosconfig"

    log_info "Checking if etcd is already bootstrapped..."

    # Check if the etcd service is running
    if talosctl --nodes "${CONTROL_PLANE_NODES[0]}" service etcd status 2>&1 | grep -q "STATE.*Running"; then
        log_warning "etcd is already running - cluster appears to be bootstrapped"
        return 1
    fi

    return 0
}

# Bootstrap etcd on the first control plane node
bootstrap_cluster() {
    log_info "Bootstrapping etcd on first control plane node: ${CONTROL_PLANE_NODES[0]}"

    export TALOSCONFIG="${OUTPUT_DIR}/.talosconfig"

    # Check if already bootstrapped
    if ! check_etcd_status; then
        log_warning "Skipping bootstrap as cluster is already bootstrapped"
        return 0
    fi

    # Verify the node is ready for bootstrap
    log_info "Verifying node ${CONTROL_PLANE_NODES[0]} is ready for bootstrap..."
    if ! talosctl --nodes "${CONTROL_PLANE_NODES[0]}" get members &> /dev/null; then
        log_warning "etcd members not yet initialized, proceeding with bootstrap..."
    fi

    # Perform bootstrap
    log_info "Running bootstrap command..."
    if talosctl bootstrap --nodes "${CONTROL_PLANE_NODES[0]}"; then
        log_success "Bootstrap command executed successfully"
    else
        log_error "Failed to bootstrap etcd"
        log_error "This may be because:"
        log_error "  1. The node is still in maintenance mode (check with: talosctl --nodes ${CONTROL_PLANE_NODES[0]} get services)"
        log_error "  2. The configuration was not properly applied"
        log_error "  3. etcd is already bootstrapped"
        exit 1
    fi

    # Wait for etcd to come up
    log_info "Waiting for etcd to start..."
    local max_attempts=30
    local attempt=0

    while [ $attempt -lt $max_attempts ]; do
        if talosctl --nodes "${CONTROL_PLANE_NODES[0]}" service etcd status 2>&1 | grep -q "STATE.*Running"; then
            log_success "etcd is running"
            break
        fi

        attempt=$((attempt + 1))
        sleep 5
    done

    if [ $attempt -eq $max_attempts ]; then
        log_warning "etcd did not start in expected time, but continuing..."
    fi

    log_info "Waiting for Kubernetes to initialize..."
    sleep 30
}

# Retrieve kubeconfig
get_kubeconfig() {
    log_info "Retrieving kubeconfig..."

    export TALOSCONFIG="${OUTPUT_DIR}/.talosconfig"

    local max_attempts=20
    local attempt=0

    while [ $attempt -lt $max_attempts ]; do
        log_info "Attempting to retrieve kubeconfig (attempt $((attempt + 1))/${max_attempts})..."

        if talosctl kubeconfig --nodes "${CONTROL_PLANE_NODES[0]}" "${OUTPUT_DIR}/kubeconfig" --force; then
            log_success "Kubeconfig saved to ${OUTPUT_DIR}/kubeconfig"
            break
        fi

        attempt=$((attempt + 1))
        sleep 10
    done

    if [ $attempt -eq $max_attempts ]; then
        log_error "Failed to retrieve kubeconfig"
        exit 1
    fi
}

# Verify cluster health
verify_cluster() {
    log_info "Verifying cluster health..."
export TALOSCONFIG="${OUTPUT_DIR}/.talosconfig"
|
||||
export KUBECONFIG="${OUTPUT_DIR}/kubeconfig"
|
||||
|
||||
log_info "Checking Talos health..."
|
||||
if talosctl health --wait-timeout 5m; then
|
||||
log_success "Talos cluster is healthy"
|
||||
else
|
||||
log_warning "Talos health check reported issues"
|
||||
fi
|
||||
|
||||
log_info "Checking Kubernetes nodes..."
|
||||
kubectl get nodes -o wide
|
||||
|
||||
log_info "Checking system pods..."
|
||||
kubectl get pods -A
|
||||
|
||||
log_success "Cluster verification complete"
|
||||
}
|
||||
|
||||
# Print summary
|
||||
print_summary() {
|
||||
echo ""
|
||||
echo "=========================================="
|
||||
log_success "Talos Cluster Bootstrap Complete!"
|
||||
echo "=========================================="
|
||||
echo ""
|
||||
echo "Cluster Name: ${CLUSTER_NAME}"
|
||||
echo "Control Plane Nodes:"
|
||||
for NODE_IP in "${CONTROL_PLANE_NODES[@]}"; do
|
||||
echo " - ${NODE_IP}"
|
||||
done
|
||||
echo ""
|
||||
echo "Configuration Files:"
|
||||
echo " - TALOSCONFIG: ${OUTPUT_DIR}/.talosconfig"
|
||||
echo " - KUBECONFIG: ${OUTPUT_DIR}/kubeconfig"
|
||||
echo ""
|
||||
echo "To use the cluster, export these variables:"
|
||||
echo " export TALOSCONFIG=\"\$(pwd)/${OUTPUT_DIR}/.talosconfig\""
|
||||
echo " export KUBECONFIG=\"\$(pwd)/${OUTPUT_DIR}/kubeconfig\""
|
||||
echo ""
|
||||
echo "Or run: nix-shell (which sets these automatically)"
|
||||
echo ""
|
||||
echo "Useful commands:"
|
||||
echo " talosctl health"
|
||||
echo " kubectl get nodes"
|
||||
echo " kubectl get pods -A"
|
||||
echo "=========================================="
|
||||
}
|
||||
|
||||
# Main execution
|
||||
main() {
|
||||
log_info "Starting Talos Cluster Bootstrap"
|
||||
log_info "Cluster: ${CLUSTER_NAME}"
|
||||
log_info "Nodes: ${CONTROL_PLANE_NODES[*]}"
|
||||
echo ""
|
||||
|
||||
check_prerequisites
|
||||
generate_configs
|
||||
apply_configs
|
||||
wait_for_nodes
|
||||
bootstrap_cluster
|
||||
get_kubeconfig
|
||||
verify_cluster
|
||||
print_summary
|
||||
}
|
||||
|
||||
# Run main function
|
||||
main
|
||||
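The bounded poll-with-sleep pattern appears three times in this script: waiting for nodes to leave maintenance mode, for etcd to start, and for the kubeconfig to become retrievable. If the script grows, that pattern could be factored into one helper. This is a sketch only; the `retry` name and interface are illustrative and not part of the repository:

```shell
#!/usr/bin/env bash
# Sketch of a generic retry helper; `retry` and its interface are
# illustrative, not part of bootstrap-cluster.sh.
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        # Run the remaining arguments as the command to poll.
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical usage, mirroring the loops above:
# retry 60 5 talosctl --nodes "$NODE_IP" get services
```

Each call site then reduces to one line plus its error handling, and the attempt/delay tuning lives in one place.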
### check-cluster-status.sh (new executable file, 148 lines)
```bash
#!/usr/bin/env bash

set -euo pipefail

# Configuration
CONTROL_PLANE_NODES=("10.0.1.3" "10.0.1.4" "10.0.1.5")
TALOSCONFIG="${TALOSCONFIG:-testing1/.talosconfig}"

# Colors for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'

log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check if talosconfig exists
if [ ! -f "$TALOSCONFIG" ]; then
    log_error "TALOSCONFIG not found at: $TALOSCONFIG"
    log_info "Have you run ./bootstrap-cluster.sh yet?"
    exit 1
fi

export TALOSCONFIG

echo "=========================================="
echo "Talos Cluster Status Check"
echo "=========================================="
echo ""

# Check each node
for NODE_IP in "${CONTROL_PLANE_NODES[@]}"; do
    echo "==================== Node: $NODE_IP ===================="

    # Check if node is accessible
    log_info "Checking if node is accessible..."
    if talosctl --nodes "$NODE_IP" version &> /dev/null; then
        log_success "Node is accessible"
    else
        log_error "Node is NOT accessible"
        echo ""
        continue
    fi

    # Check version
    echo ""
    log_info "Talos version:"
    talosctl --nodes "$NODE_IP" version --short 2>&1 || log_error "Could not get version"

    # Check if in maintenance mode
    echo ""
    log_info "Checking if node is in maintenance mode..."
    if talosctl --nodes "$NODE_IP" get services &> /dev/null; then
        log_success "Node is OUT of maintenance mode (configured)"
    else
        log_error "Node is IN MAINTENANCE MODE - configuration not applied!"
        log_info "To apply config, run:"
        log_info "  talosctl apply-config --insecure --nodes $NODE_IP --file testing1/controlplane-${NODE_IP}.yaml"
    fi

    # Check services
    echo ""
    log_info "Service status:"
    talosctl --nodes "$NODE_IP" services 2>&1 | head -20 || log_error "Could not get services"

    # Check etcd status
    echo ""
    log_info "etcd status:"
    if talosctl --nodes "$NODE_IP" service etcd status 2>&1 | grep -q "STATE.*Running"; then
        log_success "etcd is RUNNING"
        talosctl --nodes "$NODE_IP" service etcd status 2>&1 | grep "STATE"
    else
        log_warning "etcd is NOT running"
        talosctl --nodes "$NODE_IP" service etcd status 2>&1 | grep "STATE" || log_info "etcd not initialized yet"
    fi

    # Check if etcd members exist
    echo ""
    log_info "etcd members:"
    if talosctl --nodes "$NODE_IP" get members 2>&1 | grep -v "^NODE" | grep -v "not found"; then
        log_success "etcd members found"
    else
        log_warning "No etcd members - cluster needs bootstrap"
    fi

    echo ""
done

# Overall cluster status
echo "==================== Overall Cluster Status ===================="

# Check if any node has etcd running
ETCD_RUNNING=false
for NODE_IP in "${CONTROL_PLANE_NODES[@]}"; do
    if talosctl --nodes "$NODE_IP" service etcd status 2>&1 | grep -q "STATE.*Running"; then
        ETCD_RUNNING=true
        break
    fi
done

echo ""
if $ETCD_RUNNING; then
    log_success "Cluster appears to be bootstrapped (etcd running)"

    # Try to get kubeconfig
    echo ""
    log_info "Attempting to retrieve kubeconfig..."
    if talosctl kubeconfig --nodes "${CONTROL_PLANE_NODES[0]}" ./kubeconfig-test --force 2>&1; then
        log_success "Kubeconfig retrieved successfully"

        log_info "Kubernetes node status:"
        KUBECONFIG=./kubeconfig-test kubectl get nodes 2>&1 || log_error "Could not connect to Kubernetes"

        rm -f ./kubeconfig-test
    else
        log_warning "Could not retrieve kubeconfig"
    fi
else
    log_warning "Cluster is NOT bootstrapped yet"
    log_info ""
    log_info "Next steps:"
    log_info "1. Ensure all nodes are out of maintenance mode (see checks above)"
    log_info "2. If nodes are in maintenance mode, apply configs:"
    for NODE_IP in "${CONTROL_PLANE_NODES[@]}"; do
        log_info "   talosctl apply-config --insecure --nodes $NODE_IP --file testing1/controlplane-${NODE_IP}.yaml"
    done
    log_info "3. Wait for nodes to reboot and become ready (~2-5 minutes)"
    log_info "4. Bootstrap the cluster:"
    log_info "   talosctl bootstrap --nodes ${CONTROL_PLANE_NODES[0]}"
fi

echo ""
echo "=========================================="
```
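The script decides whether etcd is up by grepping `STATE.*Running` out of `talosctl service etcd status` output. That parsing step can be isolated into a small function so it is easy to exercise against captured output; `etcd_running_from` below is an illustrative name, not part of check-cluster-status.sh:

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of check-cluster-status.sh): succeeds if
# the `talosctl service etcd status` output on stdin reports a Running state.
etcd_running_from() {
    grep -q "STATE.*Running"
}

# Hypothetical usage against a live node (assumes talosctl is configured):
# talosctl --nodes "$NODE_IP" service etcd status 2>&1 | etcd_running_from
```

Keeping the grep in one place also means a format change in talosctl's output needs fixing only once.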
### diagnose-storage.sh (new executable file, 87 lines)
```bash
#!/usr/bin/env bash

set -euo pipefail

# Colors for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'

log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

echo "=========================================="
echo "Storage Diagnostics"
echo "=========================================="
echo ""

# Check storage classes
log_info "Checking available storage classes..."
kubectl get storageclass
echo ""

# Check PVCs
log_info "Checking PersistentVolumeClaims in gitlab namespace..."
kubectl get pvc -n gitlab
echo ""

# Check PVC details
log_info "Detailed PVC status..."
kubectl describe pvc -n gitlab
echo ""

# Check pods
log_info "Checking pods in gitlab namespace..."
kubectl get pods -n gitlab
echo ""

# Check pod events
log_info "Checking events in gitlab namespace..."
kubectl get events -n gitlab --sort-by='.lastTimestamp' | tail -20
echo ""

# Summary and recommendations
echo "=========================================="
log_info "Summary and Recommendations"
echo "=========================================="

# Check if storage class exists
if kubectl get storageclass 2>&1 | grep -q "No resources found"; then
    log_error "No storage class found!"
    echo ""
    log_info "Talos Linux does not include a default storage provisioner."
    log_info "You need to install one of the following:"
    echo ""
    echo "  1. Local Path Provisioner (simple, single-node)"
    echo "     - Best for testing/development"
    echo "     - Uses local node storage"
    echo ""
    echo "  2. OpenEBS (distributed, multi-node)"
    echo "     - Production-ready"
    echo "     - Supports replication"
    echo ""
    echo "  3. Rook-Ceph (distributed, enterprise)"
    echo "     - Full-featured storage solution"
    echo "     - More complex setup"
    echo ""
    log_info "Local Path Provisioner is the simplest starting point."
else
    log_success "Storage class found"
fi

echo ""
```
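The diagnostic above only checks whether *any* storage class exists; a PVC that omits `storageClassName` additionally needs one class marked as the default, or it stays Pending. A hedged sketch of that extra check, relying on the `(default)` marker that `kubectl get storageclass` prints in its NAME column; the helper name is mine, not part of diagnose-storage.sh:

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of diagnose-storage.sh): succeeds if the
# tabular `kubectl get storageclass` output on stdin marks a default class.
has_default_storageclass() {
    grep -q "(default)"
}

# Hypothetical usage:
# kubectl get storageclass | has_default_storageclass || echo "no default storage class set"
```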
### install-local-path-storage.sh (new executable file, 76 lines)
```bash
#!/usr/bin/env bash

set -euo pipefail

# Colors for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'

log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

echo "=========================================="
echo "Installing Local Path Provisioner"
echo "=========================================="
echo ""

log_info "This will install Rancher's local-path-provisioner for dynamic storage provisioning."
echo ""

# Create namespace for local-path-provisioner
log_info "Creating local-path-storage namespace..."
kubectl create namespace local-path-storage --dry-run=client -o yaml | kubectl apply -f -

# Apply local-path-provisioner
log_info "Deploying local-path-provisioner..."
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.28/deploy/local-path-storage.yaml

# Wait for deployment to be ready
log_info "Waiting for local-path-provisioner to be ready..."
kubectl wait --for=condition=available --timeout=120s deployment/local-path-provisioner -n local-path-storage

# Check storage class
log_info "Checking storage class..."
kubectl get storageclass

# Set as default storage class
log_info "Setting local-path as default storage class..."
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

log_success "Local Path Provisioner installed successfully!"
echo ""

log_info "Storage configuration:"
kubectl get storageclass
echo ""

log_info "Provisioner pods:"
kubectl get pods -n local-path-storage
echo ""

echo "=========================================="
log_success "Installation Complete!"
echo "=========================================="
echo ""
log_info "You can now deploy applications that require persistent storage."
log_info "PersistentVolumeClaims will automatically be provisioned on the local node."
echo ""
log_warning "Note: local-path storage is NOT replicated across nodes."
log_warning "For production use with HA requirements, consider OpenEBS or Rook-Ceph."
echo ""
```
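After installation, the provisioner can be smoke-tested end to end with a throwaway claim. The manifest below is an illustrative sketch (the claim name and size are mine, not from the repo); with the upstream manifest's `WaitForFirstConsumer` binding mode, the claim will report Pending until some pod actually mounts it, which is expected:

```yaml
# Illustrative smoke-test claim, not part of the repository manifests.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-smoke-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 128Mi
```

Apply it, attach any pod that mounts the claim, confirm it becomes Bound, then delete both.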
### redeploy-gitlab.sh (new executable file, 103 lines)
```bash
#!/usr/bin/env bash

set -euo pipefail

# Colors for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'

log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

echo "=========================================="
echo "GitLab Cleanup and Redeployment"
echo "=========================================="
echo ""

# Check if storage class exists
log_info "Checking for storage class..."
if ! kubectl get storageclass local-path &> /dev/null; then
    log_error "Storage class 'local-path' not found!"
    log_info "Please run: ./install-local-path-storage.sh first"
    exit 1
fi

log_success "Storage class 'local-path' found"
echo ""

# Delete existing GitLab deployment
log_warning "Cleaning up existing GitLab deployment..."
if kubectl get namespace gitlab &> /dev/null; then
    log_info "Deleting GitLab deployment..."
    kubectl delete -k testing1/first-cluster/apps/gitlab/ --ignore-not-found=true || true

    log_info "Waiting for pods to terminate..."
    kubectl wait --for=delete pod --all -n gitlab --timeout=120s 2>/dev/null || true

    log_info "Deleting PVCs (this will delete all data!)..."
    kubectl delete pvc --all -n gitlab --ignore-not-found=true || true

    log_info "Waiting for PVCs to be deleted..."
    sleep 5

    log_success "Cleanup complete"
else
    log_info "GitLab namespace doesn't exist - nothing to clean up"
fi

echo ""

# Deploy GitLab
log_info "Deploying GitLab with local-path storage..."
kubectl apply -k testing1/first-cluster/apps/gitlab/

echo ""
log_info "Waiting for PVCs to be bound..."
sleep 5

# Check PVC status
kubectl get pvc -n gitlab

echo ""
log_info "Waiting for GitLab pod to be created..."
sleep 10

# Show pod status
kubectl get pods -n gitlab

echo ""
log_success "GitLab deployment initiated!"
echo ""

log_info "Monitor deployment progress with:"
echo "  kubectl get pods -n gitlab -w"
echo ""

log_info "GitLab will take 5-10 minutes to fully start up."
echo ""

log_info "Once running, access GitLab at:"
echo "  http://<node-ip>:30080"
echo ""

log_info "Get the initial root password with:"
echo "  kubectl exec -n gitlab deployment/gitlab -- grep 'Password:' /etc/gitlab/initial_root_password"
echo ""

echo "=========================================="
```
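The script waits a fixed 5 seconds after `kubectl apply` and then prints PVC status once; polling until every claim is actually Bound is more robust. A sketch of the check, assuming the default `kubectl get pvc --no-headers` column order (STATUS is the second field); `all_pvcs_bound` is an illustrative name, not part of redeploy-gitlab.sh:

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of redeploy-gitlab.sh): succeeds when every
# PVC line from `kubectl get pvc -n gitlab --no-headers` on stdin is Bound.
# Note: empty input trivially succeeds.
all_pvcs_bound() {
    awk '$2 != "Bound" { exit 1 }'
}

# Hypothetical usage, replacing the fixed `sleep 5`:
# until kubectl get pvc -n gitlab --no-headers | all_pvcs_bound; do sleep 5; done
```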