CKA Exam - Troubleshooting 30%


  1. Evaluate cluster and node logging.

  2. Understand how to monitor applications.

  3. Manage container stdout & stderr logs.

  4. Troubleshoot application failure.

  5. Troubleshoot cluster component failure.

  6. Troubleshoot networking.

     

    On this page, I will try to demonstrate questions in the Linux Foundation exam pattern.

    Please note that these exact questions may or may not appear in the exam, but they will give you a fair idea of what to expect and the confidence to clear it.
     

    1.  Evaluate cluster and node logging.

    Question: Set the configuration environment: kubectl config use-context k8s
    Check how many nodes are Ready (excluding nodes tainted with NoSchedule), and write the number to /var/log/k8s00402.txt
     

    Solution:

     [root@master1 ~]#  kubectl config use-context k8s

     First, check how many nodes we have.
    [root@master1 ~]#  kubectl get nodes
    NAME                      STATUS   ROLES           AGE    VERSION
    master1.example.com       Ready    control-plane   380d   v1.26.9
    workernode1.example.com   Ready    <none>          380d   v1.26.9
    workernode2.example.com   Ready    <none>          380d   v1.26.9


    From the above output, we can see that there are 3 nodes in our cluster, all in Ready state. Let's check the taints on these 3 nodes.
     

    Syntax of the command to check the taints on a node:

     
    kubectl describe nodes node_name | grep -i taint

    [root@master1 ~]#  kubectl describe nodes master1.example.com | grep -i taint
    Taints:             node-role.kubernetes.io/control-plane:NoSchedule

    [root@master1 ~]#  kubectl describe nodes workernode1.example.com | grep -i taint
    Taints:             <none>

    [root@master1 ~]#  kubectl describe nodes workernode2.example.com | grep -i taint
    Taints:             <none>

    From the above outputs, we can see that 2 nodes do not have the "NoSchedule" taint. Thus, we can write "2" to the text file.
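    If you prefer a single command instead of checking each node one by one, the one-liner below is a rough sketch: it counts the "Taints:" lines from kubectl describe that do not mention NoSchedule (it assumes you have already confirmed which nodes are Ready in the previous step).

    [root@master1 ~]#  kubectl describe nodes | grep -i "taints:" | grep -vc NoSchedule
    2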

    [root@master1 ~]#  echo "2" > /var/log/k8s00402.txt

    It's always best practice to do a post-check, so cat the file.

    [root@master1 ~]#  cat /var/log/k8s00402.txt
    2

    Question completed successfully.



    2.  Understand how to monitor applications.

    One of the best ways to enable application monitoring is through Prometheus and Grafana. However, we can also implement a sidecar container. Prometheus and Grafana are not part of the core Kubernetes components, so a sidecar question has a high probability of appearing.

    Question : Add a busybox sidecar container to the existing Pod customer-red-app. The new sidecar container has to run the following command: /bin/sh -c tail -n+1 -f /var/log/customer-red-app.log
    Use a volume mount named logs to make the file /var/log/customer-red-app.log available to the sidecar container.

        Don’t modify the existing container.
        Don’t modify the path of the log file, both containers must access it at /var/log/customer-red-app.log.

    Use context: kubectl config use-context k8s-c1-H

    Solution: You need to add one more container to the existing pod.
    - The name of the new container is not given, so you can choose whatever you want.
    - The new container must read the log file at /var/log/customer-red-app.log through the shared "logs" volume.

    First, check the pod we need to modify.

    [root@master1 ~]#  kubectl config use-context k8s-c1-H

    [root@master1 ~]# kubectl get pods
    NAME      READY   STATUS    RESTARTS   AGE
    podname   1/1     Running   0          11s

    Get the full pod definition in a YAML file.

    [root@master1 ~]#   kubectl get pods/podname -o yaml > podsname.yaml

    Copy this file as a backup, because we will delete the pod and modify the YAML file.

     [root@master1 ~]#  cp podsname.yaml podsname.yaml.back

    Delete the running pod.

    [root@master1 ~]#  kubectl delete pods/podname
    pod "podname" deleted


    Open the "Kubernetes.io" page and Click on "Documentation". On the left hand side, search "sidecar". Open the first link and then scroll down and find the below yaml file. Copy these 6 lines and then update as per question.


     Please take care of the indentation; it is very important. Now, open the YAML file that we created earlier and modify it. You need to add 6 lines: the container entry named "sidecarbusybox" shown below.

     [root@master1 ~]# vi podsname.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        cni.projectcalico.org/containerID: 56bdc95fc52ba447c6fd2d51643b5ed883864b8405737e615e2a144fd33d2bba
        cni.projectcalico.org/podIP: 172.16.14.104/32
        cni.projectcalico.org/podIPs: 172.16.14.104/32
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"podname","namespace":"default"},"spec":{"containers":[{"args":["/bin/sh","-c","i=0; while true; do\n  echo \"$(date) INFO $i\" \u003e\u003e /var/log/customer-red-app.log;\n  i=$((i+1));\n  sleep 1;\ndone\n"],"image":"busybox","name":"count","volumeMounts":[{"mountPath":"/var/log","name":"logs"}]}],"volumes":[{"emptyDir":{},"name":"logs"}]}}
      creationTimestamp: "2023-12-23T11:43:54Z"
      name: podname
      namespace: default
      resourceVersion: "852677"
      uid: a8fd3950-df74-444f-ab51-6a023c797f35
    spec:
      containers:
      - name: sidecarbusybox
        image: busybox
        args: [/bin/sh, -c, 'tail -n+1 -f /var/log/customer-red-app.log']
        volumeMounts:
        - mountPath: /var/log
          name: logs

      - args:
        - /bin/sh
        - -c
        - |
          i=0; while true; do
            echo "$(date) INFO $i" >> /var/log/customer-red-app.log;
            i=$((i+1));
            sleep 1;
          done
        image: busybox
        imagePullPolicy: Always
        name: count
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log
          name: logs

        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-sfk7w
          readOnly: true
      dnsPolicy: ClusterFirst
      ...

    (Only the top of the file is shown here; the rest of the generated YAML, including the volumes: section that defines the "logs" emptyDir volume, stays unchanged.)

     

    [root@master1 ~]#   kubectl apply -f podsname.yaml

    pod/podname created

    [root@master1 ~]# kubectl get pods
    NAME      READY   STATUS    RESTARTS     AGE
    podname   2/2     Running   1 (9s ago)   12s

    One can observe that we now have 2 containers running in the pod.
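    If you want to double-check which containers the pod now contains, a quick jsonpath query (a small sketch) lists the container names:

    [root@master1 ~]# kubectl get pod podname -o jsonpath='{.spec.containers[*].name}'
    sidecarbusybox count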

    How to verify it?

    Syntax of the command:

    kubectl logs pods/POD_NAME -c CONTAINER_NAME


    [root@master1 ~]# kubectl logs  pods/podname -c sidecarbusybox | head
    Sat Dec 23 11:57:02 UTC 2023 INFO 0
    Sat Dec 23 11:57:03 UTC 2023 INFO 1
    Sat Dec 23 11:57:04 UTC 2023 INFO 2
    Sat Dec 23 11:57:05 UTC 2023 INFO 3
    Sat Dec 23 11:57:06 UTC 2023 INFO 4
    Sat Dec 23 11:57:07 UTC 2023 INFO 5
    Sat Dec 23 11:57:08 UTC 2023 INFO 6
    Sat Dec 23 11:57:09 UTC 2023 INFO 7
    Sat Dec 23 11:57:10 UTC 2023 INFO 8
    Sat Dec 23 11:57:11 UTC 2023 INFO 9



    3. Manage container stdout & stderr logs.

    Question: Monitor the logs of pod tata-bar and extract log lines corresponding to the error "website is down- unable to access it". Write them to /opt/KUTR00101/tata-bar



    Solution: There is a pod running, and you need to extract the matching lines and save them to a file.

    How to see the logs of a pod?

    kubectl logs tata-bar

    To extract only the lines containing the error string and write them to the given file, pipe the logs through grep and redirect the output with ">":

    kubectl logs tata-bar | grep "website is down- unable to access it" > /opt/KUTR00101/tata-bar

    (In this demo pod every log line contains that string, so a plain redirect of kubectl logs would also work.)

     

    _______________________________________________________________

    If you want to practice at home and are wondering how to create this pod, here is the solution.

    Go to the URL below or create the YAML file as follows:

    "https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/"

    cat <<EOF>> tata-bar.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: tata-bar
      labels:
        purpose: demonstrate-command
    spec:
      containers:
      - name: command-demo-container
        image: debian
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo website is down- unable to access it; sleep 10;done"]
      restartPolicy: OnFailure
    EOF


    kubectl create -f tata-bar.yaml
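    Once the pod has been running for a few seconds, you can confirm that it is producing the error line (a quick sanity check):

    kubectl logs tata-bar | head -2
    website is down- unable to access it
    website is down- unable to access it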

     ______________________________________________________________ 


    The next question on this topic would be:

    From the pods with label name=app-nginx, find the pods running high CPU workloads and write the name of the pod consuming the most CPU to the file /var/log/KUT00401.txt (which already exists).

    Use context: kubectl config use-context k8s-c1-H

    Solution:

    [root@master1 ~]# kubectl config use-context k8s-c1-H

    [root@master1 ~]# kubectl get pods -l name=app-nginx
    NAME       READY   STATUS    RESTARTS   AGE
    cpu-pod1   1/1     Running   0          68s
    cpu-pod2   1/1     Running   0          20s
    max-pod1   1/1     Running   0          7s

    Now, use the "top" sub command to check the high CPU utilization pod name.

    [root@master1 ~]# kubectl top pods -l name=app-nginx
    NAME       CPU(cores)   MEMORY(bytes)
    cpu-pod1   100m         2Mi
    cpu-pod2   30m          2Mi
    max-pod1   10m          2Mi

    Put the name of the pod with the highest CPU utilization into the given file.

    [root@master1 ~]# echo "cpu-pod1" > /var/log/KUT00401.txt
    [root@master1 ~]# cat /var/log/KUT00401.txt
    cpu-pod1
    [root@master1 ~]#
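    If your version of kubectl and metrics-server supports it, kubectl top can also sort by CPU, which makes the same answer scriptable (a sketch; verify the output before writing it to the file):

    [root@master1 ~]# kubectl top pods -l name=app-nginx --sort-by=cpu --no-headers | head -1 | awk '{print $1}' > /var/log/KUT00401.txt
    [root@master1 ~]# cat /var/log/KUT00401.txt
    cpu-pod1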


    4. Troubleshoot application failure.

    Question:

    Use context: kubectl config use-context ek8s

    Set the node named workernode1.example.com as unavailable and reschedule all the pods running on it.

    Solution: This question indirectly asks us to put the node into maintenance mode (cordon and drain it).

    Change the context

    kubectl config use-context ek8s
    

    Check the context

    kubectl config current-context
    

    Check the running pods on all nodes, and note how many pods are running on workernode1.

    kubectl get pods -o wide
    

    Check the node name.

    kubectl get nodes
    

    Put the node into maintenance mode; cordon is the subcommand, followed by the node name.

    kubectl cordon workernode1.example.com
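    A quick sanity check after cordoning: the node should now show SchedulingDisabled in its status, roughly like this.

    kubectl get nodes workernode1.example.com
    NAME                      STATUS                     ROLES    AGE    VERSION
    workernode1.example.com   Ready,SchedulingDisabled   <none>   380d   v1.26.9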
    

    Move all the pods to other nodes.

    kubectl drain workernode1.example.com --ignore-daemonsets --force
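    Note: if the drain stops because some pods use emptyDir volumes, you may (depending on the workloads) also need the --delete-emptydir-data flag:

    kubectl drain workernode1.example.com --ignore-daemonsets --force --delete-emptydir-data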
    

    Check that all pods moved from workernode1 to other nodes. In the exam, the node name will be different.

    kubectl get pods -o wide
     

    5. Troubleshoot cluster component failure.

    Use context: kubectl config use-context ek8s

    The Kubernetes worker node named workernode2.example.com is in a NotReady state. Investigate the root cause and resolve it. Ensure that any changes made are permanently effective.

    - You can use the following command to connect to the fault node:
            ssh workernode2.example.com
    - You can use the following command to get higher permissions on this node:
            sudo -i


    Solution:

    Use the correct context.

    [root@master1 ~]# kubectl config use-context ek8s

    Check the node status.

    [root@master1 ~]# kubectl get nodes
    NAME                      STATUS     ROLES           AGE    VERSION
    master1.example.com       Ready      control-plane   381d   v1.26.9
    workernode1.example.com   Ready      <none>          380d   v1.26.9
    workernode2.example.com   NotReady   <none>          381d   v1.26.9

    [root@master1 ~]#

    Log in to workernode2 using the command given in the question.

    [root@master1 ~]# ssh workernode2

    Raise the privileges by executing "sudo -i". This command will also be provided in the question.

    [arana@workernode2 ~]$ sudo -i
    [root@workernode2 ~]#

    The kubelet is the node agent responsible for communicating with the control plane (master node). Thus, check the kubelet service.


    [root@workernode2 ~]# systemctl status kubelet

    ○ kubelet.service - kubelet: The Kubernetes Node Agent
         Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
        Drop-In: /usr/lib/systemd/system/kubelet.service.d
                 └─10-kubeadm.conf
         Active: inactive (dead) since Sat 2023-12-23 22:24:36 IST; 2min 56s ago
       Duration: 5h 49min 33.339s
           Docs: https://kubernetes.io/docs/
        Process: 770 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=0/SUCCESS)
       Main PID: 770 (code=exited, status=0/SUCCESS)
            CPU: 2min 15.540s
    Dec 23 22:23:07 workernode2.example.com kubelet[770]: I1223 22:23:07.283124     770 log.go:194] http: TLS handshake error from 192.168.1.32:60088: remote error: tls: bad certificate
    Dec 23 22:23:22 workernode2.example.com kubelet[770]: I1223 22:23:22.268391     770 log.go:194] http: TLS handshake error from 192.168.1.32:4435: remote error: tls: bad certificate
    Dec 23 22:23:37 workernode2.example.com kubelet[770]: I1223 22:23:37.280033     770 log.go:194] http: TLS handshake error from 192.168.1.32:61478: remote error: tls: bad certificate
    Dec 23 22:23:52 workernode2.example.com kubelet[770]: I1223 22:23:52.271086     770 log.go:194] http: TLS handshake error from 192.168.1.32:8049: remote error: tls: bad certificate
    Dec 23 22:24:07 workernode2.example.com kubelet[770]: I1223 22:24:07.285566     770 log.go:194] http: TLS handshake error from 192.168.1.32:53329: remote error: tls: bad certificate
    Dec 23 22:24:22 workernode2.example.com kubelet[770]: I1223 22:24:22.264235     770 log.go:194] http: TLS handshake error from 192.168.1.32:51315: remote error: tls: bad certificate
    Dec 23 22:24:36 workernode2.example.com systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
    Dec 23 22:24:36 workernode2.example.com systemd[1]: kubelet.service: Deactivated successfully.
    Dec 23 22:24:36 workernode2.example.com systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
    Dec 23 22:24:37 workernode2.example.com systemd[1]: kubelet.service: Consumed 2min 15.540s CPU time.


    From the above output, it is clear that the kubelet service is not running. Next, start the service.
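    (If the status output alone had not revealed the root cause, the kubelet journal would be the next place to look; a sketch, adjust the number of lines as needed:)

    [root@workernode2 ~]# journalctl -u kubelet --no-pager | tail -50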

    [root@workernode2 ~]# systemctl start kubelet

    Check the kubelet service

    [root@workernode2 ~]# systemctl status kubelet
    ● kubelet.service - kubelet: The Kubernetes Node Agent
         Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
        Drop-In: /usr/lib/systemd/system/kubelet.service.d
                 └─10-kubeadm.conf
         Active: active (running) since Sat 2023-12-23 22:31:28 IST; 4s ago
           Docs: https://kubernetes.io/docs/
       Main PID: 168384 (kubelet)
          Tasks: 10 (limit: 13824)
         Memory: 34.8M
            CPU: 250ms
         CGroup: /system.slice/kubelet.service
                 └─168384 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runt>

    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: I1223 22:31:31.096918  168384 kubelet_node_status.go:73] "Successfully registered node" node="workernode2.example.com"
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: I1223 22:31:31.147476  168384 kubelet_node_status.go:493] "Fast updating node status as it just became ready"
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: I1223 22:31:31.215099  168384 topology_manager.go:210] "Topology Admit Handler" podUID=c6121c46-99de-4b99-b7ce-97ceb0add518 podNamespace="ku>
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: E1223 22:31:31.215153  168384 cpu_manager.go:395] "RemoveStaleState: removing container" podUID="a1d210b0-1e23-46dc-bbcf-a497d8440a48" conta>
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: I1223 22:31:31.215178  168384 memory_manager.go:346] "RemoveStaleState removing state" podUID="a1d210b0-1e23-46dc-bbcf-a497d8440a48" contain>
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: I1223 22:31:31.345454  168384 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"tmp-dir\" (Un>
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: I1223 22:31:31.345550  168384 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-acce>
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: E1223 22:31:31.451919  168384 projected.go:292] Couldn't get configMap default/kube-root-ca.crt: object "default"/"kube-root-ca.crt" not reg>
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: E1223 22:31:31.451952  168384 projected.go:198] Error preparing data for projected volume kube-api-access-sfk7w for pod default/podname: obj>
    Dec 23 22:31:31 workernode2.example.com kubelet[168384]: E1223 22:31:31.452004  168384 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/0b06b8d2-2078-4929-8eb9-b2e>

    In the question, we are asked to make the changes permanent. Thus, use the enable subcommand so the kubelet service starts automatically on boot.
    [root@workernode2 ~]# systemctl enable kubelet
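    As a post-check, confirm that the service is now enabled so it survives a reboot:

    [root@workernode2 ~]# systemctl is-enabled kubelet
    enabled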


    [root@workernode2 ~]# exit
    logout
    [arana@workernode2 ~]$ exit
    logout
    Connection to workernode2 closed.

    Exit from workernode2 and check the status of our node.


    [root@master1 ~]# kubectl get nodes
    NAME                      STATUS   ROLES           AGE    VERSION
    master1.example.com       Ready    control-plane   381d   v1.26.9
    workernode1.example.com   Ready    <none>          381d   v1.26.9
    workernode2.example.com   Ready    <none>          381d   v1.26.9
    [root@master1 ~]# 

    Question END.


    6. Troubleshoot networking.

     

    Question: Reconfigure the existing deployment front-end-var and add a port specification named http exposing port 80/tcp of the existing container nginx.

    Create a new service named "front-end-var-svc-var" exposing the container port http.

    Configure the new service to also expose individual Pods via a NodePort on the nodes on which they are scheduled.

    Solution:  

    What we have: a deployment named "front-end-var" that is already running. What is asked: add a named port http (80/tcp) to the existing nginx container, and expose the deployment through a new NodePort service named "front-end-var-svc-var" on port 80. First add the named port (see the sketch below), then expose the deployment.
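    To add the named port, one approach (a sketch; the surrounding fields of your deployment will differ) is to edit the deployment and add a ports entry under the nginx container spec:

    kubectl edit deployment front-end-var

      # under spec.template.spec.containers, for the container named nginx, add:
        ports:
        - name: http
          containerPort: 80
          protocol: TCP

    Save and exit; the deployment will roll out the change. After that, expose the deployment through a NodePort service: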

    kubectl expose deployment front-end-var --name=front-end-var-svc-var --port=80 --target-port=80 --protocol=TCP --type=NodePort
    
    [root@master1 ~]# kubectl get service front-end-var-svc-var 
    NAME                    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
    front-end-var-svc-var   NodePort   10.101.245.122   <none>        80:30521/TCP   12s
    [root@master1 ~]# curl http://10.101.245.122:80 | head
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   615  100   615    0     0   600k      0 --:--:-- --:--:-- --:--:--  600k
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    [root@master1 ~]#
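    Because the service is of type NodePort, it is also reachable on any node's address at the allocated node port (30521 in this example; the port will differ in your cluster):

    [root@master1 ~]# curl -s http://workernode1.example.com:30521 | head -4
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>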
    

    If you want to understand this in more depth, you may watch my "What is service in Kubernetes" video on YouTube.

     


     
