Monthly Archives: February 2021

Fixing the Helm error "Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress"

If a helm install or helm upgrade is interrupted unexpectedly, you may run into the following error:

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

So how do we fix this?
Following the GitHub issue https://github.com/helm/helm/issues/8987, the steps below work.
1. Run helm history to check the current state of the release:

$ helm history -n lizhewnei lizhewnei-common
REVISION	UPDATED                 	STATUS         	CHART                 	APP VERSION	DESCRIPTION
331     	Tue Feb 23 23:11:07 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
332     	Wed Feb 24 08:11:08 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
333     	Wed Feb 24 15:11:13 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
334     	Wed Feb 24 23:11:09 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
335     	Thu Feb 25 08:11:09 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
336     	Thu Feb 25 15:11:08 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
337     	Thu Feb 25 23:11:06 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
338     	Fri Feb 26 08:11:13 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
339     	Fri Feb 26 09:49:37 2021	deployed       	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
340     	Fri Feb 26 10:37:53 2021	pending-upgrade	lizhewnei-common-0.1.0	1.16.0     	Preparing upgrade

2. The output above shows that the most recent revision, 340, is stuck in pending-upgrade, which is what blocks further deployments.
3. Use helm rollback to roll back one revision, to revision 339:

$ helm rollback -n lizhewnei lizhewnei-common 339
Rollback was a success! Happy Helming!

4. After rolling back, check the history again and confirm that the release is back on revision 339:

$ helm history -n lizhewnei lizhewnei-common
REVISION	UPDATED                 	STATUS         	CHART                 	APP VERSION	DESCRIPTION
332     	Wed Feb 24 08:11:08 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
333     	Wed Feb 24 15:11:13 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
334     	Wed Feb 24 23:11:09 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
335     	Thu Feb 25 08:11:09 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
336     	Thu Feb 25 15:11:08 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
337     	Thu Feb 25 23:11:06 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
338     	Fri Feb 26 08:11:13 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
339     	Fri Feb 26 09:49:37 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
340     	Fri Feb 26 10:37:53 2021	pending-upgrade	lizhewnei-common-0.1.0	1.16.0     	Preparing upgrade
341     	Fri Feb 26 11:00:23 2021	deployed       	lizhewnei-common-0.1.0	1.16.0     	Rollback to 339

5. At this point helm upgrade works again; a sketch of the command is shown below, and the helm history output afterwards confirms that the upgrade succeeded:
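A minimal sketch of re-running the blocked upgrade; the chart directory ./lizhewnei-common and the values.yaml file are placeholders, not taken from the original deployment:

$ helm upgrade -n lizhewnei lizhewnei-common ./lizhewnei-common -f values.yaml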

$ helm history -n lizhewnei lizhewnei-common
REVISION	UPDATED                 	STATUS         	CHART                 	APP VERSION	DESCRIPTION
333     	Wed Feb 24 15:11:13 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
334     	Wed Feb 24 23:11:09 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
335     	Thu Feb 25 08:11:09 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
336     	Thu Feb 25 15:11:08 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
337     	Thu Feb 25 23:11:06 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
338     	Fri Feb 26 08:11:13 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
339     	Fri Feb 26 09:49:37 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
340     	Fri Feb 26 10:37:53 2021	pending-upgrade	lizhewnei-common-0.1.0	1.16.0     	Preparing upgrade
341     	Fri Feb 26 11:00:23 2021	superseded     	lizhewnei-common-0.1.0	1.16.0     	Rollback to 339
342     	Fri Feb 26 11:01:27 2021	deployed       	lizhewnei-common-0.1.0	1.16.0     	Upgrade complete
 
 
 

A canary (grey) release experiment based on Istio

Background

A canary release (also called a grey release or A/B testing) lets one group of users keep using product feature A while another group starts using feature B; if the B users have no complaints, the rollout is gradually widened until all users have been migrated to B.
I recently had a real canary-release requirement, so I went through Istio again and recorded the whole process here (applications only; database upgrades are not covered).


Experiment procedure

  1. Confirm that the current application version is V1
  2. Deploy the V2 pods to the Kubernetes cluster via a helm chart
  3. Decide which users get the V2 canary, e.g. by source IP or by specific user
  4. Use an Istio VirtualService to route those users' traffic to V2
  5. Let those users run on V2 for a while and watch for problems
  6. If there are no problems, route all users' traffic to V2 via Istio
  7. Once all users are fine on V2, delete the V1 pods

Example setup

The frontend application is frontend; the backend application is mqtt-server, which talks to devices over the MQTT protocol.
Three versions of the frontend are deployed (V1, V2, V3), and likewise three versions of the backend (V1, V2, V3). The three frontend versions differ in their button text; the three backend versions connect to different MQTT devices.

Version / frontend page / backend response:

V1 - the frontend shows a "V11" button; the backend returns
{"message":["wsytest010","wsytest002","wsytest003","wsytest007","wsytest006","wsytest001","wsytest005","wsytest009","wsytest008","wsytest004"]}

V2 - the frontend shows a "V22" button; the backend returns
{"message":["wsytest019","wsytest020","wsytest017","wsytest012","wsytest011","wsytest014","wsytest018","wsytest015","wsytest013","wsytest016"]}

V3 - the frontend shows a "V33" button; the backend returns
{"message":["wsytest024","wsytest028","wsytest022","wsytest026","wsytest027","wsytest021","wsytest025","wsytest030","wsytest023","wsytest029"]}


The requirement is that versions must not be mixed: frontend V1 must talk to backend V1; frontend V1 → backend V2 is not allowed.
This means we cannot distribute traffic by weight; we can only route by specific user or specific IP. With weight-based routing the following problems would occur:
the frontend loads several JS and CSS files, so some JS might be served by v1 while some CSS is served by v2.
The same applies to the backend: if opening one page triggers several backend requests and some go to V2 while others go to V1, the page will certainly render incorrectly.
Only by pinning the frontend and backend versions to each other one-to-one can the application work correctly.


Implementation and caveats

1. Deploy the three frontend Deployments; every pod gets the labels app: frontend and version: <corresponding version> (a quick label check is sketched after the manifests below).

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
      version: v1
  template:
    metadata:
      labels:
        app: frontend
        version: v1
    spec:
      containers:
      - name: frontend
        image: <frontend-image>:v1
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]    # 按照istio的说明,最好把这个pod安全策略加上
        imagePullPolicy: Always
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-v2
  labels:
    app: frontend
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
      version: v2
  template:
    metadata:
      labels:
        app: frontend
        version: v2
    spec:
      containers:
      - name: frontend
        image: <frontend-image>:v2
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        imagePullPolicy: Always
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-v3
  labels:
    app: frontend
    version: v3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
      version: v3
  template:
    metadata:
      labels:
        app: frontend
        version: v3
    spec:
      containers:
      - name: frontend
        image: <frontend-image>:v3
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        imagePullPolicy: Always
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  type: ClusterIP   # not NodePort: traffic entering through a NodePort bypasses the mesh routing and cannot be controlled
  ports:
    - port: 80
      targetPort: 80
      name: http-web
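A quick way to verify that the three frontend pods carry the expected labels (a verification sketch only, not part of the manifests above):

$ kubectl get pods -l app=frontend -L version
# expect one running pod each with VERSION v1, v2 and v3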

2. Deploy the backend Deployments, analogous to the frontend (a note on sidecar injection follows the manifests).

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mqtt-server-v1
  labels:
    app: mqtt-server
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mqtt-server
      version: v1
  template:
    metadata:
      labels:
        app: mqtt-server
        version: v1
    spec:
      serviceAccountName: mqtt-server
      containers:
      - name: mqtt-server
        image: <backend-image>:v1
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]    # 按照istio的说明,最好把这个pod安全策略加上
        imagePullPolicy: Always
        ports:
        - containerPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mqtt-server-v2
  labels:
    app: mqtt-server
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mqtt-server
      version: v2
  template:
    metadata:
      labels:
        app: mqtt-server
        version: v2
    spec:
      serviceAccountName: mqtt-server
      containers:
      - name: mqtt-server
        image: <backend-image>:v2
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        imagePullPolicy: Always
        ports:
        - containerPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mqtt-server-v3
  labels:
    app: mqtt-server
    version: v3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mqtt-server
      version: v3
  template:
    metadata:
      labels:
        app: mqtt-server
        version: v3
    spec:
      serviceAccountName: mqtt-server
      containers:
      - name: mqtt-server
        image: <backend-image>:v3
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        imagePullPolicy: Always
        ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: mqtt-server
spec:
  selector:
    app: mqtt-server
  type: ClusterIP   # not NodePort: traffic entering through a NodePort bypasses the mesh routing and cannot be controlled
  ports:
    - port: 8000
      targetPort: 8000
      name: http-web
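Note: for the VirtualService rules in the next steps to take effect, every one of these pods needs an Envoy sidecar. A minimal sketch, assuming the workloads live in the default namespace and automatic injection is used:

# enable automatic sidecar injection for the namespace, then restart the
# Deployments so existing pods get the sidecar injected
$ kubectl label namespace default istio-injection=enabled --overwrite
$ kubectl rollout restart deployment -l app=frontend
$ kubectl rollout restart deployment -l app=mqtt-server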

3. Distinguish external traffic from internal traffic. Traffic from the browser to the frontend is what we call external traffic; traffic inside Kubernetes, such as frontend to backend, is internal traffic.

4. Incoming external traffic needs to be managed by Istio's ingress gateway, so configure a Gateway (apply/verify commands follow the manifest):

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"

5. Configure the backend VirtualService and DestinationRule so that each backend version is pinned one-to-one to the matching frontend version; when nothing matches, default to V1 (a routing check is sketched after the manifests).

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mqtt-server-internal
spec:
  hosts:
  - "mqtt-server"     #此处是关键,把匹配到该url的流量,全部走到这个特定的virtualservice里
  http:
  - match:
    - sourceLabels:
        version: v1
    route:
    - destination:
        host: mqtt-server
        subset: v1             # route matched traffic to the v1 subset, defined in the DestinationRule below
      headers:
        response:
          add:
            user: v1
  - match:
    - sourceLabels:
        version: v2
    route:
    - destination:
        host: mqtt-server
        subset: v2
      headers:
        response:
          add:
            user: v2
  - match:
    - sourceLabels:
        version: v3
    route:
    - destination:
        host: mqtt-server
        subset: v3
      headers:
        response:
          add:
            user: v3
  - route:
    - destination:
        host: mqtt-server
        subset: v1
      headers:
        response:
          add:
            user: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: mqtt-server
spec:
  host: mqtt-server.default.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1    # matched against the pods' version: v1 label
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
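A rough in-mesh check of the version pinning, assuming curl is available inside the frontend image and that the backend answers HTTP GET on / at port 8000 (the path is a guess; adjust it to your API):

# call the backend from the v2 frontend pod; the response header added by the
# VirtualService should come back as user: v2
$ kubectl exec deploy/frontend-v2 -c frontend -- \
    curl -s -D - -o /dev/null http://mqtt-server:8000/ | grep -i "^user:"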

6. Configure the frontend VirtualService and DestinationRule. Here traffic originating from the IP 192.168.0.58 is sent to V2, and every other IP goes to V1 (a curl test through the gateway is sketched after the manifests).

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: frontend-server
spec:
  hosts:
  - "外网域名"     #此处是关键,把匹配到该url的流量,全部走到这个特定的virtualservice里
  gateways:
  - bookinfo-gateway              # must match the Gateway's metadata.name
  http:
  - match:
    - headers:
        X-Forwarded-For:
          exact: "192.168.0.58"            # match requests whose X-Forwarded-For header is exactly 192.168.0.58
    route:
    - destination:
        host: frontend
        subset: v2             # route matched traffic to the v2 subset, defined in the DestinationRule below
      headers:
        response:
          add:
            user: v2
  - route:
    - destination:
        host: frontend
        subset: v1             # unmatched traffic falls back to the v1 subset, defined in the DestinationRule below
      headers:
        response:
          add:
            user: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: frontend
spec:
  host: frontend
  subsets:
  - name: v1
    labels:
      version: v1    # matched against the pods' version: v1 label
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
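A quick check through the ingress gateway from a client machine; <ingress-ip> and <external-domain> are placeholders, and the X-Forwarded-For match only reflects the real client IP once step 7 below is applied:

# the user response header added by the VirtualService shows which subset answered:
# v2 when called from 192.168.0.58, otherwise v1
$ curl -s -D - -o /dev/null -H "Host: <external-domain>" http://<ingress-ip>/ | grep -i "^user:"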

7. Because browser traffic passes through Istio's ingress gateway, the client IP the frontend sees is not the real one. Set spec.externalTrafficPolicy to Local on the istio-ingressgateway Service so the original client IP is preserved (a patch command is sketched below).
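A minimal sketch, assuming the default istio-ingressgateway Service in the istio-system namespace:

# keep traffic on the node it arrived at so the original client IP is preserved
$ kubectl patch svc istio-ingressgateway -n istio-system \
    -p '{"spec":{"externalTrafficPolicy":"Local"}}'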

8. Final state


Experiment results (screenshots)

1. When the client IP does not match the rule, both the frontend and the backend serve V1; the first screenshot shows the actual page, the second the traffic graph in Kiali.
2. When the client IP matches the rule, both the frontend and the backend serve V2; the left screenshot shows the actual page, the right one the Kiali traffic graph.

3. When matching and non-matching clients access the application at the same time, the Kiali traffic graph looks as follows.

Installing Istio and SkyWalking from scratch

Versions

Istio 1.8.2
SkyWalking 8.1.0
Kubernetes 1.19, installed with Rancher


Installing Istio

1. Download the Istio release onto any Kubernetes master node:

curl -L https://istio.io/downloadIstio | sh -

2. Enter the Istio directory and set the PATH; all later Istio commands are run from this directory:

cd istio-1.8.2
export PATH=$PWD/bin:$PATH
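Optionally confirm that istioctl is now on the PATH:

$ istioctl version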

3. Install Istio and point the Envoy access log service at skywalking-oap

Run the following command:

istioctl install \
  --set profile=demo \
  --set meshConfig.enableEnvoyAccessLogService=true \
  --set meshConfig.defaultConfig.envoyAccessLogService.address=skywalking-oap.istio-system:11800

Wait for the following output, which indicates the installation is complete:

✔ Istio core installed
✔ Istiod installed
✔ Egress gateways installed
✔ Ingress gateways installed
✔ Installation complete

4. Install Kiali (the samples/addons manifests also include the other addons). After the installation succeeds, remember to expose Kiali through an ingress; I used Traefik for this (an ingress sketch follows the commands below).

kubectl apply -f samples/addons
kubectl rollout status deployment/kiali -n istio-system
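A minimal sketch of exposing Kiali, assuming a Traefik ingress controller and the hostname kiali.example.com (both are assumptions); the kiali Service listens on port 20001 in istio-system:

$ kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kiali
  namespace: istio-system
  annotations:
    kubernetes.io/ingress.class: traefik   # assumption: Traefik handles ingress
spec:
  rules:
  - host: kiali.example.com                # assumption: replace with your own hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kiali
            port:
              number: 20001
EOF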

Installing SkyWalking

git clone https://github.com/apache/skywalking-kubernetes.git
cd skywalking-kubernetes/chart
helm repo add elastic https://helm.elastic.co
helm dep up skywalking
helm install 8.1.0 skywalking -n istio-system \
  --set oap.env.SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS=k8s-mesh \
  --set fullnameOverride=skywalking \
  --set oap.envoy.als.enabled=true \
  --set ui.image.tag=8.1.0 \
  --set oap.image.tag=8.1.0-es6 \
  --set oap.storageType=elasticsearch

The commands above install SkyWalking together with Elasticsearch 6.8.6.
After installation, expose skywalking-ui externally so users can reach the web UI (a port-forward sketch follows).
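For a quick look without an ingress, a port-forward works; this assumes the chart created a Service named skywalking-ui exposing port 80 in istio-system (an Ingress like the Kiali one above works for permanent access):

$ kubectl port-forward svc/skywalking-ui 8080:80 -n istio-system
# then open http://localhost:8080 in a browser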


Installing the test application

Install the Bookinfo sample with the commands below. Once it is installed, you can expose it externally through a LoadBalancer; the LoadBalancer IP can be provided by MetalLB (a quick check is sketched after the commands).

kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
kubectl apply -f samples/bookinfo/networking/destination-rule-all.yaml
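To find the gateway address and verify that the sample responds (the EXTERNAL-IP is whatever MetalLB assigned to istio-ingressgateway):

$ kubectl get svc istio-ingressgateway -n istio-system
# use the EXTERNAL-IP column as the gateway address
$ curl -s http://<EXTERNAL-IP>/productpage | grep -o "<title>.*</title>"
# expect: <title>Simple Bookstore App</title>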

Results

Open the Bookinfo test application in a browser

Check the resulting traffic distribution in Kiali

The SkyWalking web UI

