The metadata is pretty minimal, with just a name (labels are not required):
apiVersion: v1
kind: ReplicationController
metadata:
  name: cassandra
  # The labels will be applied automatically
  # from the labels in the pod template, if not set
  # labels:
  #   app: Cassandra
The spec specifies the number of replicas:
spec:
  replicas: 3
  # The selector will be applied automatically
  # from the labels in the pod template, if not set.
  # selector:
  #   app: Cassandra
The pod template's metadata is where the app: Cassandra label is specified. The replication controller tracks pods with that label and makes sure there are always exactly three of them:
  template:
    metadata:
      labels:
        app: Cassandra
The pod template's spec describes the list of containers. In this case, there is just one container, named cassandra. It uses the same Cassandra Docker image and runs the run.sh script:
    spec:
      containers:
      - command:
        - /run.sh
        image: gcr.io/google-samples/cassandra:v11
        name: cassandra
The resources section sets a limit of 0.5 CPU units (equivalent to 500 millicores, half a core) in this example:
        resources:
          limits:
            cpu: 0.5
The environment section is a little different. The CASSANDRA_SEED_PROVIDER variable specifies the custom Kubernetes seed provider class we examined earlier. Another new addition here is POD_NAMESPACE, which uses the Downward API again to fetch the value from the metadata:
        env:
        - name: MAX_HEAP_SIZE
          value: 512M
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_SEED_PROVIDER
          value: "io.k8s.cassandra.KubernetesSeedProvider"
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
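As an aside, metadata fields such as the namespace can also be exposed as files rather than environment variables, via a downwardAPI volume. A minimal sketch (the volume name, mount path, and file name here are illustrative, not part of the Cassandra example):

```yaml
        volumeMounts:
        - mountPath: /etc/podinfo
          name: podinfo
      volumes:
      - name: podinfo
        downwardAPI:
          items:
          - path: namespace        # readable at /etc/podinfo/namespace
            fieldRef:
              fieldPath: metadata.namespace
```

Environment variables are the better fit here, because the seed provider reads its configuration from the environment at startup.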
The ports section is identical, exposing the intra-node communication ports (7000 and 7001), the JMX port 7199 used by external tools such as Cassandra OpsCenter to communicate with the Cassandra cluster, and of course the CQL port 9042, through which clients communicate with the cluster:
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
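Clients typically don't address pods directly; they reach the CQL port through a service that selects the Cassandra pods by label. A minimal sketch of such a service (the service name and this particular manifest are assumptions, not part of the replication controller itself):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  selector:
    app: Cassandra
  ports:
  - port: 9042
    name: cql
```

Because the selector matches the app: Cassandra label, the service automatically routes to whichever three pods the replication controller is currently maintaining.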
Once again, the volume is mounted into /cassandra_data. This matters because the Cassandra image is configured to expect its data directory at that particular path. Cassandra doesn't care about the backing storage (although you, as the cluster administrator, should). Cassandra will just read and write using filesystem calls:
        volumeMounts:
        - mountPath: /cassandra_data
          name: data
The volumes section is the biggest difference from the stateful set solution. A stateful set uses persistent storage claims to connect a particular pod with a stable identity to a particular persistent volume. The replication controller solution just uses an emptyDir on the hosting node:
      volumes:
      - name: data
        emptyDir: {}
This has many ramifications. You have to provision enough storage on each node. If a Cassandra pod dies, its storage goes away. Even if the pod is restarted on the same physical (or virtual) machine, the data on disk will be lost because emptyDir is deleted once its pod is removed. Note that container restarts are OK because emptyDir survives container crashes. So, what happens when the pod dies? The replication controller will start a new pod with empty data. Cassandra will detect that a new node was added to the cluster, assign it some portion of the data, and start rebalancing automatically by moving data from other nodes. This is where Cassandra shines. It constantly compacts, rebalances, and distributes the data evenly across the cluster. It will just figure out what to do on your behalf.
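For comparison, the stateful set solution avoids this data loss by binding each pod to durable storage through a volume claim template instead of an emptyDir. A minimal sketch of what that looks like in a stateful set spec (the storage size and access mode here are assumptions for illustration):

```yaml
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

With a claim template, a restarted pod reattaches to its original persistent volume and keeps its data, at the cost of having to provision persistent volumes, whereas the replication controller approach leans entirely on Cassandra's own replication and rebalancing.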