
Getting started with Redpanda

Use rpk to bootstrap the config, start nodes and manage topics.

By David Castillo on October 07, 2020


Intro

In this post I’ll introduce rpk, a single tool for managing your entire Redpanda cluster. It handles everything from low-level tuning and node configuration to Kafka®-level management tasks like topic creation.

Prerequisites

For the purpose of this guide, we’ll assume a fresh single node. If you have a node or a cluster running already, you can still follow this guide, but you might get different outputs from some of the commands.

If you run into any issues, please let us know in our community Slack workspace!

Installing Redpanda

Enter your email in the signup form at https://vectorized.io/ and run the generated command. It should look similar to this:

$ curl -s https://<url to setup script> | sudo bash && sudo yum install redpanda -y

You can check that it was installed correctly by running

$ rpk version

Bootstrapping the configuration

Redpanda comes with a default configuration for a single-node “cluster” that works out of the box. However, there are times when you need to set certain fields of the node’s configuration, like the node’s IP address and its ID. Fortunately, there’s an rpk command for that!

$ sudo rpk redpanda config bootstrap --id 0

Note: The DEB and RPM packages install the config file in /etc/redpanda/redpanda.yaml by default, which is why these commands need to be run as root.

rpk redpanda config bootstrap will set the node ID to the given ID and try to discover your machine’s private IPv4 address to set the config accordingly.

If you need to set other fields, you can do so with rpk redpanda config set. Also make sure to check out our advanced config reference for a complete list of configuration fields.

Starting Redpanda

The Redpanda packages come with custom systemd units that ensure isolation and resilience and provide a simple way to run Redpanda. If you are familiar with the standard systemd tools, you’ll feel right at home. Let’s walk through it anyway.

$ sudo systemctl start redpanda

You can verify that it started by running

$ systemctl status redpanda

If everything went well, the output you see should be something like this:

● redpanda.service - Redpanda, the fastest queue in the West.
Loaded: loaded (/usr/lib/systemd/system/redpanda.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-10-12 08:56:39 -05; 2min 8s ago
Main PID: 21188 (redpanda)
Status: "redpanda is ready! - release-0.99.7-178-g57e2946c - 57e2946c53f4d89e8446191b98aae1e969727090-dirty"
Tasks: 32 (limit: 38107)
Memory: 505.9M
CGroup: /redpanda.slice/redpanda.service
└─21188 /opt/redpanda/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --lock-memory false --io-properties-file /etc/redpanda/io-config.yaml
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,414 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:565 - Recovered, log offsets: {start_offset:>
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,425 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/0_0/0-0-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,439 [shard 0] cluster - members_manager.cc:46 - starting cluster::members_manager...
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] raft - [group_id:0, {redpanda/controller/0}] vote_stm.cc:238 - became the leader term:{1}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,450 [shard 0] cluster - state_machine.cc:18 - Starting state machine
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:448 - Started RPC server listening at {host: 0.0.0.0, port: 33145}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:487 - Started Kafka API server listening at {host: 0.0.0.0, port: 9092}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO 2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:489 - Successfully started Redpanda!
Oct 12 08:56:39 localhost.localdomain systemd[1]: Started Redpanda, the fastest queue in the West..

Using systemd is especially useful when running Redpanda in production, since it’s a central place to enforce restart policies and resource isolation, and to run periodic jobs.
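If you start Redpanda from a script, you may want to wait for the unit to come up before running anything else. Here is a minimal sketch that polls systemctl is-active; the function name wait_for_redpanda and the 30-second default are just illustrative choices, and "active" here only means what systemd reports for the unit.

```shell
# Poll systemd until the redpanda unit reports "active", or give up.
# Usage: wait_for_redpanda [timeout_seconds]   (default: 30, an arbitrary choice)
wait_for_redpanda() (
  timeout="${1:-30}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # is-active exits 0 only when the unit is in the "active" state
    if systemctl is-active --quiet redpanda; then
      echo "redpanda is active after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "redpanda did not become active within ${timeout}s" >&2
  return 1
)
```

For example, sudo systemctl start redpanda && wait_for_redpanda 60 would block for up to a minute and return a non-zero status if the unit never becomes active.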

rpk also provides a debug info command which gives you much more information about the current node, such as resource usage and the current configuration (your output might differ):

$ rpk debug info
Version v21.1.2 (rev 500bf8fa)
Cloud Provider aws
Machine Type i3.large
OS x86_64 5.4.0-1025-aws #25-Ubuntu SMP Fri Sep 11 09:37:24 UTC 2020 Ubuntu 20.04.1 LTS
CPU Model Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
CPU Usage % 23.993
Free Memory (MB) 13271.105
Free Space (MB) 449520.945
cluster_id test
config_file /etc/redpanda/redpanda.yaml
license_key eyJvIjoidmVjdG9yaXplZC5pbyIsInkiOjIwMjAsIm0iOjEwLCJkIjoxMiwiYyI6MzYyMjkwOTA3NH0=
node_uuid RKdMw6hLru7NxbMHkoFcUkvY42No26xKFDE92bx4so2h2DJtQ
organization vectorized.io
redpanda.admin 172.31.20.28:9644
redpanda.auto_create_topics_enabled true
redpanda.data_directory /var/lib/redpanda/data
redpanda.kafka_api 172.31.20.28:9092
redpanda.kafka_api_tls.cert_file
redpanda.kafka_api_tls.enabled false
redpanda.kafka_api_tls.key_file
redpanda.kafka_api_tls.truststore_file
redpanda.node_id 0
redpanda.rpc_server 172.31.20.28:33145
rpk.coredump_dir /var/lib/redpanda/coredump
rpk.enable_memory_locking false
rpk.enable_usage_stats true
rpk.tls.cert_file
rpk.tls.key_file
rpk.tls.truststore_file
rpk.tune_aio_events true
rpk.tune_clocksource true
rpk.tune_coredump false
rpk.tune_cpu true
rpk.tune_disk_irq true
rpk.tune_disk_nomerges true
rpk.tune_disk_scheduler true
rpk.tune_fstrim true
rpk.tune_network true
rpk.tune_swappiness true
rpk.tune_transparent_hugepages false

Note: If you’ve read our docs guide, you might be wondering why we didn’t run rpk redpanda tune. The answer is that rpk redpanda tune makes changes to your machine’s configuration to ensure the best performance possible, and some of them are persistent. Since they might affect your experience on desktop apps, it’s better not to tune your development machine.

Introducing rpk topic

At Vectorized, our goal is for rpk to have everything you need to do your job with Redpanda: no external tools, no long setup or configuration steps. If there’s a common use case, rpk should support it.

That’s why we added the rpk topic command namespace, which allows you to interact with Redpanda’s Kafka®-compatible API without installing anything else, or using task-specific shell scripts.

All of the rpk topic subcommands default to using the IP configured with rpk redpanda config bootstrap, or localhost:9092 if a configuration file isn’t found. If you need to override that (e.g. to interact with other remote brokers), you can pass a list of broker addresses (<ip>:<port> pairs) through the --brokers flag.
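When you talk to several remote brokers, a small helper can build the address list for you. This is a sketch under one assumption: that --brokers accepts the <ip>:<port> pairs as a single comma-separated value. The function name broker_list and the addresses in the usage note are made up for illustration.

```shell
# Build a comma-separated <host>:<port> list from a port and host names.
# Usage: broker_list PORT HOST [HOST...]
# Assumption: --brokers takes the pairs as one comma-separated value.
broker_list() (
  port="$1"
  shift
  list=""
  for host in "$@"; do
    # Prefix a comma separator for every entry except the first
    list="${list:+$list,}${host}:${port}"
  done
  printf '%s\n' "$list"
)
```

You could then run something like rpk topic list --brokers "$(broker_list 9092 10.0.0.1 10.0.0.2)", where the two IPs are placeholders for your own brokers.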

Managing topics

Creating a topic is as simple as running

$ rpk topic create cute-pandas --replicas 1
Created topic 'cute-pandas'. Partitions: 1, replicas: 1, configuration:
'cleanup.policy':'delete'

Because we only have one node, the default number of partitions and a replication factor of 1 are fine, but for a production-ready configuration you’ll probably want to tune them to your needs with the -p and -r flags.
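For scripted setups, it can help to wrap topic creation so that both values are always passed explicitly and checked first. The wrapper below is only a sketch: create_topic is a hypothetical name, and the only rpk flags it relies on are the -p and -r flags mentioned above.

```shell
# Create a topic with an explicit partition count and replication factor.
# Usage: create_topic NAME PARTITIONS REPLICAS
create_topic() (
  name="$1"
  partitions="$2"
  replicas="$3"
  for n in "$partitions" "$replicas"; do
    case "$n" in
      # Reject empty values, anything with a non-digit, and a plain zero
      ''|*[!0-9]*|0)
        echo "expected a positive integer, got '$n'" >&2
        return 2
        ;;
    esac
  done
  rpk topic create "$name" -p "$partitions" -r "$replicas"
)
```

On a three-node cluster you might call create_topic cute-pandas 6 3; the right numbers depend entirely on your workload and cluster size.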

Let’s check our new topic’s configuration:

$ rpk topic describe cute-pandas
Name                cute-pandas
Internal            false
Config:
Name                Value  Read-only  Sensitive
partition_count     1      false      false
replication_factor  1      false      false
Partitions          1 - 1 out of 1
Partition  Leader  Replicas  In-Sync Replicas  High Watermark
0          0       [0]       [0]               1

Here we get high-level information about the given topic. In addition to the number of partitions and replicas, which we already knew, there’s useful data like which node is the leader for each partition and which other brokers hold its replicas. The High Watermark column shows the latest offset that has been replicated to all replicas for that partition.

You can also use rpk for producing and consuming from topics. Let’s try that.

$ echo '{"name": "Red", "website": "vectorized.io"}' | rpk topic produce cute-pandas --key record-key -H header-key:header-value
Reading message... Press CTRL + D to send, CTRL + C to cancel.
Sent record to partition 0 at offset 1 with timestamp 2020-10-07 18:29:08.481278811 +0000 UTC m=+0.248046468.

rpk topic produce reads from standard input, so it works well for scripting or playing around with it on the command line.
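As an example of the scripting use case, the sketch below sends each line of its input as a separate record. It assumes, as in the session above, that one rpk topic produce invocation reads standard input until end-of-file and sends it as a single record; produce_lines is a hypothetical helper name.

```shell
# Send each line of standard input as its own record.
# Usage: produce_lines TOPIC < file
produce_lines() (
  topic="$1"
  count=0
  # The "|| [ -n ... ]" part also picks up a final line without a newline
  while IFS= read -r line || [ -n "$line" ]; do
    # One rpk invocation per record: rpk reads its stdin until EOF, then sends
    printf '%s' "$line" | rpk topic produce "$topic" >/dev/null || return 1
    count=$((count + 1))
  done
  echo "sent $count records to $topic"
)
```

Running produce_lines cute-pandas < events.txt (with events.txt being any text file of yours) would produce one record per line. Starting a process per record is slow, so this suits small batches and experiments rather than bulk loads.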

Let’s see where that message went by consuming from our topic:

$ rpk topic consume cute-pandas
{
  "headers": [
    {
      "key": "header-key",
      "value": "header-value"
    }
  ],
  "key": "record-key",
  "message": "{\"name\": \"Red\", \"website\": \"vectorized.io\"}\n",
  "partition": 0,
  "offset": 1,
  "timestamp": "2020-10-07T18:29:08.481Z"
}

Which is just what we had produced! 🎉

rpk will consume from the “beginning”, and block waiting for new records. You can exit by pressing CTRL+C.

If you’re planning on piping the incoming records to other commands (e.g. jq, awk), you can turn off pretty-printing by passing --pretty-print false.
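A simple pipeline along those lines might filter the stream with grep. This sketch assumes that, with pretty-printing off, rpk prints each record on a single line; consume_matching is a hypothetical helper name, and --line-buffered requires GNU or BSD grep.

```shell
# Stream records from a topic, keeping only those whose text contains a
# given literal string.
# Usage: consume_matching TOPIC TEXT
# Assumption: with --pretty-print false, each record is printed on one line.
consume_matching() (
  topic="$1"
  needle="$2"
  # --line-buffered makes grep emit each match immediately instead of
  # waiting for its output buffer to fill, which matters for a live stream
  rpk topic consume "$topic" --pretty-print false |
    grep --line-buffered -F -- "$needle"
)
```

For instance, consume_matching cute-pandas record-key would print only the records that mention record-key, and keep running until you press CTRL+C.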

When inspecting live traffic, you may also want to set --offset newest to avoid consuming from the topic’s first record. --partitions is also a useful flag to further filter the consumed records.

cute-pandas is a toy topic, with only one partition and a replication factor of 1. For “bigger” topics with thousands of partitions, replicated across multiple nodes, it’s useful to have a way to check their health. That’s what rpk topic info is for!

$ rpk topic info cute-pandas
Name cute-pandas
Internal false
Partitions 1
Under-replicated partitions None
Unavailable partitions None

If there are any issues with our topic’s partitions, we would see them here:

  • Under-replicated partitions: Partitions whose number of in-sync replicas has fallen below their replication factor.
  • Unavailable partitions: Partitions for which the majority of their replicas are unavailable.
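If you want to use this check in a script or a monitoring job, you could turn the two fields above into an exit status. The sketch below relies only on the output format shown in this post (a label followed by None when everything is fine); topic_healthy is a hypothetical name, and the exact spacing of real output may differ, which is why the pattern tolerates any amount of whitespace.

```shell
# Exit 0 if `rpk topic info` reports no under-replicated or unavailable
# partitions for the topic, and 1 otherwise.
# Usage: topic_healthy TOPIC
topic_healthy() (
  info=$(rpk topic info "$1") || return 1
  for label in "Under-replicated partitions" "Unavailable partitions"; do
    # A healthy topic shows "None" next to each label
    if ! printf '%s\n' "$info" |
      grep -Eq "^[[:space:]]*${label}[[:space:]]+None[[:space:]]*$"; then
      echo "$1: $label reported" >&2
      return 1
    fi
  done
)
```

A call such as topic_healthy cute-pandas || echo "needs attention" would then flag a topic as soon as either field stops reading None.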

Lastly, we can also delete topics:

$ rpk topic delete cute-pandas
Deleted topic 'cute-pandas'.

We can list all topics to verify that the topic was, in fact, deleted:

$ rpk topic list
No topics found.

Outro

I hope this has been useful! If you have used rpk and have any ideas on how to improve it or its UX, please reach out! We’re always excited to hear feedback and suggestions, and strive to incorporate them quickly into Redpanda and rpk.

As mentioned before, you can optionally pass --brokers to all of the rpk topic subcommands, which allows you to interact with the brokers of a remote cluster. Make sure to also check my previous blog post on configuring TLS in Redpanda and rpk, which could be useful when interacting remotely with a cluster.

Acknowledgments

Thanks to the sarama project contributors. It was a great reference point for rpk topic.


2021-01-25: Edited all the commands to reflect the latest namespacing changes.
