Getting started with Redpanda

Use rpk to bootstrap the config, start nodes and manage topics.

Intro

In this post I’m going to introduce rpk, a single tool for managing your entire Redpanda cluster. It handles everything from low-level tuning and node configuration to Kafka®-level management tasks like topic creation.

Prerequisites

For the purpose of this guide, we’ll assume a fresh single node. If you have a node or a cluster running already, you can still follow this guide, but you might get different outputs from some of the commands.

If you run into any issues, please let us know in our community Slack workspace!

Installing Redpanda

Enter your email on the signup form at https://vectorized.io/ and run the generated command. It should look similar to this:

$ curl -s https://<url to setup script> | sudo bash && sudo yum install redpanda -y

You can check that it was installed correctly by running

$ rpk version

Bootstrapping the configuration

Redpanda comes with a default configuration for a single-node “cluster” that works out of the box. However, there are times when you need to set certain fields of the node’s configuration, like the node’s IP address and its ID. Fortunately, there’s an rpk command for that!

$ sudo rpk config bootstrap --id 0

Note: The DEB and RPM packages install the config file in /etc/redpanda/redpanda.yaml by default, which is why these commands need to be run as root.

rpk config bootstrap will set the node ID to the given ID and try to discover your machine’s private IPv4 address to set the config accordingly.

If you need to set other fields, you can do so with rpk config set. Also make sure to check out our advanced config reference for a complete list of configuration fields.

Starting Redpanda

The Redpanda packages come with custom systemd units that ensure isolation and resilience and provide a simple way to run Redpanda. If you are familiar with the standard systemd tools, you’ll feel right at home. Let’s walk through it anyway.

$ sudo systemctl start redpanda

You can verify that it started by running

$ systemctl status redpanda

If everything went well, the output you see should be something like this:

● redpanda.service - Redpanda, the fastest queue in the West.
   Loaded: loaded (/usr/lib/systemd/system/redpanda.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-10-12 08:56:39 -05; 2min 8s ago
 Main PID: 21188 (redpanda)
   Status: "redpanda is ready! - release-0.99.7-178-g57e2946c - 57e2946c53f4d89e8446191b98aae1e969727090-dirty"
    Tasks: 32 (limit: 38107)
   Memory: 505.9M
   CGroup: /redpanda.slice/redpanda.service
           └─21188 /opt/redpanda/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --lock-memory false --io-properties-file /etc/redpanda/io-config.yaml

Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,414 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:565 - Recovered, log offsets: {start_offset:>
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,425 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/0_0/0-0-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,439 [shard 0] cluster - members_manager.cc:46 - starting cluster::members_manager...
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,450 [shard 0] raft - [group_id:0, {redpanda/controller/0}] vote_stm.cc:238 - became the leader term:{1}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,450 [shard 0] storage - segment.cc:522 - Creating new segment /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,450 [shard 0] cluster - state_machine.cc:18 - Starting state machine
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:448 - Started RPC server listening at {host: 0.0.0.0, port: 33145}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:487 - Started Kafka API server listening at {host: 0.0.0.0, port: 9092}
Oct 12 08:56:39 localhost.localdomain rpk[21188]: INFO  2020-10-12 08:56:39,451 [shard 0] redpanda::main - application.cc:489 - Successfully started Redpanda!
Oct 12 08:56:39 localhost.localdomain systemd[1]: Started Redpanda, the fastest queue in the West..

Using systemd is especially useful when running Redpanda in production, since it’s a central place to enforce restart policies and resource isolation, and to run periodic jobs.

rpk also provides a status command which gives you much more information about the current node, such as resource usage and the current configuration (your output might differ):

$ rpk status
  Version                                 release-0.99.13 (rev 76801fe1)                                                    
  Cloud Provider                          aws                                                                               
  Machine Type                            i3.large                                                                          
  OS                                      x86_64 5.4.0-1025-aws #25-Ubuntu SMP Fri Sep 11 09:37:24 UTC 2020 Ubuntu 20.04.1  
                                          LTS                                                                               
  CPU Model                               Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz                                         
  CPU Usage %                             23.993                                                                            
  Free Memory (MB)                        13271.105                                                                         
  Free Space  (MB)                        449520.945                                                                        
  cluster_id                              test                                                                              
  config_file                             /etc/redpanda/redpanda.yaml                                                       
  license_key                             eyJvIjoidmVjdG9yaXplZC5pbyIsInkiOjIwMjAsIm0iOjEwLCJkIjoxMiwiYyI6MzYyMjkwOTA3NH0=  
  node_uuid                               RKdMw6hLru7NxbMHkoFcUkvY42No26xKFDE92bx4so2h2DJtQ                                 
  organization                            vectorized.io                                                                     
  redpanda.admin                          172.31.20.28:9644                                                                 
  redpanda.auto_create_topics_enabled     true                                                                              
  redpanda.data_directory                 /var/lib/redpanda/data                                                            
  redpanda.kafka_api                      172.31.20.28:9092                                                                 
  redpanda.kafka_api_tls.cert_file                                                                                          
  redpanda.kafka_api_tls.enabled          false                                                                             
  redpanda.kafka_api_tls.key_file                                                                                           
  redpanda.kafka_api_tls.truststore_file                                                                                    
  redpanda.node_id                        0                                                                                 
  redpanda.rpc_server                     172.31.20.28:33145                                                                
  rpk.coredump_dir                        /var/lib/redpanda/coredump                                                        
  rpk.enable_memory_locking               false                                                                             
  rpk.enable_usage_stats                  true                                                                              
  rpk.tls.cert_file                                                                                                         
  rpk.tls.key_file                                                                                                          
  rpk.tls.truststore_file                                                                                                   
  rpk.tune_aio_events                     true                                                                              
  rpk.tune_clocksource                    true                                                                              
  rpk.tune_coredump                       false                                                                             
  rpk.tune_cpu                            true                                                                              
  rpk.tune_disk_irq                       true                                                                              
  rpk.tune_disk_nomerges                  true                                                                              
  rpk.tune_disk_scheduler                 true                                                                              
  rpk.tune_fstrim                         true                                                                              
  rpk.tune_network                        true                                                                              
  rpk.tune_swappiness                     true                                                                              
  rpk.tune_transparent_hugepages          false                                                                             
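
Since the status output is a simple two-column table, a script can pull a single value out of it with awk. The helper below is only a sketch: status_field is a hypothetical name (it is not part of rpk), and it assumes the layout shown above, so it only works for single-word keys such as redpanda.kafka_api.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rpk): print the value of a single-word
# key from `rpk status`-style output read on stdin.
# Assumes the "key   value" layout shown in the sample output above.
status_field() {
  awk -v key="$1" '$1 == key { print $2 }'
}
```

With a running node, this could be used as rpk status | status_field redpanda.kafka_api. Keys that contain spaces, like CPU Model, would need a different approach.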

Note: If you’ve read our docs guide, you might be wondering why we didn’t run rpk tune. The answer is that rpk tune makes changes to your machine’s configuration to ensure the best performance possible, and some of them are persistent. Since they might affect your experience on desktop apps, it’s better not to tune your development machine.

Introducing rpk api

At Vectorized, our goal is for rpk, as part of Redpanda, to have everything you need to do your job. No external tools, no long setup or configuration steps. If there’s a common use case, rpk should support it.

That’s why we added the rpk api command namespace, which allows you to interact with Redpanda’s Kafka®-compatible API without installing anything else, or using task-specific shell scripts.

All of the rpk api subcommands default to using the IP configured with rpk config bootstrap, or localhost if a configuration file isn’t found. If you need to override that (e.g. to interact with other remote brokers), you can pass a list of broker addresses (IP:port pairs) through the --brokers flag.

Managing topics

Creating a topic is as simple as running

$ rpk api topic create cute-pandas

Created topic 'cute-pandas'. Partitions: 1, replicas: 1, cleanup policy: 'delete'

Because we only have one node, the default partition count and replication factor are fine, but for a production-ready configuration you’ll probably want to tune them to your needs with the -p and -r flags.

Let’s check our new topic’s configuration:

$ rpk api topic describe cute-pandas
  Name                cute-pandas  
  Internal            false        
  Config:             
  Name                Value        Read-only  Sensitive  
  partition_count     1            false      false      
  replication_factor  1            false      false      
  Partitions          1 - 1 out of 1  
  Partition           Leader          Replicas   In-Sync Replicas  High Watermark  
  0                   0               [0]        [0]               1

Here we get high-level information about the given topic. In addition to telling us the number of partitions and replicas - which we already knew - there’s useful data like which node is the leader for each partition and which brokers hold its replicas. The High Watermark column shows the latest offset that has been replicated to all in-sync replicas for that partition.

You can also use rpk for producing and consuming from topics. Let’s try that.

$ echo '{"name": "Red", "website": "vectorized.io"}' | rpk api produce cute-pandas --key record-key -H header-key:header-value
Reading message... Press CTRL + D to send, CTRL + C to cancel.
Sent record to partition 0 at offset 1 with timestamp 2020-10-07 18:29:08.481278811 +0000 UTC m=+0.248046468.

rpk api produce reads from standard input, so it works well for scripting or playing around with it on the command line.
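
As a rough sketch of what such scripting could look like, the snippet below builds one JSON payload per name and pipes each one to a separate rpk api produce call. The function names make_record and produce_all are hypothetical, and the loop assumes each invocation sends everything it reads from standard input as a single record, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a JSON payload like the one used above.
# Note: the arguments are not JSON-escaped, so keep them free of quotes.
make_record() {
  printf '{"name": "%s", "website": "%s"}\n' "$1" "$2"
}

# Hypothetical helper: send one record per name to the given topic,
# using the name as the record key. Requires rpk and a reachable broker.
produce_all() {
  local topic="$1"
  shift
  local name
  for name in "$@"; do
    make_record "$name" "vectorized.io" | rpk api produce "$topic" --key "$name"
  done
}
```

For example, produce_all cute-pandas Red Po would send two records, one keyed Red and one keyed Po.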

Let’s see where that message went by consuming from our topic:

$ rpk api consume cute-pandas
{
 "headers": [
  {
   "key": "header-key",
   "value": "header-value"
  }
 ],
 "key": "record-key",
 "message": "{\"name\": \"Red\", \"website\": \"vectorized.io\"}\n",
 "partition": 0,
 "offset": 1,
 "timestamp": "2020-10-07T18:29:08.481Z"
}

Which is just what we had produced! 🎉

rpk api consume starts from the beginning of the topic and then blocks, waiting for new records. You can exit by pressing CTRL+C.

If you’re planning on piping the incoming records to other commands (e.g. jq, awk), you can turn pretty-printing off by passing --pretty-print false.
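
Here is one way that piping could look with jq, as a sketch rather than a recommended setup. The function name summarize_records is hypothetical, and the filter assumes every record has the shape shown above and carries a JSON payload with a name field, like the one we produced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read consumed records (JSON objects) from stdin and
# print "<partition>/<offset> <name>" for each one.
# Assumes each record's "message" is itself a JSON document with a "name"
# field; records with non-JSON payloads would make jq report an error.
summarize_records() {
  jq -r '"\(.partition)/\(.offset) \(.message | fromjson | .name)"'
}
```

It could be used as rpk api consume cute-pandas --pretty-print false | summarize_records. jq reads a stream of JSON values, so the same filter also works on the pretty-printed output.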

When inspecting live traffic, you may also want to set --offset newest to avoid consuming from the topic’s first record. --partitions is also a useful flag to further filter the consumed records.

cute-pandas is a toy topic, with only one partition and a replication factor of 1. For “bigger” topics with thousands of partitions, replicated across multiple nodes, it’s useful to have a way to check their health. That’s what rpk api topic status is for!

$ rpk api topic status cute-pandas
  Name                         cute-pandas  
  Internal                     false        
  Partitions                   1            
  Under-replicated partitions  None         
  Unavailable partitions       None

If there were any issues with our topic’s partitions, we would see them here:

  • Under-replicated partitions: Partitions whose number of in-sync replicas has fallen below their replication factor.
  • Unavailable partitions: Partitions for which a majority of replicas are unavailable.
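
For monitoring scripts, the two fields above can be turned into an exit code. The following is a minimal sketch: topic_is_healthy is a hypothetical name, and it assumes that a healthy topic reports None for both fields, as in the sample output, and that anything else indicates a problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rpk api topic status` output on stdin and
# succeed only if both partition fields are present and report "None".
# Assumption: any value other than "None" means affected partitions.
topic_is_healthy() {
  awk '
    /^ *Under-replicated partitions/ { seen++; if ($NF != "None") bad = 1 }
    /^ *Unavailable partitions/      { seen++; if ($NF != "None") bad = 1 }
    END { if (bad || seen != 2) exit 1; exit 0 }
  '
}
```

A check could then read rpk api topic status cute-pandas | topic_is_healthy || echo "cute-pandas needs attention".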

Lastly, we can also delete topics:

$ rpk api topic delete cute-pandas
Deleted topic 'cute-pandas'.

We can list all topics to verify that the topic was, in fact, deleted:

$ rpk api topic list
No topics found.

Outro

I hope this has been useful! If you have used rpk and have any ideas on how to improve it or its UX, please reach out! We’re always excited to hear feedback and suggestions, and strive to incorporate them quickly into Redpanda and rpk.

As mentioned before, you can optionally pass --brokers to all of the rpk api subcommands, which lets you interact with the brokers of a remote cluster. Make sure to also check my previous blog post on configuring TLS in Redpanda and rpk, which could be useful when interacting remotely with a cluster.

Acknowledgments

Thanks to the sarama project contributors. It was a great reference point for rpk api.