User Process Management CLI¶
It is common to have a CLI tool that manages user processes, with commands such as start, stop, and restart, much like systemctl. This blog records some experience building such a CLI tool in Go. It also lists some scenarios that must be handled for the tool to work well as a PID 1 process.
Here is a summary of the topics in this blog:
- signal handling, wait system calls and process management
- process reaping as the init process
- start, restart and exit handling
Start: Functions to Trigger Shell¶
When a command is given to the CLI tool, how should we run it from Go code? There are several ways, ranging from high-level to very primitive.
These APIs wrap one another layer by layer, and exec stays compatible with the low-level APIs beneath it.
Hence, we can always use exec, as its documentation states:
Package exec runs external commands. It wraps os.StartProcess to make it easier to remap stdin and stdout, connect I/O with pipes, and do other adjustments.
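As a minimal sketch of the high-level route, the following starts a command with exec and waits for it. The helper name startCommand and the echo placeholder command are assumptions made for illustration; they are not part of the original tool.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// startCommand starts the user's command and returns without waiting for it.
// The child writes to the same stdout and stderr as the CLI tool.
func startCommand(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

func main() {
	// "echo" is only a placeholder for the user's command.
	cmd, err := startCommand("echo", "hello from the child")
	if err != nil {
		fmt.Fprintln(os.Stderr, "start failed:", err)
		os.Exit(1)
	}
	fmt.Println("started child with PID", cmd.Process.Pid)

	// Wait collects the exit status, so the child does not linger as a zombie.
	if err := cmd.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
	}
}
```

Splitting Start from Wait, instead of calling Run, leaves the tool free to handle signals while the command is running.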
Reaping: Init Process Needs to Clean Up Zombie Processes¶
When a process has terminated but its parent hasn't waited for it, the process becomes a zombie process. Moreover, once its parent process dies, it becomes an orphan and is adopted by the init process. As the Linux man page states:
A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(1), (or by the nearest "subreaper" process as defined through the use of the prctl(2) PR_SET_CHILD_SUBREAPER operation); init(1) automatically performs a wait to remove the zombies.
On a regular system this is not a problem, because the system's init process cleans up zombie processes. For example, on macOS, /sbin/launchd reaps them. Inside a container, however, the given command usually runs as the init process itself. The implication is that zombies are never cleaned up unless that process provides its own reaping logic.
Reaping Processes¶
Consider a case in which the CLI tool runs a command to start a long-running server. The server forks and execs several ephemeral processes. Once the long-running server command returns (its process ends), all the processes it spawned become orphans and are adopted by the init process (in this scenario, the CLI tool process).
There are two possible ways to reap: periodically, or by calling wait when a child process goes down. In this topic, I focus on the latter method.
The first thing to know is that when a child process stops or terminates, SIGCHLD is sent to the parent process. The second is the wait system call, which waits for state changes in a child of the calling process.
In the case of a terminated child, performing a wait allows the system to release the resources associated with the child; if a wait is not performed, then the terminated child remains in a "zombie" state.
For reaping purposes, we only need to call wait in the parent process.
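The reaping approach described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name reapChildren and the placeholder command true are assumptions. It loops because several children may exit before a single SIGCHLD is delivered.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// reapChildren collects every child that has already exited, without
// blocking, and returns the PIDs it reaped.
func reapChildren() []int {
	var reaped []int
	for {
		var status syscall.WaitStatus
		// -1 means "any child"; WNOHANG returns at once if none has exited.
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, try again
		}
		if err != nil || pid <= 0 {
			// ECHILD: no children at all; pid 0: children exist but none exited.
			return reaped
		}
		reaped = append(reaped, pid)
	}
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)

	// "true" is a placeholder for any short-lived child process.
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "start failed:", err)
		os.Exit(1)
	}

	<-sigs // a child changed state
	fmt.Println("reaped:", reapChildren())
}
```

One caveat of this sketch: waiting on "any child" may also collect a process that other code intends to wait for with cmd.Wait, so a real tool has to decide which part of the program owns each child's exit status.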
Restarting a Service in a Container¶
Kubernetes (k8s) keeps the desired number of instances running on the platform. Once a service goes down, which means its container exits as well, the platform starts a new container to get back to the expected number of running instances.
The restart here means restarting a service without restarting its container. In this topic, I will introduce how this is achieved.
Because the restart is an event triggered from outside, or more precisely from another process, we need some mechanism for communication between processes. What's more, during a restart, not only the process of the user's command but also the processes it spawned should be killed.
Because the CLI tool is running as the init process with PID 1, we can send signals to it directly. The signal we choose to send is SIGHUP.
The term "hang up" (HUP) comes from the early days of computing when it was used to notify processes associated with a terminal that the user had "hung up" or disconnected.
Over time, this signal evolved to serve additional purposes, and one common convention emerged: using SIGHUP to instruct a process to re-read its configuration files.
In the init process code, the SIGHUP signal should be caught so that it triggers the restart handler. The handler should kill all descendants of the user's process, which takes some work because we first have to find them. Here is a simple demo that lists the process tree:
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"strings"

	"github.com/mitchellh/go-ps"
)

func main() {
	printProcessTree()
}

// printProcessTree prints the whole process tree, starting from PID 1.
func printProcessTree() {
	p, err := New()
	if err != nil {
		log.Fatalf("list processes: %v", err)
	}
	var w bytes.Buffer
	printLayer(p, []int{1}, &w, 0)
	w.WriteTo(os.Stdout)
}

// Processes is a snapshot of the process table.
type Processes struct {
	descents   map[int][]int  // ppid -> []pid
	executable map[int]string // pid -> executable string
}

// New takes a snapshot of the running processes and indexes it by parent PID.
func New() (*Processes, error) {
	pros, err := ps.Processes()
	if err != nil {
		return nil, err
	}
	m := make(map[int][]int)         // ppid -> []pid
	processM := make(map[int]string) // pid -> string
	for _, p := range pros {
		processM[p.Pid()] = p.Executable()
		// append on a missing key starts from a nil slice, so no lookup is needed
		m[p.PPid()] = append(m[p.PPid()], p.Pid())
	}
	return &Processes{descents: m, executable: processM}, nil
}

// printLayer writes each PID in entry, then recurses into its children
// one indentation level deeper.
func printLayer(p *Processes, entry []int, w io.Writer, ident int) {
	for _, e := range entry {
		fmt.Fprintf(w, "%s%d: %s\n", printIndent(ident), e, p.executable[e])
		printLayer(p, p.descents[e], w, ident+1)
	}
}

// printIndent returns one tab per nesting level followed by the branch marker.
func printIndent(number int) string {
	return strings.Repeat("\t", number) + "|_______"
}
The output looks like:
|_______7988: chrome_crashpad_
|_______7986: Electron
|_______64602: Code Helper (Plu
|_______16545: Code Helper (Plu
|_______64711: Code Helper (Plu
|_______64605: rust-analyzer
|_______64964: rust-analyzer-pr
|_______64962: rust-analyzer-pr
|_______64603: Code Helper (Plu
|_______64601: Code Helper
|_______64582: Code Helper (Ren
|_______16695: Code Helper
|_______17569: zsh
|_______47274: zsh
|_______8036: Code Helper
|_______7990: Code Helper
|_______7989: Code Helper (GPU
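Building on a ppid -> []pid map like the one in the demo above, the kill step of the restart handler could be sketched as below. The names descendants and killAll and the PIDs in main are hypothetical. Deepest processes are listed first so that children are signalled before their parents.

```go
package main

import (
	"fmt"
	"syscall"
)

// descendants returns every PID below root in the ppid -> []pid map,
// deepest processes first.
func descendants(children map[int][]int, root int) []int {
	var out []int
	for _, pid := range children[root] {
		out = append(out, descendants(children, pid)...)
		out = append(out, pid)
	}
	return out
}

// killAll sends sig to each PID and ignores processes that are already gone.
func killAll(pids []int, sig syscall.Signal) {
	for _, pid := range pids {
		if err := syscall.Kill(pid, sig); err != nil && err != syscall.ESRCH {
			fmt.Printf("kill %d: %v\n", pid, err)
		}
	}
}

func main() {
	// Hypothetical tree: 10 is the user's command, 11 and 12 are its
	// children, and 13 is a child of 11.
	tree := map[int][]int{10: {11, 12}, 11: {13}}
	fmt.Println(descendants(tree, 10)) // prints [13 11 12]

	// In the real handler this would be followed by something like:
	//   killAll(descendants(tree, userPID), syscall.SIGTERM)
	// where userPID is the PID of the user's command (an assumed variable).
}
```

The snapshot of the process table can be stale by the time the signals are sent, which is why killAll tolerates ESRCH (no such process).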
Exit¶
The CLI needs to handle termination signals as well. The SIGKILL sent by kill -9 cannot be caught, so we can only handle signals such as SIGTERM, SIGINT and SIGQUIT. On exit, we should kill the process created by the user's command together with all of its child processes, which is the same procedure as restarting.
Conclusion¶
This blog shares some experience in writing a CLI tool that runs as the init process (PID 1). It covers reaping, starting and restarting the user's command, and exiting.