Get Correct Module Name via Trie-Style Structure¶
Go language supports to put modules under a VCS, such as github. Usually, one git repo contains one go module and multiple packages under this module. However, this is not always correct because Go allows you to put multiple modules under the same project, and Go helps to find these modules via its path. A typical example is repository go.opentelemetry.io/otel
, which contains huge amount of modules.
In my real senario, given a list of used modules parsing from go.mod
, and a package name retrieving from import declaration, I want to know which module does this package belong to. strings.Contain
isn't help at all because module paths have overlapped a lot.
Use Cases¶
Simple Case¶
Given a module google.golang.org/grpc
, you should return this module if a package google.golang.org/grpc/stats
or google.golang.org/grpc/stats/opentelemetry
is provided.
Complex Case¶
Besides the simple case, sometimes the modules are embedded inside the same repo. For example, given the modules inside go.opentelemetry.io/otel
repository:
// module list
go.opentelemetry.io/otel // the module of package go.opentelemetry.io/otel/baggage
go.opentelemetry.io/otel/sdk // the module of package go.opentelemetry.io/otel/sdk/trace
// the module of package go.opentelemetry.io/otel/sdk/metric and metricdata
go.opentelemetry.io/otel/sdk/metric
We need a strategy to figure out the corresponding module for a given package. For example, given the listed packages below, how do we differentiate it?
# the given package to find its module
go.opentelemetry.io/otel/baggage
go.opentelemetry.io/otel/sdk/trace
go.opentelemetry.io/otel/sdk/metric
go.opentelemetry.io/otel/sdk/metric/metricdata
Longest Prefix Problem¶
This scenario indeed a longest prefix problem, but we need to treat the whole word seperated by slash as a single unit which is unsplittable.
Using all given modules to construct a trie-style structure, then we can easily find the corresponding module of a package.
type Map struct {
dummy word
}
// word is a unit in module path separated by slash.
type word struct {
content string
next map[string]*word
end bool
}
func (pm *Map) Insert(path string) {
words := strings.Split(path, "/")
sentry := &pm.dummy
for _, w := range words {
next, ok := sentry.next[w]
if !ok {
nw := &word{
content: w,
next: make(map[string]*word),
}
sentry.next[w] = nw
sentry = nw
} else {
sentry = next
}
}
sentry.end = true
}
func (pm *Map) BelongingModule(pkgPath string) string {
var longestModule string
words := strings.Split(pkgPath, "/")
var number int
sentry := &pm.dummy
for _, w := range words {
next, ok := sentry.next[w]
if !ok {
break // we cannot find the next word anymore
}
number++
sentry = next
if sentry.end {
longestModule = strings.Join(words[:number], "/")
}
}
return longestModule
}