Documentation ¶
Overview ¶
Copyright 2024 Francis Michael Stephens. All rights reserved. Use of this source code is governed by an MIT license that can be found in the LICENSE file.
The intern package allows users to intern string values.
Importantly, the interners built using this package can take types which are _not_ strings but which can be used to generate strings. The advantage is that when the string representation of an object has already been interned, we can skip generating the string and just return the interned string.
For example, consider a system which converts a lot of integers to strings. If many of those integers are drawn from a small set of common values, this package avoids most of the string allocations for them.
The basic interface of an interner is a single method which looks like:
someTypeInterner.Get(someTypeValue) string
The string value returned may be either a newly allocated string, or a previously allocated interned string from the cache. Interned strings are stored in an *offheap.Store. This means that there is no garbage collection cost associated with keeping large numbers of interned strings.
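The idea can be illustrated with a minimal sketch. This is not the package's implementation: toyInt64Interner and its map-based cache are hypothetical stand-ins that live on the ordinary Go heap, whereas the package stores interned strings in an *offheap.Store. The sketch only shows how a cache hit skips the conversion to string.

```go
package main

import (
	"fmt"
	"strconv"
)

// toyInt64Interner is a hypothetical, heap-based stand-in for an interner.
// It is NOT how the intern package stores strings.
type toyInt64Interner struct {
	cache map[int64]string
}

func newToyInt64Interner() *toyInt64Interner {
	return &toyInt64Interner{cache: make(map[int64]string)}
}

// Get returns the cached string for v if one exists, and only formats
// (and therefore allocates) a new string on a cache miss.
func (i *toyInt64Interner) Get(v int64) string {
	if s, ok := i.cache[v]; ok {
		return s
	}
	s := strconv.FormatInt(v, 10)
	i.cache[v] = s
	return s
}

func main() {
	interner := newToyInt64Interner()
	fmt.Println(interner.Get(42)) // formatted, then cached
	fmt.Println(interner.Get(42)) // served from the cache
}
```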
This package contains a number of pre-made interners for the types int64, float64, time.Time, []byte and string. It also includes the tools to build custom interners for other types.
Because the interned strings are manually managed, and we don't have a mechanism for knowing when to free interned string values, interned strings are retained for the life of the StringInterner instance. This means that we accumulate interned strings as the StringInterner is used. To prevent uncontrolled memory exhaustion we configure an upper limit on the total number of bytes which can be used to intern strings. When this limit is reached no new strings will be interned.
It is expected that strings which are a good target for interning appear frequently, and that there is a finite number of these common string values. Where this pattern holds, a well-configured StringInterner cache will intern these popular strings before the byte limit is reached. If the strings to be interned evolve over time and don't have a stable set of common values, then this interning approach will be less effective.
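The byte-limit behaviour described above can be sketched as follows. The boundedCache type and its fields are hypothetical and use a plain Go map rather than off-heap storage; the sketch only illustrates that once the limit is reached, strings are still returned correctly but no longer interned.

```go
package main

import "fmt"

// boundedCache is a hypothetical sketch of a byte-limited intern cache.
type boundedCache struct {
	maxBytes  int // <= 0 means no limit, mirroring Config.MaxBytes
	usedBytes int
	cache     map[string]string
}

func newBoundedCache(maxBytes int) *boundedCache {
	return &boundedCache{maxBytes: maxBytes, cache: make(map[string]string)}
}

// Get returns the interned copy of s if present. Otherwise it interns s
// only if doing so keeps usedBytes within maxBytes.
func (c *boundedCache) Get(s string) string {
	if interned, ok := c.cache[s]; ok {
		return interned
	}
	if c.maxBytes > 0 && c.usedBytes+len(s) > c.maxBytes {
		return s // limit reached: return the string without interning it
	}
	c.cache[s] = s
	c.usedBytes += len(s)
	return s
}

func main() {
	c := newBoundedCache(5)
	for _, s := range []string{"abc", "de", "f", "abc"} {
		fmt.Println(c.Get(s), c.usedBytes)
	}
}
```

In this sketch early arrivals claim the available bytes, which is why the approach works best when the popular strings show up early and often.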
Index ¶
- type BytesConverter
- type Config
- type ConverterWithBytesId
- type ConverterWithUint64Id
- type Float64Converter
- type Int64Converter
- type Interner
- func NewBytesInterner(config Config) Interner[[]byte]
- func NewFloat64Interner(config Config, fmt byte, prec, bitSize int) Interner[float64]
- func NewInt64Interner(config Config, base int) Interner[int64]
- func NewStringInterner(config Config) Interner[string]
- func NewTimeInterner(config Config, format string) Interner[time.Time]
- type InternerWithBytesId
- type InternerWithUint64Id
- type Stats
- type StatsSummary
- type StringConverter
- type TimeConverter
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BytesConverter ¶
type BytesConverter struct {
// contains filtered or unexported fields
}
func NewBytesConverter ¶
func NewBytesConverter(bytes []byte) BytesConverter
func (BytesConverter) Identity ¶
func (c BytesConverter) Identity() []byte
type Config ¶
type Config struct {
// Defines the maximum length of a string which can be interned.
// Strings longer than this will be generated but not interned.
//
// <= 0 indicates no limit on string length.
MaxLen int
// Defines the maximum total number of bytes which can be interned.
// Once this limit is reached no new strings will be interned.
//
// <= 0 indicates no limit on total bytes, which risks memory
// exhaustion.
MaxBytes int
// Defines the number of shards used internally, which determines the
// level of concurrency available to the interner.
//
// Important to note that this is not an exactly configurable
// parameter. The number of shards must always be a power of two, and
// the value provided here may be rounded up if necessary.
//
// <= 0 indicates that the interner should determine the number of
// shards automatically.
Shards int
// Defines the offheap store to use for allocating interned strings.
//
// If nil then a new store will be created internally. Only needed if
// you want to share a single offheap store across multiple interners.
Store *offheap.Store
}
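The rounding of Shards to a power of two can be sketched as below. Both roundUpToPowerOfTwo and shardIndex are hypothetical helpers, not functions from this package, and the bit-mask shard selection is an assumption about why a power of two is convenient rather than a description of the package's internals. The sketch does not model the automatic selection used when Shards <= 0.

```go
package main

import (
	"fmt"
	"math/bits"
)

// roundUpToPowerOfTwo is a hypothetical helper which returns the smallest
// power of two that is >= n. Values below 1 are treated as 1 here.
func roundUpToPowerOfTwo(n int) int {
	if n <= 1 {
		return 1
	}
	return 1 << bits.Len(uint(n-1))
}

// shardIndex picks a shard with a bit mask. The mask only covers every
// shard evenly when shardCount is a power of two.
func shardIndex(hash uint64, shardCount int) int {
	return int(hash & uint64(shardCount-1))
}

func main() {
	for _, requested := range []int{1, 3, 5, 8} {
		fmt.Println(requested, "->", roundUpToPowerOfTwo(requested))
	}
	fmt.Println(shardIndex(13, roundUpToPowerOfTwo(5)))
}
```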
type ConverterWithBytesId ¶
type ConverterWithBytesId interface {
Identity() []byte
}
A ConverterWithBytesId converts types to strings which can be canonically identified by a []byte value.
A good example of this is a plain []byte. Many complex types whose values can't be canonically identified by a single uint64 could also use this converter.
We don't include the String() method here, because we will _always_ directly convert the []byte into a string. This is important because the []byte value is compared, byte-wise, to the existing interned string value identified by the hash. If the string value could differ from the []byte returned by Identity(), this comparison wouldn't work.
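The hash lookup followed by a byte-wise comparison can be sketched as below. toyBytesInterner, fnvHash and the choice of FNV-1a are all assumptions made for illustration; the source does not say which hash function the package uses. The hash function is injectable so that a collision can be forced deliberately.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toyBytesInterner is a hypothetical sketch keyed by the hash of the
// identity bytes. It is not the package's implementation.
type toyBytesInterner struct {
	hash       func([]byte) uint64
	byHash     map[uint64]string
	collisions int
}

// fnvHash is an assumed hash function, chosen only for illustration.
func fnvHash(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

func newToyBytesInterner(hash func([]byte) uint64) *toyBytesInterner {
	return &toyBytesInterner{hash: hash, byHash: make(map[uint64]string)}
}

// Get looks up the identity by hash, then compares the bytes with the
// interned string. On a mismatch (a hash collision) the string is
// returned but not interned.
func (i *toyBytesInterner) Get(id []byte) string {
	key := i.hash(id)
	if interned, ok := i.byHash[key]; ok {
		if interned == string(id) {
			return interned
		}
		i.collisions++
		return string(id)
	}
	s := string(id)
	i.byHash[key] = s
	return s
}

func main() {
	interner := newToyBytesInterner(fnvHash)
	fmt.Println(interner.Get([]byte("hello")))
	fmt.Println(interner.Get([]byte("hello")))
}
```

Because the interned string is always a direct conversion of the identity bytes, comparing the two is enough to tell a genuine hit from a collision.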
type ConverterWithUint64Id ¶
A ConverterWithUint64Id converts types to strings which can be canonically identified by a uint64 value.
A good example of this is an actual uint64 value. Another example would be a time.Time value which is identified by its UnixNano() value.
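A custom converter of this kind might look like the sketch below. The interface body is not shown in this documentation, so uint64IdConverter is an assumed shape (Identity() uint64 plus String() string) inferred from the pre-made converters. weekdayConverter and toyInterner are hypothetical names, and the map-based cache stands in for the package's off-heap storage.

```go
package main

import (
	"fmt"
	"time"
)

// uint64IdConverter mirrors the shape ASSUMED for ConverterWithUint64Id.
type uint64IdConverter interface {
	Identity() uint64
	String() string
}

// weekdayConverter is a hypothetical custom converter for time.Weekday.
type weekdayConverter struct{ day time.Weekday }

func (c weekdayConverter) Identity() uint64 { return uint64(c.day) }
func (c weekdayConverter) String() string   { return c.day.String() }

// toyInterner is a hypothetical heap-based interner keyed by identity.
type toyInterner[C uint64IdConverter] struct {
	cache map[uint64]string
}

// Get calls String() only when the identity has not been seen before.
func (i *toyInterner[C]) Get(c C) string {
	id := c.Identity()
	if s, ok := i.cache[id]; ok {
		return s
	}
	s := c.String()
	i.cache[id] = s
	return s
}

func main() {
	interner := &toyInterner[weekdayConverter]{cache: map[uint64]string{}}
	fmt.Println(interner.Get(weekdayConverter{day: time.Monday}))
	fmt.Println(interner.Get(weekdayConverter{day: time.Monday}))
}
```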
type Float64Converter ¶
type Float64Converter struct {
// contains filtered or unexported fields
}
A flexible converter for float64 values. The identity is generated by a call to math.Float64bits(...) and the value is converted into a string using strconv.FormatFloat(...).
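The two standard library calls named above behave as in this short sketch. The helper names float64Identity and float64String are hypothetical wrappers, not part of this package. Note that the identity is the raw bit pattern, so values which compare equal as floats, such as 0 and negative zero, still have distinct identities.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// negZero is negative zero, which equals 0 numerically but has a
// different bit pattern.
var negZero = math.Copysign(0, -1)

// float64Identity is a hypothetical wrapper around math.Float64bits.
func float64Identity(v float64) uint64 {
	return math.Float64bits(v)
}

// float64String is a hypothetical wrapper around strconv.FormatFloat.
func float64String(v float64, format byte, prec, bitSize int) string {
	return strconv.FormatFloat(v, format, prec, bitSize)
}

func main() {
	fmt.Printf("%#x\n", float64Identity(1.5))    // 0x3ff8000000000000
	fmt.Println(float64String(1.5, 'f', 2, 64)) // 1.50
	fmt.Println(float64Identity(0) == float64Identity(negZero)) // false
}
```

Since the format, precision and bit size are fixed when an interner is created, each identity always maps to the same string within that interner.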
func NewFloat64Converter ¶
func NewFloat64Converter(value float64, fmt byte, prec, bitSize int) Float64Converter
func (Float64Converter) Identity ¶
func (c Float64Converter) Identity() uint64
func (Float64Converter) String ¶
func (c Float64Converter) String() string
type Int64Converter ¶
type Int64Converter struct {
// contains filtered or unexported fields
}
A converter for int64 values. Here the identity is just the value itself.
func NewInt64Converter ¶
func NewInt64Converter(value int64, base int) Int64Converter
func (Int64Converter) Identity ¶
func (c Int64Converter) Identity() uint64
func (Int64Converter) String ¶
func (c Int64Converter) String() string
type Interner ¶
type Interner[T any] interface {
Get(t T) string
GetStats() StatsSummary
}
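func NewBytesInterner ¶
func NewBytesInterner(config Config) Interner[[]byte]
func NewFloat64Interner ¶
func NewFloat64Interner(config Config, fmt byte, prec, bitSize int) Interner[float64]
func NewInt64Interner ¶
func NewInt64Interner(config Config, base int) Interner[int64]
func NewStringInterner ¶
func NewStringInterner(config Config) Interner[string]
func NewTimeInterner ¶
func NewTimeInterner(config Config, format string) Interner[time.Time]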
type InternerWithBytesId ¶
type InternerWithBytesId[C ConverterWithBytesId] struct {
// contains filtered or unexported fields
}
An InternerWithBytesId is the type which manages the interning of strings.
func NewInternerWithBytesId ¶
func NewInternerWithBytesId[C ConverterWithBytesId](config Config) InternerWithBytesId[C]
Construct a new InternerWithBytesId with the provided config.
func (*InternerWithBytesId[C]) Get ¶
func (i *InternerWithBytesId[C]) Get(converter C) string
Returns the string representation of converter.
The string value may be retrieved from an interning cache or stored in the cache. Regardless of whether the string is or was interned, the correct string value is returned.
func (*InternerWithBytesId[C]) GetStats ¶
func (i *InternerWithBytesId[C]) GetStats() StatsSummary
Retrieves the summarised stats for interned strings.
type InternerWithUint64Id ¶
type InternerWithUint64Id[C ConverterWithUint64Id] struct {
// contains filtered or unexported fields
}
An InternerWithUint64Id is the type which manages the interning of strings.
func NewInternerWithUint64Id ¶
func NewInternerWithUint64Id[C ConverterWithUint64Id](config Config) InternerWithUint64Id[C]
Construct a new InternerWithUint64Id with the provided config.
func (*InternerWithUint64Id[C]) Get ¶
func (i *InternerWithUint64Id[C]) Get(converter C) string
Returns the string representation of converter.
The string value may be retrieved from an interning cache or stored in the cache. Regardless of whether the string is or was interned, the correct string value is returned.
func (*InternerWithUint64Id[C]) GetStats ¶
func (i *InternerWithUint64Id[C]) GetStats() StatsSummary
Retrieves the summarised stats for interned strings.
type Stats ¶
type Stats struct {
Returned int
Interned int
MaxLenExceeded int
UsedBytesExceeded int
HashCollision int
}
The statistics capturing the runtime behaviour of the interner.
Returned indicates the number of previously interned strings that have been returned.
Interned indicates the number of strings which have been interned.
MaxLenExceeded indicates the number of strings not interned because they were too long.
UsedBytesExceeded indicates the number of strings not interned because the global usedBytes limit was exceeded.
HashCollision indicates the number of strings not interned because of a hash collision.
type StatsSummary ¶
A summary of the stats for a specific type of interned converter.
The UsedBytes stat is global across all converters.
Total is the sum, across all shards, of the fields in Stats.
Shards holds the individual shard Stats.
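The relationship between Total and Shards can be sketched as a field-wise sum. The Stats struct below copies the fields documented above; sumStats is a hypothetical helper and is not part of this package.

```go
package main

import "fmt"

// Stats copies the fields of the Stats struct documented above.
type Stats struct {
	Returned          int
	Interned          int
	MaxLenExceeded    int
	UsedBytesExceeded int
	HashCollision     int
}

// sumStats is a hypothetical helper which adds up each field across
// all per-shard Stats values.
func sumStats(shards []Stats) Stats {
	var total Stats
	for _, s := range shards {
		total.Returned += s.Returned
		total.Interned += s.Interned
		total.MaxLenExceeded += s.MaxLenExceeded
		total.UsedBytesExceeded += s.UsedBytesExceeded
		total.HashCollision += s.HashCollision
	}
	return total
}

func main() {
	shards := []Stats{
		{Returned: 10, Interned: 2},
		{Returned: 5, Interned: 1, HashCollision: 1},
	}
	fmt.Printf("%+v\n", sumStats(shards))
}
```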
type StringConverter ¶
type StringConverter struct {
// contains filtered or unexported fields
}
func NewStringConverter ¶
func NewStringConverter(str string) StringConverter
func (StringConverter) Identity ¶
func (c StringConverter) Identity() []byte
type TimeConverter ¶
type TimeConverter struct {
// contains filtered or unexported fields
}
Converter for time.Time. The int64 UnixNano() value is used to uniquely identify each time.Time. If time.Time values are used with different time zones but which have the same nanosecond values, this converter will consider them to be the same and may produce unexpected output.
Having a converter/interner per timezone is currently the best way to handle this.
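The time zone caveat can be demonstrated with the standard library alone. The helpers timeIdentity and timeString are hypothetical and only mimic what the text says the converter does: use UnixNano() as the identity and format the value for the string. Two values for the same instant in different zones share an identity but format differently.

```go
package main

import (
	"fmt"
	"time"
)

// inZone builds a time.Time for the given instant in a fixed-offset zone.
func inZone(unixNano int64, offsetSeconds int) time.Time {
	return time.Unix(0, unixNano).In(time.FixedZone("zone", offsetSeconds))
}

// timeIdentity is a hypothetical helper returning UnixNano() as a uint64.
func timeIdentity(unixNano int64, offsetSeconds int) uint64 {
	return uint64(inZone(unixNano, offsetSeconds).UnixNano())
}

// timeString is a hypothetical helper formatting the value as RFC 3339.
func timeString(unixNano int64, offsetSeconds int) string {
	return inZone(unixNano, offsetSeconds).Format(time.RFC3339)
}

func main() {
	fmt.Println(timeIdentity(0, 0) == timeIdentity(0, 7200)) // true
	fmt.Println(timeString(0, 0))                            // 1970-01-01T00:00:00Z
	fmt.Println(timeString(0, 7200))                         // 1970-01-01T02:00:00+02:00
}
```

An interner keyed only on the identity cannot tell these two values apart, which is why a separate converter/interner per time zone is suggested.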
func NewTimeConverter ¶
func NewTimeConverter(value time.Time, format string) TimeConverter
func (TimeConverter) Identity ¶
func (c TimeConverter) Identity() uint64
func (TimeConverter) String ¶
func (c TimeConverter) String() string