intern

package
v0.0.1
Warning

This package is not in the latest version of its module.

Published: Sep 9, 2024 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Copyright 2024 Francis Michael Stephens. All rights reserved. Use of this source code is governed by an MIT license that can be found in the LICENSE file.

The intern package allows users to intern string values.

Importantly, the interners built using this package can accept types which are _not_ strings but from which strings can be generated. The advantage is that when the string representation of a value has already been interned, we can skip generating the string and simply return the interned copy.

For example, consider a system which converts a lot of integers to strings. If many of those integers share a small set of common values, this package avoids most of the corresponding string allocations.
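The integer case can be made concrete with a minimal, single-goroutine sketch of interning keyed by a non-string identity. This is not the package's implementation. It uses an ordinary Go map, so the strings live on the GC heap rather than in an *offheap.Store. It also has no byte limit, and the type name `int64Cache` is hypothetical. The point is that on a cache hit the `strconv.FormatInt` call, and its allocation, is skipped entirely.

```go
package main

import (
	"fmt"
	"strconv"
)

// int64Cache is a hypothetical, minimal interner for int64 values.
// It is NOT safe for concurrent use and has no size limit.
type int64Cache struct {
	base    int
	strings map[int64]string
	formats int // counts how many times a string was actually generated
}

func newInt64Cache(base int) *int64Cache {
	return &int64Cache{base: base, strings: make(map[int64]string)}
}

// Get returns the string form of v, generating it only on a cache miss.
func (c *int64Cache) Get(v int64) string {
	if s, ok := c.strings[v]; ok {
		return s // hit: no formatting, no new string
	}
	s := strconv.FormatInt(v, c.base)
	c.formats++
	c.strings[v] = s
	return s
}

func main() {
	c := newInt64Cache(10)
	for _, v := range []int64{200, 404, 200, 200, 500, 404} {
		_ = c.Get(v)
	}
	fmt.Println(c.formats) // 3: only 200, 404 and 500 were formatted
}
```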

The core of an interner's interface is a single method, which looks like:

someTypeInterner.Get(someTypeValue) string

The returned string may be either newly allocated or a previously interned string from the cache. Interned strings are stored in an *offheap.Store, so there is no garbage collection cost associated with retaining large numbers of interned strings.

This package contains pre-made interners for the types int64, float64, time.Time, []byte and string. It also includes the tools to build custom interners for other types.

Because interned strings are manually managed, and we have no mechanism for knowing when an interned string can be freed, interned strings are retained for the life of the interner instance. As a result, interned strings accumulate as the interner is used. To prevent uncontrolled memory exhaustion, we configure an upper limit (Config.MaxBytes) on the total number of bytes which can be used to intern strings. When this limit is reached, no new strings are interned.

Strings which are good targets for interning are expected to appear frequently, and there should be a finite set of these common values. When this pattern holds, a well-configured interner will intern these popular strings before the byte limit is reached. If the strings to be interned change over time and don't have a stable set of common values, this interning approach will be less effective.
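The limit policy can be modelled in a few lines. This is a hypothetical, simplified sketch: it uses a plain map instead of an offheap store and has no sharding. The fields `maxLen` and `maxBytes` only mirror the documented meaning of `Config.MaxLen` and `Config.MaxBytes`, with `<= 0` meaning no limit. Whether the real interner checks the cache before or after the length check is not documented, so treat the ordering as an assumption. A string that is too long, or that would exceed the byte budget, is still returned correctly but is never stored.

```go
package main

import "fmt"

// limitedInterner is a hypothetical model of the MaxLen/MaxBytes policy.
type limitedInterner struct {
	maxLen, maxBytes int // <= 0 means "no limit", as in Config
	usedBytes        int
	strings          map[string]string

	// Counters named after the documented Stats fields.
	interned, returned, maxLenExceeded, usedBytesExceeded int
}

func newLimitedInterner(maxLen, maxBytes int) *limitedInterner {
	return &limitedInterner{maxLen: maxLen, maxBytes: maxBytes, strings: map[string]string{}}
}

func (l *limitedInterner) Get(s string) string {
	if cached, ok := l.strings[s]; ok {
		l.returned++
		return cached
	}
	if l.maxLen > 0 && len(s) > l.maxLen {
		l.maxLenExceeded++
		return s // too long: returned but never interned
	}
	if l.maxBytes > 0 && l.usedBytes+len(s) > l.maxBytes {
		l.usedBytesExceeded++
		return s // byte budget exhausted: returned but not interned
	}
	l.usedBytes += len(s)
	l.strings[s] = s
	l.interned++
	return s
}

func main() {
	l := newLimitedInterner(8, 10)
	for _, s := range []string{"alpha", "beta", "alpha", "much-too-long", "gamma"} {
		l.Get(s)
	}
	fmt.Println(l.interned, l.returned, l.maxLenExceeded, l.usedBytesExceeded, l.usedBytes) // 2 1 1 1 9
}
```

In this model, "gamma" arrives after the 10-byte budget is nearly spent, so it is never interned however popular it later becomes. That is the failure mode the paragraph above describes for strings whose distribution drifts over time.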

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BytesConverter

type BytesConverter struct {
	// contains filtered or unexported fields
}

func NewBytesConverter

func NewBytesConverter(bytes []byte) BytesConverter

func (BytesConverter) Identity

func (c BytesConverter) Identity() []byte

type Config

type Config struct {
	// Defines the maximum length of a string which can be interned.
	// Strings longer than this will be generated but not interned.
	//
	// <= 0 indicates no limit on string length.
	MaxLen int

	// Defines the maximum total number of bytes which can be interned.
	// Once this limit is reached no new strings will be interned.
	//
	// <= 0 indicates no limit on total bytes; this risks memory
	// exhaustion.
	MaxBytes int

	// Defines the number of shards used internally, which determines the
	// level of available concurrency for the interner.
	//
	// Note that this value is not applied exactly. The number of shards
	// must always be a power of two, so the value provided here may be
	// rounded up.
	//
	// <= 0 indicates that the interner should determine the number of
	// shards automatically.
	Shards int

	// Defines the offheap store to use for allocating interned strings.
	//
	// If nil then a new store will be created internally. Only needed if
	// you want to share a single offheap store across multiple interners.
	Store *offheap.Store
}
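The `Shards` comment says the value is rounded up to a power of two. It does not say how a shard is selected, or how the automatic default is chosen. The sketch below shows one common way to do the rounding with `math/bits`, and why a power of two is convenient: a shard can then be picked with a bit mask instead of a modulo. The helper names `roundShards` and `shardFor` are hypothetical. So is the `GOMAXPROCS`-based default for `n <= 0`, which is an assumption rather than this package's documented behaviour.

```go
package main

import (
	"fmt"
	"math/bits"
	"runtime"
)

// roundShards rounds n up to the next power of two. For n <= 0 it falls
// back to a hypothetical default based on GOMAXPROCS (an assumption; the
// package's real automatic choice is not documented here).
func roundShards(n int) int {
	if n <= 0 {
		n = runtime.GOMAXPROCS(0)
	}
	// For n == 1, bits.Len(0) is 0, so this yields 1 << 0 == 1.
	return 1 << bits.Len(uint(n-1))
}

// shardFor picks a shard index from a hash. Because shards is a power of
// two, hash & (shards-1) equals hash % shards but avoids a division.
func shardFor(hash uint64, shards int) int {
	return int(hash & uint64(shards-1))
}

func main() {
	for _, n := range []int{1, 3, 8, 9} {
		fmt.Println(n, "->", roundShards(n)) // 1 -> 1, 3 -> 4, 8 -> 8, 9 -> 16
	}
	fmt.Println(shardFor(0xdeadbeef, roundShards(5)))
}
```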

type ConverterWithBytesId

type ConverterWithBytesId interface {
	Identity() []byte
}

A ConverterWithBytesId converts a value to a string which can be canonically identified by a []byte value.

A plain []byte is a good example. Many complex types whose values can't be canonically identified by a single uint64 could also use this converter.

We don't include a String() method here, because we will _always_ convert the []byte directly into a string. This is important because the []byte value is compared, byte-wise, against the existing interned string identified by its hash. If the string value could differ from the identity []byte, this comparison wouldn't work.

type ConverterWithUint64Id

type ConverterWithUint64Id interface {
	Identity() uint64
	String() string
}

A ConverterWithUint64Id converts a value to a string which can be canonically identified by a uint64 value.

A good example is an actual uint64 value. Another is a time.Time value, which is identified by its UnixNano() value.
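A custom converter only needs the two methods. The sketch below redeclares the interface locally so it compiles on its own. It implements the interface for a hypothetical `ipv4Converter`, whose identity is the four address octets packed into a uint64. Because distinct addresses always pack to distinct values, the identity is canonical. Such a type could presumably be used as the type parameter of `NewInternerWithUint64Id`, but that call is not shown because it requires the package itself.

```go
package main

import "fmt"

// Local copy of the interface documented above, so this sketch compiles alone.
type ConverterWithUint64Id interface {
	Identity() uint64
	String() string
}

// ipv4Converter is a hypothetical converter for IPv4 addresses.
type ipv4Converter struct {
	addr [4]byte
}

// Identity packs the four octets into a uint64; distinct addresses
// always produce distinct identities, which makes the identity canonical.
func (c ipv4Converter) Identity() uint64 {
	return uint64(c.addr[0])<<24 | uint64(c.addr[1])<<16 | uint64(c.addr[2])<<8 | uint64(c.addr[3])
}

// String formats the address in dotted-decimal notation.
func (c ipv4Converter) String() string {
	return fmt.Sprintf("%d.%d.%d.%d", c.addr[0], c.addr[1], c.addr[2], c.addr[3])
}

// Compile-time check that ipv4Converter satisfies the interface.
var _ ConverterWithUint64Id = ipv4Converter{}

func main() {
	c := ipv4Converter{addr: [4]byte{192, 168, 0, 1}}
	fmt.Println(c.Identity(), c.String()) // 3232235521 192.168.0.1
}
```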

type Float64Converter

type Float64Converter struct {
	// contains filtered or unexported fields
}

A flexible converter for float64 values. The identity is generated by calling math.Float64bits(...), and the value is converted into a string using strconv.FormatFloat(...).

func NewFloat64Converter

func NewFloat64Converter(value float64, fmt byte, prec, bitSize int) Float64Converter

func (Float64Converter) Identity

func (c Float64Converter) Identity() uint64

func (Float64Converter) String

func (c Float64Converter) String() string

type Int64Converter

type Int64Converter struct {
	// contains filtered or unexported fields
}

A converter for int64 values. Here the identity is just the value itself.

func NewInt64Converter

func NewInt64Converter(value int64, base int) Int64Converter

func (Int64Converter) Identity

func (c Int64Converter) Identity() uint64

func (Int64Converter) String

func (c Int64Converter) String() string

type Interner

type Interner[T any] interface {
	Get(t T) string
	GetStats() StatsSummary
}

func NewBytesInterner

func NewBytesInterner(config Config) Interner[[]byte]

func NewFloat64Interner

func NewFloat64Interner(config Config, fmt byte, prec, bitSize int) Interner[float64]

func NewInt64Interner

func NewInt64Interner(config Config, base int) Interner[int64]

func NewStringInterner

func NewStringInterner(config Config) Interner[string]

func NewTimeInterner

func NewTimeInterner(config Config, format string) Interner[time.Time]

type InternerWithBytesId

type InternerWithBytesId[C ConverterWithBytesId] struct {
	// contains filtered or unexported fields
}

An InternerWithBytesId is the type which manages the interning of strings.

func NewInternerWithBytesId

func NewInternerWithBytesId[C ConverterWithBytesId](config Config) InternerWithBytesId[C]

Constructs a new InternerWithBytesId with the provided config.

func (*InternerWithBytesId[C]) Get

func (i *InternerWithBytesId[C]) Get(converter C) string

Returns the string representation of converter.

The string value may be retrieved from an interning cache or stored in the cache. Regardless of whether the string is or was interned, the correct string value is returned.

func (*InternerWithBytesId[C]) GetStats

func (i *InternerWithBytesId[C]) GetStats() StatsSummary

Retrieves the summarised stats for interned strings.

type InternerWithUint64Id

type InternerWithUint64Id[C ConverterWithUint64Id] struct {
	// contains filtered or unexported fields
}

An InternerWithUint64Id is the type which manages the interning of strings.

func NewInternerWithUint64Id

func NewInternerWithUint64Id[C ConverterWithUint64Id](config Config) InternerWithUint64Id[C]

Constructs a new InternerWithUint64Id with the provided config.

func (*InternerWithUint64Id[C]) Get

func (i *InternerWithUint64Id[C]) Get(converter C) string

Returns the string representation of converter.

The string value may be retrieved from an interning cache or stored in the cache. Regardless of whether the string is or was interned, the correct string value is returned.

func (*InternerWithUint64Id[C]) GetStats

func (i *InternerWithUint64Id[C]) GetStats() StatsSummary

Retrieves the summarised stats for interned strings.

type Stats

type Stats struct {
	Returned          int
	Interned          int
	MaxLenExceeded    int
	UsedBytesExceeded int
	HashCollision     int
}

Stats captures the runtime behaviour of an interner.

Returned indicates the number of previously interned strings that have been returned.

Interned indicates the number of strings which have been interned.

MaxLenExceeded indicates the number of strings not interned because they were too long.

UsedBytesExceeded indicates the number of strings not interned because the global byte limit (Config.MaxBytes) had been reached.

HashCollision indicates the number of strings not interned because of a hash collision.

type StatsSummary

type StatsSummary struct {
	UsedBytes int
	Total     Stats
	Shards    []Stats
}

A summary of the stats for an interner of a specific converter type.

The UsedBytes stat is global across all converters.

Total holds the sum, across all shards, of each field in Stats.

Shards holds the individual shard Stats.
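The relationship between `Total` and `Shards` is a field-by-field sum. The types below are local copies of `Stats` and `StatsSummary`, so the snippet compiles without the package. `summarise` is a hypothetical helper, and how the real `GetStats` gathers per-shard values (for example, under which locks) is not documented here. The hit-rate calculation assumes each `Get` call increments exactly one counter, which is a plausible reading of the field descriptions but not something the documentation states.

```go
package main

import "fmt"

// Local copies of the documented types.
type Stats struct {
	Returned          int
	Interned          int
	MaxLenExceeded    int
	UsedBytesExceeded int
	HashCollision     int
}

type StatsSummary struct {
	UsedBytes int
	Total     Stats
	Shards    []Stats
}

// summarise builds a StatsSummary whose Total is the per-field sum of shards.
func summarise(usedBytes int, shards []Stats) StatsSummary {
	var total Stats
	for _, s := range shards {
		total.Returned += s.Returned
		total.Interned += s.Interned
		total.MaxLenExceeded += s.MaxLenExceeded
		total.UsedBytesExceeded += s.UsedBytesExceeded
		total.HashCollision += s.HashCollision
	}
	return StatsSummary{UsedBytes: usedBytes, Total: total, Shards: shards}
}

func main() {
	sum := summarise(1024, []Stats{
		{Returned: 10, Interned: 3},
		{Returned: 5, Interned: 2, HashCollision: 1},
	})
	fmt.Printf("%+v\n", sum.Total)

	// Rough cache hit rate, assuming every Get increments exactly one counter.
	t := sum.Total
	all := t.Returned + t.Interned + t.MaxLenExceeded + t.UsedBytesExceeded + t.HashCollision
	fmt.Printf("hit rate %.2f\n", float64(t.Returned)/float64(all)) // hit rate 0.71
}
```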

type StringConverter

type StringConverter struct {
	// contains filtered or unexported fields
}

func NewStringConverter

func NewStringConverter(str string) StringConverter

func (StringConverter) Identity

func (c StringConverter) Identity() []byte

type TimeConverter

type TimeConverter struct {
	// contains filtered or unexported fields
}

Converter for time.Time. The int64 UnixNano() value is used to uniquely identify each time.Time. If two time.Time values in different time zones have the same UnixNano() value, this converter considers them to be the same and may produce unexpected output.

Having a separate converter/interner per time zone is currently the best way to handle this.

func NewTimeConverter

func NewTimeConverter(value time.Time, format string) TimeConverter

func (TimeConverter) Identity

func (c TimeConverter) Identity() uint64

func (TimeConverter) String

func (c TimeConverter) String() string
