Ctrl K

Set, List, Bag, Map, Dictionary, Tuple, and Sequence

Core collection terms that appear in programming, mathematics, data processing, and machine learning representations.

These collection terms look simple, but they affect how data is counted, ordered, grouped, indexed, and represented. Many machine learning ideas become easier when these basic representation terms are clear.

Overview

A collection is a way to hold multiple values. Different collection types answer different questions.

The most important differences are whether order matters, whether duplicates are allowed, whether lookup uses a key, and whether the structure is fixed or variable.

TermOrder mattersDuplicates allowedMain idea
SetNoNoA collection of unique values
ListYesYesAn ordered collection of values
BagNoYesA collection that counts repeated values
MapNot usually the main pointKeys are uniqueA key to value relationship
DictionaryNot usually the main pointKeys are uniqueA programming version of a map
TupleYesYesA fixed ordered group of values
SequenceYesUsually yesA general ordered series of values

Set

A set is a collection of unique values. If the same value is inserted more than once, it still appears once.

In mathematics and programming, a set is useful when membership matters more than position or count.

items = {"apple", "banana", "apple"}

print(items)

The repeated apple value does not become a second separate item. This is the main reason a set is not a good representation when counts matter.

List

A list is an ordered collection. Values can repeat, and the position of each value matters.

Lists are common in code because they preserve the order in which values are added.

items = ["apple", "banana", "apple"]

print(items[0])
print(items[2])

Bag

A bag is also called a multiset. It is a collection where repeated values are allowed, but order is usually not important.

A bag is useful when counts matter but position does not. This is why the term appears in bag of words.

from collections import Counter

items = ["apple", "banana", "apple"]
bag = Counter(items)

print(bag)

Map and dictionary

A map stores relationships from keys to values. A dictionary is a common programming implementation of this idea.

In Python, dictionary keys are unique. If the same key is assigned again, the value is replaced.

scores = {
    "alice": 10,
    "bob": 8,
}

scores["alice"] = 12

print(scores)

Tuple and sequence

A tuple is an ordered group of values. It is often used when values belong together as one record or coordinate.

A sequence is a broader term for ordered values. Text, tokens, arrays, lists, and time series can all be discussed as sequences depending on context.

point = (3, 5)
tokens = ["machine", "learning", "model"]

print(point)
print(tokens)

Why this matters in ML

  • A set is useful for unique vocabulary items.
  • A list is useful for ordered tokens.
  • A bag is useful for word counts.
  • A dictionary is useful for mappings such as token to id or feature to value.
  • A sequence is important for text, time series, and language models.