Analysis Introduction

Common statistical methods for analyzing string sequences, such as those used in bioinformatics, work with a small, well-defined alphabet (the nucleobases). In contrast, randomized serial numbers (also called tokens) are unique strings over freely definable alphabets of ASCII and/or Unicode characters (possibly mixed), and they can follow special formats and standards depending on the application, for example ⚀⚃⚅⚄⚁ from the DICE alphabet or 4GHX35D6G0TNY243ZBS2 from the UPPER_EPCG30 alphabet. Therefore, the practical analysis of token randomness here focuses mainly on simple visual pattern perception by humans, applied to sequences of numeric scores, deltas and walks derived from the token strings, rather than on a comprehensive statistical analysis of the token string sequences themselves.

To do this, the scores of the tokens are first calculated and analyzed using statistical methods. The score of a token is its fixed, zero-based numeric position in the sorted set of all tokens that a production line can theoretically generate for the chosen alphabet and number of digits. Next, the deltas, i.e. the one-step differences between consecutive scores, are calculated and analyzed using the same statistical methods. Then, the walks, which accumulate these one-step differences in the same way as random walks in probability theory, are calculated and analyzed comparatively.
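A minimal Python sketch of the three transformations follows. It assumes that the sorted set of all tokens is ordered lexicographically by the order of the characters in the alphabet; under that assumption, a token's score is simply the token read as a base-k numeral, where k is the alphabet size. The DICE character order ⚀⚁⚂⚃⚄⚅ (Unicode U+2680 to U+2685) and the function names score, deltas and walk are illustrative assumptions, not the project's Julia code.

```python
from itertools import accumulate

# Assumption: DICE characters ordered by Unicode code point (U+2680..U+2685).
DICE = "⚀⚁⚂⚃⚄⚅"

def score(token: str, alphabet: str) -> int:
    """Zero-based position of `token` among all tokens of the same length,
    assuming lexicographic order induced by the order of `alphabet`."""
    base = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    value = 0
    for ch in token:
        # Horner's scheme: read the token as a base-k numeral, most significant digit first.
        value = value * base + index[ch]
    return value

def deltas(scores):
    """One-step differences s[i+1] - s[i] of consecutive scores."""
    return [b - a for a, b in zip(scores, scores[1:])]

def walk(ds):
    """Cumulative sum of the deltas, starting at 0 like a random walk."""
    return [0] + list(accumulate(ds))

tokens = ["⚀⚃⚅⚄⚁", "⚅⚀⚀⚀⚀", "⚂⚂⚂⚂⚂"]
scores = [score(t, DICE) for t in tokens]
print(scores, deltas(scores), walk(deltas(scores)))
```

For these three tokens the sketch yields the scores 853, 6480 and 3110, the deltas 5627 and -3370, and the walk 0, 5627, 2257. Under this literal reading, walk[i] equals s[i] - s[0], i.e. the walk is the score sequence shifted to start at zero.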

Finally, further statistical methods are applied to some of the results in order to generate visual patterns that reveal properties and meta-properties of the numeric scores, deltas and walks. These patterns can then be compared visually with the patterns obtained, using the same statistical methods, from pseudo-random numbers and physically generated random numbers (randoms for short) of the same length.
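For the comparison with pseudo-random numbers, one possible baseline is a sequence drawn uniformly from the same score range [0, k^n) and passed through the same transformations. The sketch below assumes Python's random.Random (a Mersenne Twister generator) as the pseudo-random source and a simple equal-width histogram as one example of a statistic to visualize; physically generated randoms are not covered. The functions pseudo_random_scores, deltas_and_walk and histogram are hypothetical names.

```python
import random
from itertools import accumulate

def pseudo_random_scores(n_tokens: int, alphabet_size: int, digits: int, seed: int = 0):
    """Hypothetical baseline: `n_tokens` uniform integers from
    [0, alphabet_size**digits), the score range of a production line."""
    rng = random.Random(seed)  # Mersenne Twister PRNG, seeded for reproducibility
    upper = alphabet_size ** digits
    return [rng.randrange(upper) for _ in range(n_tokens)]

def deltas_and_walk(scores):
    """Same transformations as for tokens: one-step deltas and their cumulative walk."""
    ds = [b - a for a, b in zip(scores, scores[1:])]
    return ds, [0] + list(accumulate(ds))

def histogram(values, lo: int, hi: int, bins: int):
    """Counts per equal-width bin over [lo, hi), using exact integer arithmetic
    so that very large score ranges do not lose precision."""
    counts = [0] * bins
    for v in values:
        if not lo <= v < hi:
            raise ValueError(f"value {v} outside [{lo}, {hi})")
        counts[(v - lo) * bins // (hi - lo)] += 1
    return counts

baseline = pseudo_random_scores(1000, 6, 5, seed=42)
print(histogram(baseline, 0, 6**5, 10))
```

Plotting, for instance, the histogram of the token scores next to the histogram of a baseline of the same length gives a side-by-side view of how evenly both sequences cover the score range.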

As typical examples, the randomness of token sequences and the uniqueness across many token sequences of the same kind are analyzed and visualized on separate pages for the DICE and the UPPER_EPCG30 alphabets. In addition, two possible ways of analyzing the completeness of short token sequences are shown for two arbitrary ASCII and Unicode alphabets with zero and non-zero parity. Finally, the bitwise steadiness and the extensiveness of many token sequences of the same kind are presented.

The data analysis is powered by Visual Studio Code, Project Jupyter and the Julia Programming Language.