# IEEE Transactions on Information Theory

### Capacity and Expressiveness of Genomic Tandem Duplication

The majority of the human genome consists of repeated sequences. An important type of repeated sequences common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence $AGTC\underline {TGTG}C$ , $TGTG$ is a tandem repeat, that may be generated from $AGTCTGC$ by a tandem duplication of length 2. In this paper, we investigate the possibility of generating a large number of sequences from a seed, i.e. a small initial string, by tandem duplications of bounded length. We study the capacity of such a system, a notion that quantifies the system’s generating power. Our results include exact capacity values for certain tandem duplication string systems. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system as the capability of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. In particular, based on a celebrated result by Axel Thue from 1906, presenting a construction for ternary squarefree sequences, we show that for alphabets of size 4 or larger, bounded tandem duplication systems, regardless of the seed and the bound on duplication length, are not fully expressive, i.e. they cannot generate all strings even as substrings of other strings. Note that the alphabet of size 4 is of particular interest as it pertains to the genomic alphabet. Building on this result, we also show that these systems do not have full capacity. In general, our results illustrate that duplication len-
ths play a more significant role than the seed in generating a large number of sequences for these systems.

### Generalized Plateaued Functions and Admissible (Plateaued) Functions

Plateaued functions are very important crypto- graphic functions due to their various desirable cryptographic characteristics. We point out that plateaued functions are more general than bent functions (that is, functions with maximum nonlinearity). Some Boolean plateaued functions have large nonlinearity, which provides protection against fast correlation attacks when they are used as combiners or filters in stream ciphers, and contributes, when they are the component functions of the substitution boxes in block ciphers, to protection against linear cryptanalysis. P-ary plateaued functions have attracted recently some attention in the literature, and many activities on generalized $p$ -ary functions have been carried out. This paper increases our knowledge on plateaued functions in the general context of generalized $p$ -ary functions. We first introduce two new versions of plateaued functions, which we shall call generalized plateaued functions and admissible plateaued functions. The generalized plateaued functions extend the standard notion of plateaued $p$ -ary functions to those whose outputs are in the ring $\mathbb {Z}_{p^{k}}$ . Next, we study the generalized plateaued functions and use admissible plateaued functions to characterize the generalized plateaued functions by means of their components. Finally, we provide for the first time two constructions of generalized plateaued functions. In particular, we generalize a known secondary construction of binary generalized bent functions and derive constructions of binary generalized plateaued functions with different amplitudes.

### Generic Construction of Bent Functions and Bent Idempotents With Any Possible Algebraic Degrees

As a class of optimal combinatorial objects, bent functions have important applications in cryptography, sequence design, and coding theory. Bent idempotents are a subclass of bent functions and of great interest, since they can be stored in less space and allow faster computation of the Walsh-Hadamard transform. The objective of this paper is to present a generic construction of bent functions from known ones. It includes the previous constructions of bent functions by Mesnager and Xu et al. as special cases, and produces new bent functions, which cannot be produced by earlier ones. In particular, it also generates infinite families of bent idempotents over ${F}_{2^{2m}}$ of any algebraic degree between 2 and $m$ . This together with a recent construction by Su and Tang gives a positive answer to an open problem on bent idempotents proposed by Carlet. In addition, an infinite family of anti-self-dual bent functions is obtained in which the sum of any three distinct functions is again an anti-self-dual bent function in this family. This solves an open problem recently proposed by Mesnager.

### Low Correlation Sequences From Linear Combinations of Characters

Pairs of binary sequences formed using linear combinations of multiplicative characters of finite fields are exhibited that, when compared with a random sequence pairs, simultaneously achieve significantly lower mean square autocorrelation values (for each sequence in the pair) and significantly lower mean square crosscorrelation values. If we define crosscorrelation merit factor analogously to the usual merit factor for autocorrelation, and if we define demerit factor as the reciprocal of merit factor, then randomly selected binary sequence pairs are known to have an average crosscorrelation demerit factor of 1. Our constructions provide sequence pairs with a crosscorrelation demerit factor significantly less than 1, and at the same time, the autocorrelation demerit factors of the individual sequences can also be made significantly less than 1 (which also indicates better than average performance). The sequence pairs studied here provide combinations of autocorrelation and crosscorrelation performance that are not achievable using sequences formed from single characters, such as maximal linear recursive sequences (m-sequences) and Legendre sequences. In this paper, exact asymptotic formulae are proved for the autocorrelation and crosscorrelation merit factors of sequence pairs formed using linear combinations of multiplicative characters. Data is presented that shows that the asymptotic behavior is closely approximated by sequences of modest length.

### New Constructions of Asymptotically Optimal Codebooks With Multiplicative Characters

In practical applications, such as direct spread code division multiple access communications, space-time codes and compressed sensing, and codebooks with small inner-product correlation are required. It is extremely difficult to construct codebooks achieving the Levenshtein bound. In this paper, two new constructions of infinitely many codebooks with multiplicative characters of finite fields are presented. These constructions produce complex codebooks asymptotically achieving the Levenshtein bound and codebooks asymptotically achieving the Welch bound. The codebooks presented in this paper have new parameters.

### Investigations on Periodic Sequences With Maximum Nonlinear Complexity

The nonlinear complexity of a periodic sequence s is the length of the shortest feedback shift register that can generate s, and its value is upper bounded by the least period of s minus 1. In this paper, a recursive approach that generates all periodic sequences with maximum nonlinear complexity is presented, and the total number of such sequences is determined. The randomness properties of these sequences are also examined.

### Spatially Coupled Turbo-Like Codes

In this paper, we introduce the concept of spatially coupled turbo-like codes (SC-TCs) as the spatial coupling of a number of turbo-like code ensembles. In particular, we consider the spatial coupling of parallel concatenated codes, introduced by Berrou et al., and that of serially concatenated codes (SCCs), introduced by Benedetto et al. Furthermore, we propose two extensions of braided convolutional codes (BCCs), and a class of turbo-like codes which have an inherent spatially coupled structure, to higher coupling memories, and show that these yield improved belief propagation (BP) thresholds as compared with the original BCC ensemble. We derive the exact density evolution (DE) equations for SC-TCs and analyze their asymptotic behavior on the binary erasure channel. We also consider the construction of families of rate-compatible SC-TC ensembles. Our numerical results show that the threshold saturation of the BP decoding threshold to the maximum a posteriori threshold of the underlying uncoupled ensembles occurs for large enough coupling memory. The improvement of the BP threshold is especially significant for SCCs and BCCs, whose uncoupled ensembles suffer from a poor BP threshold. For a wide range of code rates, SC-TCs show close-to-capacity performance as the coupling memory increases. We further give a proof of threshold saturation for SC-TC ensembles with identical component encoders. In particular, we show that the DE of SC-TC ensembles with identical component encoders can be properly rewritten as a scalar recursion. This allows us to define potential functions and prove threshold saturation using the proof technique recently introduced by Yedla et al.

### A Sugiyama-Like Decoding Algorithm for Convolutional Codes

We propose a decoding algorithm for a class of convolutional codes called skew Reed-Solomon convolutional codes. These are convolutional codes of designed Hamming distance endowed with a cyclic structure yielding a left ideal of a non-commutative ring (a quotient of a skew polynomial ring). In this setting, right and left division algorithms exist, so our algorithm follows the guidelines of the Sugiyama’s procedure for finding the error locator and error evaluator polynomials for Goppa block codes.

### Improved Lower Bounds on the Size of Balls Over Permutations With the Infinity Metric

We study the size (or volume) of balls in the metric space of permutations, $S_{n}$ , under the infinity metric. We focus on the regime of balls with radius $r = \rho \cdot (n\!-\!1)$ , $\rho \in [{0,1}]$ , i.e., a radius that is a constant fraction of the maximum possible distance. We provide new lower bounds on the size of such balls. These new lower bounds reduce the asymptotic gap to the known upper bounds to at most 0.029 bits per symbol. Additionally, they imply an improved ball-packing bound for error-correcting codes, and an improved upper bound on the size of optimal covering codes.

### Two New Families of Two-Weight Codes

We construct two new infinite families of trace codes of dimension $2m$ , over the ring $\mathbb {F}_{p}+u\mathbb {F}_{p}$ , with $u^{2}=u$ , when $p$ is an odd prime. They have the algebraic structure of abelian codes. Their Lee weight distribution is computed by using Gauss sums. By Gray mapping, we obtain two infinite families of linear $p$ -ary codes of respective lengths $(p^{m}-1)^{2}$ and $2(p^{m}-1)^{2}$ . When $m$ is singly even, the first family gives five-weight codes. When $m$ is odd and $p\equiv 3 \pmod {4}$ , the first family yields $p$ -ary two-weight codes, which are shown to be optimal by application of the Griesmer bound. The second family consists of two-weight codes that are shown to be optimal, by the Griesmer bound, whenever $p=3$ and $m \ge 3$ , or $p\ge 5$ and $m\ge 4$ . Applications to secret sharing schemes are given.

### Some Repeated-Root Constacyclic Codes Over Galois Rings

Codes over Galois rings have been studied extensively during the last three decades. Negacyclic codes over $\mathop {\mathrm {GR}}\nolimits (2^{a},m)$ of length $2^{s}$ have been characterized: the ring ${\mathcal{ R}}_{2}(a,m,-1)= {\mathrm{ GR}}(2^{a},m)[x]/\langle x^{2^{s}}+1\rangle$ is a chain ring. Furthermore, these results have been generalized to $\lambda$ -constacyclic codes for any unit $\lambda$ of the form $4z-1$ , $z\in \mathop {\mathrm {GR}}\nolimits (2^{a}, m)$ . In this paper, we study more general cases and investigate all cases, where ${\mathcal{ R}}_{p}(a,m,\gamma) = {\mathrm{ GR}}(p^{a},m)[x]/\langle x^{p^{s}}-\gamma \rangle$ is a chain ring. In particular, the necessary and sufficient conditions for the ring ${\mathcal{ R}}_{p}(a,m,\gamma)$ to be a chain ring are obtained. In addition, by using this structure we investigate all $\gamma$ -constacyclic codes over $\mathop {\mathrm {GR}}\nolimits (p^{a},m)$ when ${\mathcal{ R}}_{p}(a,m,\gamma)$ is a chain ring. The necessary and sufficient conditions for the existence of self-orthogonal and self-dual -
$\gamma$ -constacyclic codes are also provided. Among others, for any prime $p$ , the structure of ${\mathcal{ R}}_{p}(a,m,\gamma) ={\mathrm{ GR}}(p^{a},m)[x]/\langle x^{p^{s}}-\gamma \rangle$ is used to establish the Hamming and homogeneous distances of $\gamma$ -constacyclic codes.

### Coding for Interactive Communication Correcting Insertions and Deletions

We consider the question of interactive communication, in which two remote parties perform a computation, while their communication channel is (adversarially) noisy. We extend here the discussion into a more general and stronger class of noise, namely, we allow the channel to perform insertions and deletions of symbols. These types of errors may bring the parties “out of sync,” so that there is no consensus regarding the current round of the protocol. In this more general noise model, we obtain the first interactive coding scheme that has a constant rate and tolerates noise rates of up to $1/18- \varepsilon $ . To this end, we develop a novel primitive we name edit-distance tree code. The edit-distance tree code is carefully designed to replace the Hamming distance constraints in Schulman’s tree codes (IEEE Trans. Inf. Theory, 1996), with a stronger edit-distance requirement.

### UEP Constructions of Quasi-Cyclic Low-Density Parity-Check Codes via Masking

In this paper, the algebraic constructions of quasi-cyclic low-density parity-check (QC-LDPC) codes with the unequal error protection (UEP) property are considered. A criterion for constructing such codes via the masking technique is proposed, based on which explicit conditions on the base parity-check matrices and masking matrices to achieve UEP are provided. We also give three specific constructions of UEP QC-LDPC codes. Furthermore, a sufficient condition to ensure strict UEP is presented. Simulation results demonstrate the superiority of our constructed codes over time-sharing schemes. The constructed codes also have competitive error performance against randomly constructed equal-error-protection LDPC codes and irregular UEP LDPC codes designed based on the degree distribution.

### Majority Logic Decoding Under Data-Dependent Logic Gate Failures

A majority logic decoder made of unreliable logic gates, whose failures are transient and data-dependent, is analyzed. Based on a combinatorial representation of fault configurations a closed-form expression for the average bit error rate for a one-step majority logic decoder is derived, for a regular low-density parity-check (LDPC) code ensemble and the proposed failure model. The presented analysis framework is then used to establish bounds on the one-step majority logic decoder performance under the simplified probabilistic gate-output switching model. Based on the expander property of Tanner graphs of LDPC codes, it is proven that a version of the faulty parallel bit-flipping decoder can correct a fixed fraction of channel errors in the presence of data-dependent gate failures. The results are illustrated with numerical examples of finite geometry codes.

### Explicit Constructions of Optimal-Access MDS Codes With Nearly Optimal Sub-Packetization

An $(n,k,l)$ maximum distance separable (MDS) array code of length $n$ , dimension $k=n-r$ , and sub-packetization $l$ is formed of $l\times n$ matrices over a finite field $F$ , with every column of the matrix stored on a separate node in the distributed storage system and viewed as a coordinate of the codeword. Repair of a failed node (recovery of one erased column) can be performed by accessing a set of $d\le n-1$ surviving (helper) nodes. The code is said to have the optimal access property if the amount of data accessed at each of the helper nodes meets a lower bound on this quantity. For optimal-access MDS codes with $d=n-1$ , the sub-packetization $l$ satisfies the bound $l\ge r^{(k-1)/r}$ . In our previous work (IEEE Trans. Inf. Theory, vol. 63, no. 4, 2017), for any $n$ and $r$ , we presented an explicit construction of optimal-access MDS codes with sub-packetization $l=r^{n-1}$ . In this paper, we take up the question of reducing the sub-packetization value $l$ to make it to approach the lower bound. We construct an explicit family of optimal-access codes with $l=r^{\lceil n/r\rceil }$ , which differs from the optimal value by at most a factor of $r^{2}$ . These codes can be constructed over any finite field $F$ as long as $|F|\ge r\lceil n/r\rceil $ , and afford low-complexity encoding and decoding procedures. We also define a version of the repair problem that bridges the context of regenerating codes and codes with locality constraints (LRC codes), which we call group repair with optimal access. In this variation, we assume that the set of $n=sm$ nodes is partitioned into $m$ repair groups of size $s$ , and require that the amount of accessed data for repair is the smallest possible whenever the $d=s+k-1$ helper nodes include all the other $s-1$ nodes from the same group as the failed node. For this problem, we construct a family of codes with the group optimal access property. These codes can be constructed over any field $F$ of size $|F|\ge n$ , and also afford low-complexity encoding and decoding procedures.

### Minimum Storage Regenerating Codes for All Parameters

Regenerating codes for distributed storage have attracted much research interest in the past decade. Such codes trade the bandwidth needed to repair a failed node with the overall amount of data stored in the network. Minimum storage regenerating (MSR) codes are an important class of optimal regenerating codes that minimize (first) the amount of data stored per node and (then) the repair bandwidth. Specifically, an $[n,k,d]$ - $(\alpha )$ MSR code $ \mathbb {C}$ over $ \smash {\mathbb {F}_{\!q}}$ stores a file $ {\mathcal{ F}}$ consisting of $\alpha k$ symbols over $ \smash {\mathbb {F}_{\!q}}$ among $n$ nodes, each storing $\alpha $ symbols, in such a way that: 1) the file $ {\mathcal{ F}}$ can be recovered by downloading the content of any $k$ of the $n$ nodes and 2) the content of any failed node can be reconstructed by accessing any $d$ of the remaining $n-1$ nodes and downloading $\alpha-
/(d{-}k{+}1)$ symbols from each of these nodes. In practice, the file $ {\mathcal{ F}}$ is typically available in uncoded form on some $k$ of the $n$ nodes, known as systematic nodes, and the defining node-repair condition above can be relaxed to requiring the optimal repair bandwidth for systematic nodes only. Such codes are called systematic–repair MSR codes. Unfortunately, finite– $\alpha $ constructions of $[n,k,d]$ MSR codes are known only for certain special cases: either low rate, namely $k/n \leqslant 0.5$ , or high repair connectivity, namely $d = n-1$ . Our main result in this paper is a finite– $\alpha $ construction of systematic-repair $[n,k,d]$ MSR codes for all possible values of parameters $n,k,d$ . We also introduce a generalized construction for $[n,k]$ MSR codes to achieve the optimal repair bandwidth for all values of $d$ simultaneously.

### Distributed Simulation of Continuous Random Variables

We establish the first known upper bound on the exact and Wyner’s common information of $n$ continuous random variables in terms of the dual total correlation between them (which is a generalization of mutual information). In particular, we show that when the pdf of the random variables is log-concave, there is a constant gap of $n^{2}\log e+9n\log n$ between this upper bound and the dual total correlation lower bound that does not depend on the distribution. The upper bound is obtained using a computationally efficient dyadic decomposition scheme for constructing a discrete common randomness variable $W$ from which the $n$ random variables can be simulated in a distributed manner. We then bound the entropy of $W$ using a new measure, which we refer to as the erosion entropy.

### Hamilton Paths With Lasting Separation

We determine the asymptotics of the largest cardinality of a set of Hamilton paths in the complete graph with vertex set $[n]$ under the condition that for any two of the paths in the family there is a subpath of length $k$ entirely contained in only one of them and edge-disjoint from the other one.