|
|
|
|
Two Way algorithm
The pattern x
is factorized in two parts x The
preprocessing phase of the algorithm consists then in choosing a good factorization
x Let (u, v) be a factorization of x. A repetition in (u, v) is a word w such that the two following properties hold:
In other words w occurs at both sides of the cut between u and v with a possible overflow on either side. The length of a repetition in (u,v) is called a local period and the length of the smallest repetition in (u,v) is called the local period and is denoted by r(u,v). Each
factorization (u,v) of has at least one repetition. It can
be easily seen that 1 A factorization (u,v) of x such that r(u,v)=per(x) is called a critical factorization of x. If (u,v)
is a critical factorization of x then at the position |u| in x
the global and the local periods are the same. The Two Way algorithm chooses
the critical factorization (x To compute the
critical factorization x The preprocessing phase can be done in O(m) time and constant space complexity. The searching
phase of the Two Way algorithm consists in first comparing the character of xr
from left to right, then the character of x When a mismatch occurs when scanning the k-th character of xr, then a shift of length k is performed. When a
mismatch occurs when scanning x Such a scheme leads to a quadratic worst case algorithm, this can be avoided by a prefix memorization: when a shift of length per(x) is performed the length of the matching prefix of the pattern at the beginning of the window (namely m-per(x)) after the shift is memorized to avoid to scan it again during the next attempt. The searching phase of the Two Way algorithm can be done in O(n) time complexity. The Two Way algorithm performs 2n-m text character comparisons in the worst case. Breslauer designed a variation on the Two Way algorithm which performs less than 2n-m comparisons using constant space. /* Computing of the maximal suffix for <= */
int maxSuf(char *x, int m, int *p) {
int ms, j, k;
char a, b;
ms = -1;
j = 0;
k = *p = 1;
while (j + k < m) {
a = x[j + k];
b = x[ms + k];
if (a < b) {
j += k;
k = 1;
*p = j - ms;
}
else
if (a == b)
if (k != *p)
++k;
else {
j += *p;
k = 1;
}
else { /* a > b */
ms = j;
j = ms + 1;
k = *p = 1;
}
}
return(ms);
}
/* Computing of the maximal suffix for >= */
int maxSufTilde(char *x, int m, int *p) {
int ms, j, k;
char a, b;
ms = -1;
j = 0;
k = *p = 1;
while (j + k < m) {
a = x[j + k];
b = x[ms + k];
if (a > b) {
j += k;
k = 1;
*p = j - ms;
}
else
if (a == b)
if (k != *p)
++k;
else {
j += *p;
k = 1;
}
else { /* a < b */
ms = j;
j = ms + 1;
k = *p = 1;
}
}
return(ms);
}
/* Two Way string matching algorithm. */
void TW(char *x, int m, char *y, int n) {
int i, j, ell, memory, p, per, q;
/* Preprocessing */
i = maxSuf(x, m, &p);
j = maxSufTilde(x, m, &q);
if (i > j) {
ell = i;
per = p;
}
else {
ell = j;
per = q;
}
/* Searching */
if (memcmp(x, x + per, ell + 1) == 0) {
j = 0;
memory = -1;
while (j <= n - m) {
i = MAX(ell, memory) + 1;
while (i < m && x[i] == y[i + j])
++i;
if (i >= m) {
i = ell;
while (i > memory && x[i] == y[i + j])
--i;
if (i <= memory)
OUTPUT(j);
j += per;
memory = m - per - 1;
}
else {
j += (i - ell);
memory = -1;
}
}
}
else {
per = MAX(ell + 1, m - ell - 1) + 1;
j = 0;
while (j <= n - m) {
i = ell + 1;
while (i < m && x[i] == y[i + j])
++i;
if (i >= m) {
i = ell;
while (i >= 0 && x[i] == y[i + j])
--i;
if (i < 0)
OUTPUT(j);
j += per;
}
else
j += (i - ell);
}
}
}
Preprocessing phase
|
user comments and suggestions are invited at KmailDrive@gmail.com