Author 
Message 
Eighty science forum beginner
Joined: 14 Jul 2006
Posts: 4

Posted: Fri Jul 14, 2006 12:01 pm Post subject:
Probability to find a substring in a string



How many strings of length n are there that contain a substring of
length d (at least once), if every character can assume r different
values?
(or the equivalent problem: what's the probability to find a substring
of length d in a string of length n?)
There doesn't seem to be a solution (that is analytic and elementary),
but the problem is so simple, I'm wondering if I've missed something.
My solution (so far):
Let f(n) be the number of strings of length n that contain a string of
length d at least once. I've found that
f(n)=r^n(1+h(n)), where
h(n+d)h(n+d1)+h(n)=0, and h(n) = 1 for n<d.
For d=1, this reduces to f(n)=r^n, f(0)=0, as expected.
For d=2, h = (((1)^(n/3) + ((1)^(2/3))^n*(1 + (1)^(2/3)))/(1 +
(1)^(1/3))), but is simply the periodic sequence {1, 1, 0, 1, 1, 0,
....}
For larger values, especially d>4, it gets harder, since the recurrence
for h has the characteristic equation x^dx^(d1)+1=0, which probably
isn't solvable in general (is it?).
Is it possible to get any farther? What's been written on this problem?
I tried googling for "probability" and "substring", but that didn't
yield anything. 

Back to top 


Robert B. Israel science forum Guru
Joined: 24 Mar 2005
Posts: 2151

Posted: Fri Jul 14, 2006 7:55 pm Post subject:
Re: Probability to find a substring in a string



In article <1152878479.687483.57580@b28g2000cwb.googlegroups.com>,
Eighty <eightyx@gmail.com> wrote:
Quote:  How many strings of length n are there that contain a substring of
length d (at least once), if every character can assume r different
values?
(or the equivalent problem: what's the probability to find a substring
of length d in a string of length n?)

It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada 

Back to top 


Eighty science forum beginner
Joined: 14 Jul 2006
Posts: 4

Posted: Fri Jul 14, 2006 10:22 pm Post subject:
Re: Probability to find a substring in a string



Quote:  It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like 
"bob" and "coco" do).
My above calculation was incorrect, correct one here:
Assuming the above, we can write
f(n)=(r^0f(0))r^(nd) + ... + (r^(nd)f(nd))r^0
which gives us the recurrence
f(n)rf(n1)+f(nd)=r^(nd) where f(n) = 0 for n<d
and substituting f(n)=r^n (1+h(n)), we get the simpler
h(n)h(n1)+h(nd)/r^d=0 where h(n) = 1 for n<d.
However, this recurrence's characteristic polynomial is in general not
solvable, so I guess neither is the problem (for d>4).
But if you know if something's been written on this, I'd like to read
it. 

Back to top 


Robert B. Israel science forum Guru
Joined: 24 Mar 2005
Posts: 2151

Posted: Fri Jul 14, 2006 11:28 pm Post subject:
Re: Probability to find a substring in a string



Eighty wrote:
Quote:  It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like
"bob" and "coco" do).

Suppose f(n) is the number you want, where there are n letters, the
string has length n and the substring has length d, and "the substring
doesn't end like it starts", i.e. there is no j in {1,...,d1) such
that the substring's initial j letters form its final j letters.
Then f(n)  r f(n1) + f(nd) = r^(nd), with f(n) = 0 for n < d.
And the solution is f(n) = r^n  sum_z z^(n+1)/(r  d z^(1d))
where the sum is over the roots z of the polynomial X^d  r X^(d1) + 1
(I assume there are no multiple roots).
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada 

Back to top 


Eighty science forum beginner
Joined: 14 Jul 2006
Posts: 4

Posted: Sat Jul 15, 2006 12:16 am Post subject:
Re: Probability to find a substring in a string



Quote:  And the solution is f(n) = r^n  sum_z z^(n+1)/(r  d z^(1d))
where the sum is over the roots z of the polynomial X^d  r X^(d1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the 
explicit solution for the homogeneous equation f(n)  r f(n1) + f(nd)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. 

Back to top 


Robert B. Israel science forum Guru
Joined: 24 Mar 2005
Posts: 2151

Posted: Sun Jul 16, 2006 9:12 am Post subject:
Re: Probability to find a substring in a string



Eighty wrote:
Quote:  And the solution is f(n) = r^n  sum_z z^(n+1)/(r  d z^(1d))
where the sum is over the roots z of the polynomial X^d  r X^(d1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n)  r f(n1) + f(nd)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated.

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.
BTW, you may be interested to know that when r is large, the
polynomial X^d  r X^(d1) + 1 has one root very close to r  r^(1d)
while
the others are close to the (d1)'th roots of 1/r.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada 

Back to top 


Robert B. Israel science forum Guru
Joined: 24 Mar 2005
Posts: 2151

Posted: Sun Jul 16, 2006 9:12 am Post subject:
Re: Probability to find a substring in a string



Eighty wrote:
Quote:  And the solution is f(n) = r^n  sum_z z^(n+1)/(r  d z^(1d))
where the sum is over the roots z of the polynomial X^d  r X^(d1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n)  r f(n1) + f(nd)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated.

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.
BTW, you may be interested to know that when r is large, the
polynomial X^d  r X^(d1) + 1 has one root very close to r  r^(1d)
while
the others are close to the (d1)'th roots of 1/r.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada 

Back to top 


Robert B. Israel science forum Guru
Joined: 24 Mar 2005
Posts: 2151

Posted: Sun Jul 16, 2006 6:17 pm Post subject:
Re: Probability to find a substring in a string



In article <1153041139.092376.109100@b28g2000cwb.googlegroups.com>,
Robert Israel <israel@math.ubc.ca> wrote:
Quote: 
Eighty wrote:
And the solution is f(n) = r^n  sum_z z^(n+1)/(r  d z^(1d))
where the sum is over the roots z of the polynomial X^d  r X^(d1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n)  r f(n1) + f(nd)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. :)
Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

I think "Z transform" is the name to look up.
Quote:  BTW, you may be interested to know that when r is large, the
polynomial X^d  r X^(d1) + 1 has one root very close to r  r^(1d)
while
the others are close to the (d1)'th roots of 1/r.

Since the contribution of the other roots to f(n) is small,
a good approximation should be obtained using only the root close
to r. If we write X = rY, we have
(rY)^d  r (rY)^(d1) + 1 = r^d (Y^d  Y^(d1) + r^(d))
so our approximation to f(n) is
r^n  r^(n+1) y^(n+1)/(r  d r^(1d) y^(1d))
= r^n (1  y^(n+1)/(1  d r^(d) y^(1d)))
where y = g(r^(d)), if g(z) is the root near 1 of
Y^d  Y^(d1) + z.
The Lagrange Inversion Formula says g(z) has a series expansion
g(z) = 1  sum_{m=1}^infty (product_{j=2}^m (mdj)) z^m/m!
which should converge for z < 1/(d e) according to the Ratio
Test.
Thus for d = 7 and r = 26, taking the sum up to m=5 yields
g(26^(7)) ~ 4178084238098676561481061015118275629148483030241
/ 4178084238618868666397164711796811655402510876672
and the approximation
f(n) ~ 26^n (1  y^(n+1)/(17/(26^7 y^6)))
using this value for y appears to have a relative error of less than
2*10^(9) for all n >= 7, and less than 10^(40) for n >= 28.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada 

Back to top 


Google


Back to top 



The time now is Sun Apr 21, 2019 12:09 am  All times are GMT

