FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups 
 ProfileProfile   PreferencesPreferences   Log in to check your private messagesLog in to check your private messages   Log inLog in 
Forum index » Science and Technology » Math
Probability to find a substring in a string
Post new topic   Reply to topic Page 1 of 1 [8 Posts] View previous topic :: View next topic
Author Message
Eighty
science forum beginner


Joined: 14 Jul 2006
Posts: 4

PostPosted: Fri Jul 14, 2006 12:01 pm    Post subject: Probability to find a substring in a string Reply with quote

How many strings of length n are there that contain a substring of
length d (at least once), if every character can assume r different
values?

(or the equivalent problem: what's the probability to find a substring
of length d in a string of length n?)

There doesn't seem to be a solution (that is analytic and elementary),
but the problem is so simple, I'm wondering if I've missed something.

My solution (so far):

Let f(n) be the number of strings of length n that contain a string of
length d at least once. I've found that
f(n)=r^n(1+h(n)), where
h(n+d)-h(n+d-1)+h(n)=0, and h(n) = -1 for n<d.

For d=1, this reduces to f(n)=r^n, f(0)=0, as expected.
For d=2, h = -(((-1)^(n/3) + (-(-1)^(2/3))^n*(1 + (-1)^(2/3)))/(1 +
(-1)^(1/3))), but is simply the periodic sequence {-1, -1, 0, 1, 1, 0,
....}
For larger values, especially d>4, it gets harder, since the recurrence
for h has the characteristic equation x^d-x^(d-1)+1=0, which probably
isn't solvable in general (is it?).

Is it possible to get any farther? What's been written on this problem?
I tried googling for "probability" and "substring", but that didn't
yield anything.
Back to top
Robert B. Israel
science forum Guru


Joined: 24 Mar 2005
Posts: 2151

PostPosted: Fri Jul 14, 2006 7:55 pm    Post subject: Re: Probability to find a substring in a string Reply with quote

In article <1152878479.687483.57580@b28g2000cwb.googlegroups.com>,
Eighty <eightyx@gmail.com> wrote:
Quote:
How many strings of length n are there that contain a substring of
length d (at least once), if every character can assume r different
values?

(or the equivalent problem: what's the probability to find a substring
of length d in a string of length n?)

It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Back to top
Eighty
science forum beginner


Joined: 14 Jul 2006
Posts: 4

PostPosted: Fri Jul 14, 2006 10:22 pm    Post subject: Re: Probability to find a substring in a string Reply with quote

Quote:
It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like

"bob" and "coco" do).

My above calculation was incorrect, correct one here:

Assuming the above, we can write
f(n)=(r^0-f(0))r^(n-d) + ... + (r^(n-d)-f(n-d))r^0
which gives us the recurrence
f(n)-rf(n-1)+f(n-d)=r^(n-d) where f(n) = 0 for n<d
and substituting f(n)=r^n (1+h(n)), we get the simpler
h(n)-h(n-1)+h(n-d)/r^d=0 where h(n) = -1 for n<d.

However, this recurrence's characteristic polynomial is in general not
solvable, so I guess neither is the problem (for d>4).

But if you know if something's been written on this, I'd like to read
it.
Back to top
Robert B. Israel
science forum Guru


Joined: 24 Mar 2005
Posts: 2151

PostPosted: Fri Jul 14, 2006 11:28 pm    Post subject: Re: Probability to find a substring in a string Reply with quote

Eighty wrote:
Quote:
It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like
"bob" and "coco" do).

Suppose f(n) is the number you want, where there are n letters, the
string has length n and the substring has length d, and "the substring
doesn't end like it starts", i.e. there is no j in {1,...,d-1) such
that the substring's initial j letters form its final j letters.
Then f(n) - r f(n-1) + f(n-d) = r^(n-d), with f(n) = 0 for n < d.
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Back to top
Eighty
science forum beginner


Joined: 14 Jul 2006
Posts: 4

PostPosted: Sat Jul 15, 2006 12:16 am    Post subject: Re: Probability to find a substring in a string Reply with quote

Quote:
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the

explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. Smile
Back to top
Robert B. Israel
science forum Guru


Joined: 24 Mar 2005
Posts: 2151

PostPosted: Sun Jul 16, 2006 9:12 am    Post subject: Re: Probability to find a substring in a string Reply with quote

Eighty wrote:
Quote:
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. Smile

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Back to top
Robert B. Israel
science forum Guru


Joined: 24 Mar 2005
Posts: 2151

PostPosted: Sun Jul 16, 2006 9:12 am    Post subject: Re: Probability to find a substring in a string Reply with quote

Eighty wrote:
Quote:
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. Smile

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Back to top
Robert B. Israel
science forum Guru


Joined: 24 Mar 2005
Posts: 2151

PostPosted: Sun Jul 16, 2006 6:17 pm    Post subject: Re: Probability to find a substring in a string Reply with quote

In article <1153041139.092376.109100@b28g2000cwb.googlegroups.com>,
Robert Israel <israel@math.ubc.ca> wrote:
Quote:

Eighty wrote:
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. :)

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

I think "Z transform" is the name to look up.

Quote:
BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.

Since the contribution of the other roots to f(n) is small,
a good approximation should be obtained using only the root close
to r. If we write X = rY, we have
(rY)^d - r (rY)^(d-1) + 1 = r^d (Y^d - Y^(d-1) + r^(-d))
so our approximation to f(n) is
r^n - r^(n+1) y^(n+1)/(r - d r^(1-d) y^(1-d))
= r^n (1 - y^(n+1)/(1 - d r^(-d) y^(1-d)))
where y = g(r^(-d)), if g(z) is the root near 1 of
Y^d - Y^(d-1) + z.

The Lagrange Inversion Formula says g(z) has a series expansion

g(z) = 1 - sum_{m=1}^infty (product_{j=2}^m (md-j)) z^m/m!

which should converge for |z| < 1/(d e) according to the Ratio
Test.

Thus for d = 7 and r = 26, taking the sum up to m=5 yields
g(26^(-7)) ~ 4178084238098676561481061015118275629148483030241
/ 4178084238618868666397164711796811655402510876672

and the approximation
f(n) ~ 26^n (1 - y^(n+1)/(1-7/(26^7 y^6)))
using this value for y appears to have a relative error of less than
2*10^(-9) for all n >= 7, and less than 10^(-40) for n >= 28.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Back to top
Google

Back to top
Display posts from previous:   
Post new topic   Reply to topic Page 1 of 1 [8 Posts] View previous topic :: View next topic
The time now is Sun Apr 21, 2019 12:09 am | All times are GMT
Forum index » Science and Technology » Math
Jump to:  

Similar Topics
Topic Author Forum Replies Last Post
No new posts Probability Question dumont Probability 0 Mon Oct 23, 2006 3:38 pm
No new posts probability gorbag Probability 0 Mon Aug 14, 2006 11:06 pm
No new posts Possible to Find the Clusters One by One?? mathlover num-analysis 0 Thu Jul 20, 2006 7:10 pm
No new posts Possible to Find the Clusters One by One?? mathlover Math 0 Thu Jul 20, 2006 7:04 pm
No new posts Probability of attaining a minimum value when rolling dice nick@blackmarble.co.uk Math 16 Tue Jul 18, 2006 2:57 pm

Copyright © 2004-2005 DeniX Solutions SRL
Other DeniX Solutions sites: Electronics forum |  Medicine forum |  Unix/Linux blog |  Unix/Linux documentation |  Unix/Linux forums  |  send newsletters
 


Powered by phpBB © 2001, 2005 phpBB Group
[ Time: 0.0482s ][ Queries: 16 (0.0117s) ][ GZIP on - Debug on ]