Search   Memberlist   Usergroups
 Page 1 of 1 [8 Posts]
Author Message
Robert B. Israel
science forum Guru

Joined: 24 Mar 2005
Posts: 2151

Posted: Sun Jul 16, 2006 6:17 pm    Post subject: Re: Probability to find a substring in a string

Robert Israel <israel@math.ubc.ca> wrote:
 Quote: Eighty wrote: And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d)) where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1 (I assume there are no multiple roots). Thanks! How did you arrive at that? That is, how did you get the explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d) = 0 with f(n)=0, n < d, without all the annoying gaussian elimination? It's probably a standard method, so the name of it, or a link to a paper would be appreciated. :) Actually I cheated and used Maple, but it should be fairly straightforward to check that this works.

I think "Z transform" is the name to look up.

 Quote: BTW, you may be interested to know that when r is large, the polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d) while the others are close to the (d-1)'th roots of 1/r.

Since the contribution of the other roots to f(n) is small,
a good approximation should be obtained using only the root close
to r. If we write X = rY, we have
(rY)^d - r (rY)^(d-1) + 1 = r^d (Y^d - Y^(d-1) + r^(-d))
so our approximation to f(n) is
r^n - r^(n+1) y^(n+1)/(r - d r^(1-d) y^(1-d))
= r^n (1 - y^(n+1)/(1 - d r^(-d) y^(1-d)))
where y = g(r^(-d)), if g(z) is the root near 1 of
Y^d - Y^(d-1) + z.

The Lagrange Inversion Formula says g(z) has a series expansion

g(z) = 1 - sum_{m=1}^infty (product_{j=2}^m (md-j)) z^m/m!

which should converge for |z| < 1/(d e) according to the Ratio
Test.

Thus for d = 7 and r = 26, taking the sum up to m=5 yields
g(26^(-7)) ~ 4178084238098676561481061015118275629148483030241
/ 4178084238618868666397164711796811655402510876672

and the approximation
f(n) ~ 26^n (1 - y^(n+1)/(1-7/(26^7 y^6)))
using this value for y appears to have a relative error of less than
2*10^(-9) for all n >= 7, and less than 10^(-40) for n >= 28.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Robert B. Israel
science forum Guru

Joined: 24 Mar 2005
Posts: 2151

Posted: Sun Jul 16, 2006 9:12 am    Post subject: Re: Probability to find a substring in a string

Eighty wrote:
 Quote: And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d)) where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1 (I assume there are no multiple roots). Thanks! How did you arrive at that? That is, how did you get the explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d) = 0 with f(n)=0, n < d, without all the annoying gaussian elimination? It's probably a standard method, so the name of it, or a link to a paper would be appreciated.

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Robert B. Israel
science forum Guru

Joined: 24 Mar 2005
Posts: 2151

Posted: Sun Jul 16, 2006 9:12 am    Post subject: Re: Probability to find a substring in a string

Eighty wrote:
 Quote: And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d)) where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1 (I assume there are no multiple roots). Thanks! How did you arrive at that? That is, how did you get the explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d) = 0 with f(n)=0, n < d, without all the annoying gaussian elimination? It's probably a standard method, so the name of it, or a link to a paper would be appreciated.

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Eighty
science forum beginner

Joined: 14 Jul 2006
Posts: 4

Posted: Sat Jul 15, 2006 12:16 am    Post subject: Re: Probability to find a substring in a string

 Quote: And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d)) where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1 (I assume there are no multiple roots). Thanks! How did you arrive at that? That is, how did you get the

explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated.
Robert B. Israel
science forum Guru

Joined: 24 Mar 2005
Posts: 2151

Posted: Fri Jul 14, 2006 11:28 pm    Post subject: Re: Probability to find a substring in a string

Eighty wrote:
 Quote: It depends on the substring. For example, for r = 2, n = 4 and d = 3 there are four strings containing 101, but only three containing 000. Right. Let's assume the substring doesn't end like it starts (like "bob" and "coco" do).

Suppose f(n) is the number you want, where there are n letters, the
string has length n and the substring has length d, and "the substring
doesn't end like it starts", i.e. there is no j in {1,...,d-1) such
that the substring's initial j letters form its final j letters.
Then f(n) - r f(n-1) + f(n-d) = r^(n-d), with f(n) = 0 for n < d.
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Eighty
science forum beginner

Joined: 14 Jul 2006
Posts: 4

Posted: Fri Jul 14, 2006 10:22 pm    Post subject: Re: Probability to find a substring in a string

 Quote: It depends on the substring. For example, for r = 2, n = 4 and d = 3 there are four strings containing 101, but only three containing 000. Right. Let's assume the substring doesn't end like it starts (like

"bob" and "coco" do).

My above calculation was incorrect, correct one here:

Assuming the above, we can write
f(n)=(r^0-f(0))r^(n-d) + ... + (r^(n-d)-f(n-d))r^0
which gives us the recurrence
f(n)-rf(n-1)+f(n-d)=r^(n-d) where f(n) = 0 for n<d
and substituting f(n)=r^n (1+h(n)), we get the simpler
h(n)-h(n-1)+h(n-d)/r^d=0 where h(n) = -1 for n<d.

However, this recurrence's characteristic polynomial is in general not
solvable, so I guess neither is the problem (for d>4).

But if you know if something's been written on this, I'd like to read
it.
Robert B. Israel
science forum Guru

Joined: 24 Mar 2005
Posts: 2151

Posted: Fri Jul 14, 2006 7:55 pm    Post subject: Re: Probability to find a substring in a string

Eighty <eightyx@gmail.com> wrote:
 Quote: How many strings of length n are there that contain a substring of length d (at least once), if every character can assume r different values? (or the equivalent problem: what's the probability to find a substring of length d in a string of length n?)

It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
Eighty
science forum beginner

Joined: 14 Jul 2006
Posts: 4

 Posted: Fri Jul 14, 2006 12:01 pm    Post subject: Probability to find a substring in a string How many strings of length n are there that contain a substring of length d (at least once), if every character can assume r different values? (or the equivalent problem: what's the probability to find a substring of length d in a string of length n?) There doesn't seem to be a solution (that is analytic and elementary), but the problem is so simple, I'm wondering if I've missed something. My solution (so far): Let f(n) be the number of strings of length n that contain a string of length d at least once. I've found that f(n)=r^n(1+h(n)), where h(n+d)-h(n+d-1)+h(n)=0, and h(n) = -1 for n4, it gets harder, since the recurrence for h has the characteristic equation x^d-x^(d-1)+1=0, which probably isn't solvable in general (is it?). Is it possible to get any farther? What's been written on this problem? I tried googling for "probability" and "substring", but that didn't yield anything.

 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 Page 1 of 1 [8 Posts]
 The time now is Sun Apr 21, 2019 12:54 am | All times are GMT
 Jump to: Select a forum-------------------Forum index|___Science and Technology    |___Math    |   |___Research    |   |___num-analysis    |   |___Symbolic    |   |___Combinatorics    |   |___Probability    |   |   |___Prediction    |   |       |   |___Undergraduate    |   |___Recreational    |       |___Physics    |   |___Research    |   |___New Theories    |   |___Acoustics    |   |___Electromagnetics    |   |___Strings    |   |___Particle    |   |___Fusion    |   |___Relativity    |       |___Chem    |   |___Analytical    |   |___Electrochem    |   |   |___Battery    |   |       |   |___Coatings    |       |___Engineering        |___Control        |___Mechanics        |___Chemical

 Topic Author Forum Replies Last Post Similar Topics Probability Question dumont Probability 0 Mon Oct 23, 2006 3:38 pm probability gorbag Probability 0 Mon Aug 14, 2006 11:06 pm Possible to Find the Clusters One by One?? mathlover num-analysis 0 Thu Jul 20, 2006 7:10 pm Possible to Find the Clusters One by One?? mathlover Math 0 Thu Jul 20, 2006 7:04 pm Probability of attaining a minimum value when rolling dice nick@blackmarble.co.uk Math 16 Tue Jul 18, 2006 2:57 pm