A.G.McDowell
Joined: 17 Mar 2005
Posted: Tue Jun 27, 2006 5:40 pm    Post subject: Re: Sequence prediction problem

zartan2k@comcast.net <zartan2k@comcast.net> writes
Quote: This is a real-world problem. I don't have any actual data yet, but I may be simulating some in the near future. Have you thought about any general approaches yet? It seems that as long as drop-outs remain relatively low, something can be done. It's not clear to me, however, how one would detect when the data stream has become too degraded by drop-outs to allow reasonable prediction. -zartan2k

If you just want a general approach, I would look up Hidden Markov
Models and the EM algorithm. There is a hidden model with 6 states, 1,
2, 3, 4, 5, 6. It usually goes from state n to state n+1 (wrapping from
6 back to 1), but sometimes skips a state (or several), moving forward
two or more states instead of one. You usually observe the hidden state,
but sometimes you get junk instead.
See e.g. exercise (7) at http://www.inference.phy.cam.ac.uk/mackay/itila/ExtraExercises.html
or http://www.cis.hut.fi/ahonkela/dippa/node36.html
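A minimal sketch of this idea in Python, under assumptions that are not in the thread. Instead of treating junk as *replacing* the true value, it treats an inserted number as an extra observation that leaves the hidden phase where it was, which matches the insertions described in the original post. It also assumes each observation is an insertion independently with a fixed probability `p_insert`, and that the number of skipped values follows a truncated geometric distribution controlled by `p_drop`. The function names and default rates are hypothetical. It runs the standard forward-backward recursions over 12 hidden states (phase 1..6 × genuine/inserted) and returns, for each position, the posterior probability that the value there was inserted. The rates are simply fixed here; in practice they would still need to be estimated, e.g. with EM.

```python
def advance_probs(p_drop):
    """P(phase moves forward by d) for d = 1..6, i.e. d - 1 values dropped.

    Assumes each further drop is p_drop times as likely as the previous one
    (truncated geometric); d = 6 means a whole cycle went missing.
    """
    weights = [p_drop ** (d - 1) for d in range(1, 7)]
    total = sum(weights)
    return [w / total for w in weights]


def insertion_posteriors(obs, p_insert=0.1, p_drop=0.1):
    """For each position t, return P(obs[t] was randomly inserted | all of obs).

    obs: list of ints in 1..6. The hidden phase s (0..5) is the last genuine
    value minus one. A genuine observation advances the phase by d >= 1 and
    shows the new phase; an inserted one leaves the phase alone and shows a
    uniformly random value. Rates are assumed known, with p_insert > 0.
    """
    n = 6
    adv = advance_probs(p_drop)
    # trans[s][s2]: probability a genuine observation moves the phase from s to s2
    trans = [[adv[(s2 - s - 1) % n] for s2 in range(n)] for s in range(n)]

    # Forward pass, normalised at every step to avoid underflow.
    gen, ins = [], []
    prev = [1.0 / n] * n  # uniform belief about the phase before the first value
    for o in obs:
        k = o - 1
        g = [0.0] * n
        g[k] = (1 - p_insert) * sum(prev[s] * trans[s][k] for s in range(n))
        i = [p_insert * prev[s] / n for s in range(n)]
        c = sum(g) + sum(i)
        g = [x / c for x in g]
        i = [x / c for x in i]
        gen.append(g)
        ins.append(i)
        prev = [g[s] + i[s] for s in range(n)]

    # Backward pass. beta[s] depends only on the phase, not on the
    # genuine/inserted flag, because the next step depends only on the phase.
    post = [0.0] * len(obs)
    beta = [1.0] * n
    for t in range(len(obs) - 1, -1, -1):
        num = sum(ins[t][s] * beta[s] for s in range(n))
        den = sum((gen[t][s] + ins[t][s]) * beta[s] for s in range(n))
        post[t] = num / den
        if t > 0:
            k = obs[t] - 1
            new = [(1 - p_insert) * trans[s][k] * beta[k] + p_insert / n * beta[s]
                   for s in range(n)]
            z = sum(new)
            beta = [x / z for x in new]
    return post


if __name__ == "__main__":
    # Example from the original post: a 1 was inserted after the 4.
    seq = [1, 2, 3, 4, 1, 5, 6, 1, 2, 3]
    for value, p in zip(seq, insertion_posteriors(seq)):
        print(value, round(p, 3))
```

On the example above, the inserted 1 should get a posterior close to 1 and the other values small ones. Rerunning the function as each new number arrives gives updated labels for the whole sequence, since later values can change the verdict on earlier ones.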
--
A.G.McDowell
zartan2k@comcast.net
Joined: 23 Jun 2006
Posted: Mon Jun 26, 2006 10:21 pm    Post subject: Re: Sequence prediction problem

Bruce Reistle wrote:
Quote: This sounds like a fun problem. Is this a totally fictitious problem? If not, could you send me some of the data, or generate some for me? Bruce R

This is a real-world problem. I don't have any actual data yet, but
I may be simulating some in the near future. Have you thought about any
general approaches yet? It seems that as long as drop-outs remain
relatively low, something can be done. It's not clear to me, however,
how one would detect when the data stream has become too degraded by
drop-outs to allow reasonable prediction.
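On detecting when the stream is too degraded, one crude heuristic (an assumption of this sketch, not something proposed in the thread) is to watch the fraction f of recent consecutive pairs that step forward by exactly one. A clean stream gives f = 1, and a stream of uniformly random values gives f ≈ 1/6, so rescaling to (f - 1/6)/(5/6) maps these to roughly 1 and 0. Both drop-outs and insertions break steps, so the score falls with either. The class name `StreamHealth`, the window size and any alarm threshold are hypothetical choices.

```python
from collections import deque


class StreamHealth:
    """Rolling score of how 'in sequence' the recent stream looks (hypothetical heuristic).

    Tracks whether each new value equals the previous value plus one
    (6 wrapping to 1) over the last `window` steps.
    """

    def __init__(self, window=60):
        self.steps = deque(maxlen=window)  # True for an in-sequence step
        self.last = None

    def update(self, value):
        """Feed the next observed value (1..6); return the current score."""
        if self.last is not None:
            self.steps.append(value == self.last % 6 + 1)
        self.last = value
        return self.score()

    def score(self):
        """About 1.0 for a clean cycle, about 0.0 for effectively random values.

        Returns None until at least one step has been seen. Can go slightly
        negative, e.g. for a stream that repeats one value.
        """
        if not self.steps:
            return None
        f = sum(self.steps) / len(self.steps)
        chance = 1 / 6  # rate of in-sequence steps among uniform random values
        return (f - chance) / (1 - chance)


if __name__ == "__main__":
    health = StreamHealth(window=20)
    score = None
    for v in [1, 2, 3, 4, 1, 5, 6, 1, 2, 3]:
        score = health.update(v)
    print(round(score, 3))
```

For the inserted-1 example from the original post, the single insertion breaks two of the nine steps, so the score drops from 1 to 11/15 ≈ 0.73. Where to draw the "too degraded" line is left open.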

-zartan2k
Bruce Reistle
Joined: 25 Jun 2006
Posted: Sun Jun 25, 2006 7:34 pm    Post subject: Re: Sequence prediction problem

<zartan2k@comcast.net> wrote in message
Quote: I would like some advice on how to solve the following problem. [...]

This sounds like a fun problem. Is this a totally
fictitious problem? If not, could you send me some of the
data, or generate some for me?

Bruce R
zartan2k@comcast.net
Joined: 23 Jun 2006
Posted: Fri Jun 23, 2006 10:49 pm    Post subject: Sequence prediction problem

I would like some advice on how to solve the following problem.

I observe a system that typically sends me a series of data like this:

1,2,3,4,5,6,1,2,3,4,5,6,1,2, ...

Only the values 1, 2, 3, 4, 5 and 6 can show up in the series, and in the absence of any disturbances the numbers are always sequential. However, on occasion one or more of these numbers can drop out of the series. For example:

1,2,3,5,6,1,2,3, ... (the 4 was dropped)

Another form of disturbance is that a random number (with a value from 1 to 6) can be interjected into the series. For example:

1,2,3,4,1,5,6,1,2,3, ... (a 1 was randomly inserted)

Both forms of disturbance can occur at the same time:

1,2,3,4,1,6,1,2,3, ... (a 5 was dropped and a 1 was randomly inserted)

What I would like to do is label each data point in a series with an approximate probability that it was randomly inserted into the data stream. The sequences will grow over time, and the probabilities can be updated as new numbers are observed. The starting point of a sequence doesn't have to be 1. No additional information about the system is available (e.g. the probabilities of drop-outs or insertions).

Thanks,
zartan2k
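Since no real data is available yet, a small generator for the process described above may help test any labelling method. It is a sketch under assumptions of its own. Each value in the cycle is dropped independently with probability `p_drop`, so runs of several drops can happen. Each observed value is, with probability `p_insert`, a uniformly random insertion rather than the next genuine value. The function name and parameters are hypothetical. It returns the true inserted/genuine flags alongside the data, so a labeller's output can be scored against them.

```python
import random


def simulate(length, p_drop=0.05, p_insert=0.05, start=1, seed=None):
    """Generate `length` observations of a noisy 1..6 cycle.

    Returns (observed, inserted) where inserted[t] is True if observed[t]
    was a random insertion. If the first value happens to be dropped, the
    sequence will not begin at `start`.
    """
    rng = random.Random(seed)
    observed, inserted = [], []
    nxt = start  # next genuine value in the 1..6 cycle
    while len(observed) < length:
        if rng.random() < p_insert:
            # Random insertion: the underlying cycle does not advance.
            observed.append(rng.randint(1, 6))
            inserted.append(True)
            continue
        # Each genuine value is independently dropped with probability p_drop.
        while rng.random() < p_drop:
            nxt = nxt % 6 + 1
        observed.append(nxt)
        inserted.append(False)
        nxt = nxt % 6 + 1
    return observed, inserted


if __name__ == "__main__":
    data, truth = simulate(30, p_drop=0.1, p_insert=0.1, seed=1)
    print(data)
    print([int(flag) for flag in truth])
```

Passing a fixed `seed` makes the runs reproducible, which is handy when comparing different labelling approaches on the same data.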

