zartan2k@comcast.net science forum beginner
Joined: 23 Jun 2006
Posts: 3
Posted: Fri Jun 23, 2006 10:38 pm
Post subject: Sequence prediction problem
I would like some advice on how to solve the following problem. I
observe a system that typically sends a series of data to me as
follows.
1,2,3,4,5,6,1,2,3,4,5,6,1,2, ...
Only the values 1,2,3,4,5 and 6 can show up in the series. And in the
absence of any disturbances the numbers are always sequential. However
on occasion one or more of these numbers can drop out of the series.
For example:
1,2,3,5,6,1,2,3, ... (the 4 was dropped)
Another form of disturbance that can occur is that a random number
(with values from 1 to 6) can also be interjected into the series. For
example:
1,2,3,4,1,5,6,1,2,3, ... (a 1 was randomly inserted)
Note that both forms of disturbance can occur at the same time:
1,2,3,4,1,6,1,2,3, ... (a 5 was dropped and a 1 was randomly inserted)
What I would like to do is label each data point in a series with an
approximate probability that the given number was randomly inserted
into the data stream. Note that the length of these sequences will grow
over time and that the probabilities can be updated as new numbers are
observed. The starting point of a sequence doesn't have to be 1. There
is no additional information available about the system (e.g.
probabilities of dropouts or insertions).
Thanks,
zartan2k
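
One possible way to produce the per-point labels asked for above is a small hidden Markov model with forward-backward smoothing. This is only a sketch and is not something proposed in the thread. It assumes a fixed per-value dropout rate `p_drop` and a fixed per-observation insertion rate `p_ins` (both guesses, since the post says neither is known), independent dropouts, and inserted values drawn uniformly from 1 to 6. The names `advance_prob` and `insertion_posteriors` are hypothetical.

```python
N = 6  # the system cycles through the values 1..N


def advance_prob(d, p_drop):
    """P(the next genuine value is d steps ahead), d in 1..N.

    d - 1 values were dropped (modulo whole cycles), each independently
    with the assumed probability p_drop.
    """
    return (1 - p_drop) * p_drop ** (d - 1) / (1 - p_drop ** N)


def insertion_posteriors(obs, p_drop=0.1, p_ins=0.1):
    """Return P(obs[t] was randomly inserted | all of obs) for each t.

    Hidden state j = 2*c + g, where c + 1 is the last genuine value seen
    so far and g = 1 if the current observation is an insertion.
    Requires 0 < p_drop < 1 and 0 < p_ins < 1.
    """
    S = 2 * N

    def trans(i, j):
        ci, cj, gj = i // 2, j // 2, j % 2
        if gj == 1:  # an insertion leaves the cycle position unchanged
            return p_ins if cj == ci else 0.0
        d = (cj - ci) % N or N  # genuine value: advance 1..N steps
        return (1 - p_ins) * advance_prob(d, p_drop)

    def emit(j, y):
        c, g = j // 2, j % 2
        if g == 1:  # inserted values are assumed uniform on 1..N
            return 1.0 / N
        return 1.0 if y - 1 == c else 0.0

    T = len(obs)
    # Forward pass with per-step normalisation; unknown starting point.
    alpha = []
    for t, y in enumerate(obs):
        if t == 0:
            a = [(p_ins if j % 2 else 1 - p_ins) / N * emit(j, y)
                 for j in range(S)]
        else:
            prev = alpha[-1]
            a = [sum(prev[i] * trans(i, j) for i in range(S)) * emit(j, y)
                 for j in range(S)]
        z = sum(a)
        alpha.append([x / z for x in a])

    # Backward pass, also normalised at each step.
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(trans(i, j) * emit(j, obs[t + 1]) * beta[t + 1][j]
                 for j in range(S)) for i in range(S)]
        z = sum(b)
        beta[t] = [x / z for x in b]

    # Smoothed probability that each observation is an insertion.
    post = []
    for t in range(T):
        g = [alpha[t][j] * beta[t][j] for j in range(S)]
        post.append(sum(g[j] for j in range(1, S, 2)) / sum(g))
    return post


example = [1, 2, 3, 4, 1, 5, 6, 1, 2, 3]  # a 1 was inserted after the 4
labels = insertion_posteriors(example)
```

Under these assumed rates the inserted 1 (index 4) receives by far the largest label. Because the forward pass is sequential, the labels can be recomputed as new numbers arrive, and the two rates would have to be guessed or estimated from the data itself.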
dave@autobox.com science forum beginner
Joined: 16 Feb 2006
Posts: 12
Posted: Thu Jul 06, 2006 3:47 pm
Post subject: Re: Sequence prediction problem
Z.
I would like to commend you on the clarity of your problem. I have
waited until now to see if there were any other posters, but it
appears not. The problem you have is pattern recognition in time
series.
We have been developing statistical application software that focuses
on model identification in the presence of anomalies. Please see
http://www.autobox.com and use the Google Search button for the term
"outlier".
The problem is that you can't catch an outlier without a model (at
least a mild one) for your data. Else how would you know that a point
violated that model? In fact, the process of growing understanding and
finding and examining outliers must be iterative. This isn't a new
thought. Bacon, writing in Novum Organum about 400 years ago said:
"Errors of Nature, Sports and Monsters correct the understanding in
regard to ordinary things, and reveal general forms. For whoever knows
the ways of Nature will more easily notice her deviations; and, on the
other hand, whoever knows her deviations will more accurately describe
her ways."
Some analysts think that they can remove outliers based on abnormal
residuals from a simple fitted model, sometimes even an "eye model". If
an outlier falls outside a particular probability limit (95% or 99%),
they then try to determine whether something is missing from the model.
If not, it's gone. This deletion, or adjustment of the value so that
there is no outlier effect, is equivalent to augmenting the model with a
0/1 variable that is 1 at the time point in question and 0 elsewhere.
This manual adjustment is normally supported by visual or graphical
analysis ... which, as we will see below, often fails. Additionally, this
approach overlooks "inliers", whose effect is just as serious as that
of "outliers". Inliers are "too normal" or "too close to the mean" and,
if ignored, will bias the identification of the model and its
parameters. Consider the time series 1,9,1,9,1,9,5,9: a simple
model might find nothing exceptional, whereas a slightly less simple
model would focus attention on the exceptional value of 5 at time
period seven.
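
The inlier example can be checked with a few lines of Python. This is a rough sketch: the "simple model" is assumed here to be a constant mean with z-scores, and the "slightly less simple model" is assumed to be a period-2 seasonal difference. Neither choice comes from the post, and the function names are hypothetical.

```python
from statistics import mean, pstdev

series = [1, 9, 1, 9, 1, 9, 5, 9]


def z_scores(xs):
    """Standardised distance of each point from the overall mean."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def seasonal_residuals(xs, period=2):
    """Difference between each point and the one `period` steps earlier."""
    return [xs[t] - xs[t - period] for t in range(period, len(xs))]


z = z_scores(series)
res = seasonal_residuals(series)  # [0, 0, 0, 0, 4, 0]

# Under the mean model the 5 is the *least* unusual point (smallest |z|),
# while the seasonal residuals are zero everywhere except at period seven.
least_unusual = min(range(len(z)), key=lambda i: abs(z[i]))
flagged_period = max(range(len(res)), key=lambda i: abs(res[i])) + 2 + 1
```

Here `least_unusual` is index 6 (time period seven) and `flagged_period` is 7, so the same point that looks most ordinary to the mean model is the only one the seasonal model singles out.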
Your problem is IMHO in the same "ballpark". Whereas our software
identifies and treats the anomalies, you, it appears, want each
identified missing value to be inserted and assigned a very small (0.0)
probability of having been put there by the original system, i.e. of
being one of the observed values.
You can pursue threads like Intervention Detection, ARIMA, Box-Jenkins,
Signal Detection, Data Mining, etc., and all will find their solution at
http://www.autobox.com.
If you would like to chat, please call
Dave Reilly
Automatic Forecasting Systems
http://www.autobox.com
215-675-0652 in the US