Arby wrote:

This is my first post to this list. My question is actually about
statistics, although I'm no statistician, so forgive me if I don't get
the terminology right. What I want to know is: what would be the right
type of statistical modelling for nominal/categorical data? Put another
way, what is the appropriate method for predicting a nominal response
variable from a number of categorical explanatory variables?
I am actually a biologist and what I have is a dataset that records
gains and losses of regions of DNA from 400 colorectal cancer patients.
More specifically, the whole human genome is divided into 862 segments,
each segment is scored as 0 = no change, -1 = loss, 1 = gain, 2 = high
level amplification, for each of the 400 patients.
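
To make the data layout concrete, here is a minimal Python sketch of a table of this shape. Everything in it is illustrative: the values are randomly generated placeholders rather than patient data, the names STATUS, data and segment_counts are hypothetical, and it assumes losses are coded as -1 so that they are distinct from gains.

```python
import random
from collections import Counter

# Assumed coding of segment status; -1 for "loss" is an assumption.
STATUS = {-1: "loss", 0: "no change", 1: "gain", 2: "high-level amplification"}
N_PATIENTS, N_SEGMENTS = 400, 862

# Synthetic placeholder data: one row per patient, one column per segment.
rng = random.Random(0)
codes = list(STATUS)
data = [[rng.choice(codes) for _ in range(N_SEGMENTS)] for _ in range(N_PATIENTS)]

def segment_counts(table, segment):
    """Count how many patients fall into each status category for one segment."""
    return Counter(STATUS[row[segment]] for row in table)

# Frequency table for the first segment across all patients.
counts = segment_counts(data, 0)
print(counts)
```

If the codes are treated as nominal labels, as in the question, a frequency table like this is the natural summary of a segment, rather than a mean.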
In general terms my question would be: can I predict the status of one
of the 862 segments if I know the status of the other 861 segments?
That would be what I think is called the maximal model. What is the
process for reducing this to the minimal number of segments necessary
to predict a given segment with a set level of certainty? Having looked
through some basic stats books, the nearest thing I've found is ANOVA,
but this confused me as it talked about calculating means, and I don't
see how you can have a mean of a nominal variable (this may be the crux
of my problem).
I'm not looking for complete solutions. I just don't have the correct
vocabulary to describe what I need, so I'd be grateful if someone could
tell me what the appropriate terms are and hopefully provide some
beginner's references so I can go away and learn how to do it.
regards,
Richard

This post is more germane to the newsgroup sci.stat, where I have
crossposted this reply.

