2014-05-18, 09:09   #7508
Join Date: Nov 2006
Location: Australia
Originally Posted by oompa loompa
Well yes, there is a reliable sample size, depending on the number of variables and the distribution the data come from, because you fix the confidence level (usually 95%, I don't remember if that's just convention or if it maximizes power somehow), so your gains from increasing sample size beyond a point decrease dramatically.
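To put rough numbers on the diminishing returns, here is a minimal sketch. It assumes a known SD of 1 and uses the normal approximation for the interval (for very small samples a t-based interval would be wider); the function name `ci_half_width` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(sigma, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean.

    Assumption: sigma is treated as known, which is not true for tiny samples.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / sqrt(n)

# The half-width shrinks like 1/sqrt(n), so each extra data point buys less.
for n in (3, 10, 30, 100, 1000):
    print(n, round(ci_half_width(1.0, n), 3))
```

Because the width scales with 1/sqrt(n), quadrupling the sample size only halves the interval.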

Also, since you calculate your SD from your variance, it is again important to have a large sample size, because variance IS something you can test statistically, not with a p-value, but still. In this example, the number of variables isn't given, but if it's more than 3 I can definitely say that 3 is not enough. Besides, SD is a useless statistic without a sample mean (which I'm assuming is also being calculated). I'm a little rusty (and generally stats is not my strong suit), but since the sample mean is approximately normally distributed by the CLT, there is going to be an associated t-statistic and p-value.

It's not misleading if you're not trying to mislead anyone. It is misleading if the claim is that an SD from 3 data points (and fewer than 3 variables) is a good predictor of what the deviations of the results of future experiments will be. For example, if you get a result more than x SDs away for your 4th experiment, how will you know if something went wrong or right? Forget SD, if you don't have a reliable mean, how will you know whether your result was close to what the 'usual' result is or not? If you don't have a reliable variance, how will you know how much deviation is acceptable without being called an abnormality?
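A quick way to see how shaky an SD from 3 points is: simulate it. This is only a sketch under stated assumptions (data drawn from a normal distribution with true SD 1, 2000 simulated experiments, a fixed seed); `sd_range` is a made-up helper name.

```python
import random
from statistics import stdev

def sd_range(n, trials=2000, seed=0):
    """Approximate central 95% range of the sample SD for samples of size n.

    Assumption: data are normal with true SD 1, so a perfect estimate is 1.0.
    """
    rng = random.Random(seed)
    sds = sorted(
        stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    # Empirical 2.5th and 97.5th percentiles of the simulated SDs.
    return sds[int(0.025 * trials)], sds[int(0.975 * trials)]

for n in (3, 10, 30):
    low, high = sd_range(n)
    print(f"n={n}: SD estimate mostly between {low:.2f} and {high:.2f}")
```

With n=3 the estimated SD can land far below or well above the true value, which is the problem with judging a 4th result by "x SDs".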

At the same time, I agree with you that 3 is better than nothing. It's always going to help, but one would have to say that the results are merely indicative, and not a reliable predictor for how future experiments will pan out.
Sorry guys, was out of it till now.

Of course, in an ideal world we would like to run a survey of 100 samples with 100 duplicates, but generally we don't have enough time and human resources to follow through with that.

What I did was basically set up an experiment that produces a trend from 10 data points (something linear, like interviewing 10 people to link coffee consumption with age, for example). I generally agree that a repeat has to be done to make sure that the results are accurate.

But when it comes to data analysis, what I think we have to do is: repeat the experiment, fit a trend line through all 20 data points (10+10), and take the R^2 of that fit as our measure of variation.
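Roughly, the pooled approach looks like this. It is a sketch only: the ages and cups-per-day numbers are invented for illustration (not real survey data), and `fit_line` is a hypothetical helper doing ordinary least squares by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns slope, intercept, R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Invented example data: age vs cups of coffee per day, two runs of 10 people.
run1_age = [21, 25, 30, 34, 38, 42, 47, 51, 55, 60]
run1_cups = [1.0, 1.5, 1.4, 2.0, 2.1, 2.4, 2.2, 3.0, 2.9, 3.3]
run2_age = [22, 27, 29, 35, 40, 44, 46, 52, 57, 59]
run2_cups = [1.2, 1.3, 1.8, 1.7, 2.3, 2.1, 2.6, 2.7, 3.2, 3.0]

# Pool both runs and fit a single line through all 20 points.
slope, intercept, r2 = fit_line(run1_age + run2_age, run1_cups + run2_cups)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

Note that R^2 describes how much of the scatter the line explains; it is not itself a variance or an error bar on the slope.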

My boss, meanwhile, wants to repeat the experiment three times (interviewing 10 people each time), fit a trend and get the slope each time, then take the standard deviation (SD) of those 3 slopes just to have an SD, which I find really absurd. I mean, I've even seen guys in my office put that +/- sign on their data using only 2 data points, just to suit my boss's taste.
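For comparison, here is a sketch of the three-slopes approach next to a single pooled fit. Everything in it is an assumption for illustration: the data are simulated (true slope 0.05, noise SD 0.5, fixed seed), and `slope` and `make_run` are made-up helper names.

```python
import random
from statistics import mean, stdev

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def make_run(rng, n=10):
    """Simulate one run of n interviews (hypothetical age vs cups per day)."""
    xs = [rng.uniform(20, 60) for _ in range(n)]
    ys = [0.05 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

rng = random.Random(1)
runs = [make_run(rng) for _ in range(3)]

# Three-slopes approach: one slope per run, then the SD of those 3 numbers.
slopes = [slope(xs, ys) for xs, ys in runs]
sd_of_slopes = stdev(slopes)

# Pooled approach: one slope from all 30 points together.
pooled_xs = [x for xs, _ in runs for x in xs]
pooled_ys = [y for _, ys in runs for y in ys]
pooled_slope = slope(pooled_xs, pooled_ys)

print("per-run slopes:", [round(s, 4) for s in slopes])
print("SD of the 3 slopes:", round(sd_of_slopes, 4))
print("pooled slope:", round(pooled_slope, 4))
```

Changing the seed shows how much an SD built from only 3 slopes jumps around from one set of runs to the next.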
risingstar3110