Sampling Question

Jcote

Hello all,

I've not played around with biostatistics since grad school and am interested in doing a study looking at smoking and military fitness scores. 

OK, here are the basics. I have data on 2900 individuals aged 18-59:
667 smokers
2233 non-smokers
And I have standardized fitness scores for each person.

My first question: is a simple Student's t-test appropriate for determining whether the difference between the groups is significant?

My next question is about sampling. Is it appropriate to perform the t-test using all smokers and all non-smokers, or should the two groups be of similar size? If they should be of similar size, what is the best way to sample both groups?

Ideally, breaking both populations into age categories (21-30, 31-40, 41-50) would give me better data, since smokers tend to be younger and younger people have higher fitness scores. Hope that makes sense.

Any help would be much appreciated.

Sami Tuomivaara


I assume here that you are interested in finding whether the fitness score averages are different in the two groups...

Student's t-test does not require equal sample sizes, but it does assume that the fitness score distributions in the two groups have equal width (variance). You can use a generalization of Student's t-test, called Welch's test, which takes the differing widths of the distributions into account. (You should already be able to tell from your data whether the variances differ.)
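To make the Welch's test suggestion concrete, here is a minimal Python sketch using only the standard library. The function name `welch_t_test` and the sample scores are made up for illustration, and the p-value uses a normal approximation to the t distribution, which is an assumption that only holds up when the degrees of freedom are large (as they would be with several hundred people per group).

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's unequal-variance t-test for two independent samples.

    Returns (t, df, p_approx). p_approx is a two-sided p-value from a
    normal approximation to the t distribution (an assumption that is
    only reasonable when df is large).
    """
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group mean; statistics.variance
    # is the sample variance (n - 1 in the denominator).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b

    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )

    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Made-up scores for illustration only; not data from the study.
smokers = [71.0, 68.5, 74.0, 66.0, 70.5, 69.0]
non_smokers = [75.0, 78.5, 72.0, 80.0, 77.5, 74.0, 79.0, 76.5]

t, df, p = welch_t_test(smokers, non_smokers)
print(f"t = {t:.3f}, df = {df:.1f}, approx p = {p:.4f}")
```

Note that the two lists have different lengths, which is fine for this test. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's test and gives a p-value from the t distribution itself rather than the approximation used here.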

If you want to break the populations down by age, you then have more than one factor (smoking +/- and age group), and you need to resort to ANOVA.
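Before fitting anything, it can help to lay the data out in the smoking-by-age cells that a two-factor analysis would work on. The sketch below is not ANOVA itself; it only groups records and reports the count and mean score per cell. The age bands, the `(age, smokes, score)` record layout, and the numbers are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical age bands covering the full 18-59 range.
AGE_BANDS = [(18, 29), (30, 39), (40, 49), (50, 59)]


def age_band(age):
    """Return the (low, high) band containing age, or None if out of range."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return (low, high)
    return None


def cell_summary(records):
    """Group (age, smokes, score) records into smoking-by-age cells.

    Returns {(band, smokes): (count, mean_score)}.
    """
    cells = defaultdict(list)
    for age, smokes, score in records:
        band = age_band(age)
        if band is not None:
            cells[(band, smokes)].append(score)
    return {key: (len(scores), mean(scores)) for key, scores in cells.items()}


# Made-up records for illustration only: (age, smokes, fitness score).
records = [
    (22, True, 80.0), (25, True, 78.0), (27, False, 86.0), (24, False, 84.0),
    (35, True, 72.0), (38, False, 79.0), (33, False, 81.0),
    (45, True, 65.0), (47, False, 73.0), (52, False, 70.0),
]

summary = cell_summary(records)
for (band, smokes), (n, avg) in sorted(summary.items()):
    label = "smoker" if smokes else "non-smoker"
    print(f"{band[0]}-{band[1]} {label:>10}: n={n}, mean={avg:.1f}")
```

Cells that are empty or very small (here, smokers aged 50-59) are worth spotting early, since they limit what a two-factor model can estimate. For the ANOVA itself, one option is statsmodels, where a model fitted with `statsmodels.formula.api.ols` can be passed to `statsmodels.stats.anova.anova_lm`.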


Jcote

Awesome. Thank you for your help.