How to Determine Whether Your Sample Is Representative of the Population
First, when you are planning a research project and want a sample that is representative of the population, so that you can generalize the results, you should plan your sample selection with that goal in mind. You can refer back to the sampling methodologies to see how to do this. You should also always try to sample more participants than you actually need, to account for attrition.
So, once you have a sample of data, you will want to determine just how representative it is of the target population, for example in terms of race/ethnicity. I have had students in the past say, “Well, it looks pretty close, so I do think it is representative.” Eyeballing it is definitely NOT the way that you make decisions about the representativeness of the data!
There are two different things you can do: 1) use the sample size and confidence level to determine appropriate confidence intervals, and 2) run a simple chi-square (χ²) test for goodness of fit. For these demonstrations, I am using the demonstration sample (N=285) that I used for the Excel video as my sample data, and the US Census Quick Facts data for Harris County (https://www.census.gov/quickfacts/fact/table/harriscountytexas/PST045219) as my population data. You can use the process below on other dimensions of the population (age, gender, education, income, etc.), as long as you can match the variable's categories between the sample and the population.
Step 1: Ensure Your Data Categories Match for the Variable
If your data categories do not already match the population data categories, you have to fix that if you can, because you need to compare apples with apples. As stated earlier, the variable we are using for this example is race/ethnicity. Here is the US Census Quick Facts data for Harris County for 2019:
Here is the demonstration sample data:

Race/ethnicity | n | %
--- | --- | ---
Caucasian/White | 63 | 22.1
African American/Black | 71 | 24.9
Native American | 6 | 2.1
Hispanic/Latino | 109 | 38.2
Asian/Pacific Islander | 21 | 7.4
Other | 7 | 2.5
No Answer | 8 | 2.8
Total | 285 | 100

In order to match the categories appropriately, you must do the following:
1. Use the "White alone, not Hispanic or Latino" category from the US Census Quick Facts, not "White alone." The Census treats Hispanic or Latino as an ethnicity, not a race, so "White alone" also includes White people who are Hispanic or Latino, and they are already counted in the Hispanic/Latino category.
2. In the US Census Quick Facts data, combine "Asian" and "Native Hawaiian and other Pacific Islander alone."
3. We will not use the "Two or More Races" category from the US Census Quick Facts, and we will not use the "Other" or "No Answer" groups from our sample data, which leaves us with 5 categories to compare.
4. We will be comparing the percentages. The table we will use for our analysis looks like this:

Race/ethnicity | Census % | Sample %
--- | --- | ---
African American/Black alone | 20 | 24.9
American Indian/Alaska Native | 1.1 | 2.1
Asian/Native Hawaiian/Pacific Islander | 7.4 | 7.4
Hispanic/Latino | 43.7 | 38.2
White alone (Not Hispanic/Latino) | 28.7 | 22.1

Step 2: Determine the Confidence Intervals
You can use an online sample size calculator to determine the confidence intervals. I used the one at https://www.surveysystem.com/sscalc.htm.
You will use the grey box entitled "Find Confidence Interval." Its confidence level defaults to 95%, which is fine for our purposes. Since my sample size is 285 participants, I type "285" in the Sample Size box and set the Percentage to "50" (50% gives the widest, most conservative interval). Then I click the "Calculate" button.
The calculator reports a Confidence Interval value of 5.81. This is the ± number (also called the margin of error) that you use to build each confidence interval. We'll add two columns to our table: 1) the lower limit (LL) of the confidence interval, which we get by subtracting 5.81 from the sample value, and 2) the upper limit (UL), which we get by adding 5.81 to the sample value (see below).
Row | A | B | C | D | E
--- | --- | --- | --- | --- | ---
1 | | Census | Sample | Confidence Interval |
2 | | % | % | LL | UL
3 | African American/Black alone | 20 | 24.9 | 19.1 | 30.7
4 | American Indian/Alaska Native | 1.1 | 2.1 | -3.7 | 7.9
5 | Asian/Native Hawaiian/Pacific Islander | 7.4 | 7.4 | 1.6 | 13.2
6 | Hispanic/Latino | 43.7 | 38.2 | 32.4 | 44.1
7 | White alone (Not Hispanic/Latino) | 28.7 | 22.1 | 16.3 | 27.9

This means that if we could survey the entire population, we would expect (at the 95% confidence level) the real population value for African American/Black alone to fall between 19.1% and 30.7%, and as you can see, our actual Census value of 20% does lie within that range. The same is true for each of the other categories except "White alone (Not Hispanic/Latino)": its Census value (28.7%) is 0.8 percentage points above the upper limit of the confidence interval (27.9%), which means our sample was a little low in that category. (A negative lower limit, such as -3.7 for American Indian/Alaska Native, simply means the interval starts at 0%.) Consequently, our sample appears to be reasonably close to the population on race/ethnicity, and our sample size seems adequate for this comparison. However, we need to take one more step to assess how representative our sample is of the population.
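The Step 2 arithmetic can be cross-checked in a few lines of Python. This is a minimal sketch, assuming the calculator uses the standard normal-approximation margin of error for a proportion, z·√(p(1 − p)/n) with z = 1.96 and no finite population correction. That formula reproduces the 5.81 above, but it is our assumption about the website, not its documented method. The helper names (`margin_of_error`, `confidence_table`) are hypothetical. The sketch also computes the sample percentages from the raw counts, dividing by all 285 respondents, which is why the Hispanic/Latino upper limit comes out as 44.1 (about 38.25 + 5.81) rather than 38.2 + 5.81 = 44.0.

```python
import math

# Sample counts for the five compared categories (N = 285 overall, including
# the 7 "Other" and 8 "No Answer" responses that are not compared).
SAMPLE_N = 285
SAMPLE_COUNTS = {
    "African American/Black alone": 71,
    "American Indian/Alaska Native": 6,
    "Asian/Native Hawaiian/Pacific Islander": 21,
    "Hispanic/Latino": 109,
    "White alone (Not Hispanic/Latino)": 63,
}

# Harris County Census percentages from the comparison table.
CENSUS_PCT = {
    "African American/Black alone": 20.0,
    "American Indian/Alaska Native": 1.1,
    "Asian/Native Hawaiian/Pacific Islander": 7.4,
    "Hispanic/Latino": 43.7,
    "White alone (Not Hispanic/Latino)": 28.7,
}


def margin_of_error(n, pct=50.0, z=1.96):
    """Margin of error in percentage points: z * sqrt(p(1 - p) / n) * 100.

    Assumption: normal approximation, no finite population correction.
    """
    p = pct / 100.0
    return z * math.sqrt(p * (1.0 - p) / n) * 100.0


def confidence_table(counts, census, n, moe):
    """Return rows of (category, census %, sample %, LL, UL, census inside CI)."""
    rows = []
    for category, count in counts.items():
        sample_pct = 100.0 * count / n  # denominator is the full sample of n
        lower, upper = sample_pct - moe, sample_pct + moe
        inside = lower <= census[category] <= upper
        rows.append((category, census[category], sample_pct, lower, upper, inside))
    return rows


# Round to two decimals, as the calculator displays it.
moe = round(margin_of_error(SAMPLE_N), 2)
print(f"margin of error: +/-{moe}")
for category, census, sample, lower, upper, inside in confidence_table(
        SAMPLE_COUNTS, CENSUS_PCT, SAMPLE_N, moe):
    # A negative lower limit is printed as computed; in practice it means 0%.
    status = "inside" if inside else "OUTSIDE"
    print(f"{category:40s} census={census:5.1f} sample={sample:5.1f} "
          f"CI=({lower:5.1f}, {upper:5.1f}) {status}")
```

Only the "White alone (Not Hispanic/Latino)" row is flagged as falling outside its interval, which matches the reading of the table.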
Step 3: Run a Chi-Square Test for Goodness of Fit
You can run a simple chi-square test in Excel. The formula is:
=CHISQ.TEST(ObservedRange,ExpectedRange)
Since the “observed range” is the sample data and the “expected range” is the US Census data, our formula for the above table would be:
=CHISQ.TEST(C3:C7,B3:B7)
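Behind that formula is Pearson's chi-square statistic. Taking the sample percentages as the observed values O_i and the Census percentages as the expected values E_i, the k = 5 categories from Step 2 work out as follows (using the rounded percentages shown in the table, with each term rounded to two decimals):

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
= \frac{(24.9-20)^2}{20} + \frac{(2.1-1.1)^2}{1.1} + \frac{(7.4-7.4)^2}{7.4} + \frac{(38.2-43.7)^2}{43.7} + \frac{(22.1-28.7)^2}{28.7}
\approx 1.20 + 0.91 + 0.00 + 0.69 + 1.52 = 4.32
$$

CHISQ.TEST then converts this statistic into a p-value using a chi-square distribution with k − 1 = 4 degrees of freedom. For df = 4, that upper-tail probability has the closed form $p = e^{-\chi^2/2}\,(1 + \chi^2/2) \approx 0.36$.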
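Running this formula on our table from Step 2 returns 0.365. Be careful here: CHISQ.TEST does not return the χ² test statistic. It returns the p-value, which is the probability of seeing a difference at least this large if the sample distribution really matched the population distribution. Since 0.365 is well above 0.05 (the significance level that corresponds to our 95% confidence level), the difference is not statistically significant.

You can reach the same conclusion with a Chi-Square Distribution Table, but the value you look up must be compared with the test statistic, χ² = Σ (O − E)² / E, not with the p-value. For our table, the statistic works out to about 4.32. Down the left of the distribution table are the degrees of freedom, which are the number of categories minus 1 (k − 1). Since our race/ethnicity variable has 5 categories, our degrees of freedom will be 4 (df = 4). We said at the outset that we would use the 95% confidence level, so we use the "0.05" column. The value where df = 4 intersects with the 0.05 column (9.488) is our critical value; in Excel, =CHISQ.INV.RT(0.05,4) returns the same value (9.4877). Since our test statistic (about 4.32) is well below the critical value (9.488), we fail to reject the null hypothesis: there is no statistically significant difference between our observed race/ethnicity distribution (our sample distribution) and our expected race/ethnicity distribution (our population distribution). Note that this does not prove the two distributions are identical; it means the test found no significant difference between them.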
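For readers who want to check the Excel result outside a spreadsheet, here is a minimal Python sketch using only the standard library. The helper names (`chi_square_statistic`, `chi_square_p_value`, `expected_counts`) are hypothetical, and the p-value function uses the closed form of the chi-square tail probability, which only holds for an even number of degrees of freedom (df = 4 here), so it is not a general replacement for CHISQ.TEST. The sketch runs the test twice: once on the percentages, as in the example above, and once on counts, which is how Pearson's goodness-of-fit test is usually defined. For the count version it assumes the five Census shares can simply be rescaled to sum to 100%, which glosses over the overlap between the race categories and Hispanic/Latino ethnicity.

```python
import math

# Category order: AA/Black, AI/AN, Asian/NH/PI, Hispanic/Latino, White (not Hispanic).
CENSUS_PCT = [20.0, 1.1, 7.4, 43.7, 28.7]
SAMPLE_PCT = [24.9, 2.1, 7.4, 38.2, 22.1]
SAMPLE_COUNTS = [71, 6, 21, 109, 63]
CRITICAL_05_DF4 = 9.488  # chi-square critical value, alpha = 0.05, df = 4


def chi_square_statistic(observed, expected):
    """Pearson's statistic: the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(stat, df):
    """Upper-tail p-value of the chi-square distribution.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, which is exact only for
    even df; odd df are rejected.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only supports positive, even df")
    half = stat / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def expected_counts(census_pct, total):
    """Rescale Census shares to the compared categories, then convert to counts."""
    share_sum = sum(census_pct)
    return [total * pct / share_sum for pct in census_pct]


df = len(SAMPLE_PCT) - 1

# 1) Percentages in both columns (what CHISQ.TEST was given above).
stat_pct = chi_square_statistic(SAMPLE_PCT, CENSUS_PCT)
print(f"percentages: chi2={stat_pct:.2f}, p={chi_square_p_value(stat_pct, df):.3f}, "
      f"significant={stat_pct > CRITICAL_05_DF4}")

# 2) Observed counts vs. expected counts with the same total (270).
exp_counts = expected_counts(CENSUS_PCT, sum(SAMPLE_COUNTS))
stat_cnt = chi_square_statistic(SAMPLE_COUNTS, exp_counts)
print(f"counts:      chi2={stat_cnt:.2f}, p={chi_square_p_value(stat_cnt, df):.3f}, "
      f"significant={stat_cnt > CRITICAL_05_DF4}")
print("smallest expected count:", round(min(exp_counts), 1))
```

Under those assumptions, the percentage version reproduces the figures above (a statistic of about 4.32 and a p-value of about 0.36). The count version, however, gives a statistic of about 12 (p ≈ 0.02), which is above the 9.488 critical value, and the expected count for American Indian/Alaska Native (about 2.9) falls below the common rule of thumb of at least 5 per category. Before drawing a conclusion, it is worth checking whether your course or software expects counts or percentages as the inputs.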