Bit More About 1-Way ANOVA

November 5, 2008 by  
Filed under Bit More

Essentially, a 1-Way ANOVA is an overgrown t-test. A t-test compares two means. A 1-Way ANOVA lets you test the differences between more than two means. Like a t-test, there is only one independent variable (hence the “1-way”). It is an ANOVA because it analyzes the variance in the scores. The acronym ANOVA stands for ANalysis Of VAriance.

In general, you can design experiments where people are re-used (within-subjects designs) or used only once (between-subjects designs). The difference is all about time.

 Within-Subjects Designs

Sometimes we want to take repeated measures of the same people over time. These specialized studies are called within-subjects or repeated measures designs. Conceptually, they are extensions of the correlated t-test; the means are compared over time.

Like correlated t-tests, the advantages are that subjects act as their own controls, eliminating the difficulty of matching subjects on similar backgrounds, skills, experience, etc. Also, within-subject designs have more power (require fewer people to find a significant difference) and consequently are cheaper to run (assuming you’re paying your subjects).

They also suffer from the same disadvantages. There is no way of knowing if the effects of trial one wear off before the subjects get trial two. The more trials in a study, the larger the potential problem. In a multi-trial study, the treatment conditions could be hopelessly confounded.

A more detailed investigation of within-subject designs is beyond the scope of this discussion. For now, realize that it is possible, and sometimes desirable, to construct designs with repeated measures on the same subjects. But it is not a straightforward proposition and requires more than an elementary understanding of statistics. So we’re going to focus on between-subjects designs.

Between-Subjects Designs

In a between-subjects design, subjects are randomly assigned to groups. The groups vary along one independent variable. It doesn’t matter if you have three groups (high, medium and low) or ten groups or 100 groups…as long as they only vary on one dimension. Three types of cars is one independent variable (cars) with three groups. Ten types of ice cream can also be one independent variable: flavor.

Like an Analysis of Regression, an Analysis of Variance uses an F test. If F is equal to or larger than the value in the standard table, the F is considered significant, and the results are unlikely to be due to chance.
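To see where that F comes from, here is a minimal sketch of a 1-Way ANOVA in Python. The three groups of scores are invented for illustration; F is the ratio of between-groups variance to within-groups variance.

```python
# A minimal 1-Way ANOVA sketch: F = between-groups variance / within-groups variance.
# The three groups below are invented example data.

def one_way_anova_f(groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-groups Sum of Squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-groups Sum of Squares: how far each score is from its own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
print(one_way_anova_f(groups))  # about 7.0 for this example
```

The calculated F would then be compared to the value in a standard F table for 2 and 6 degrees of freedom.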

NOW YOU CHOOSE:
    Day 9: 1-Way ANOVA
    Bit More About 1-Way ANOVA
    Even More About 1-Way ANOVA
    Calculate 1-Way ANOVA
    Practice Problems
    More Practice Problems
    Word Problems
        Sim1        Sim2        Sim3
        Sim4        Sim5        Sim6
        Sim7        Sim8        Sim9
    Vocabulary
    Formulas
    Quiz 9
    Summary

Bit More About Advanced Procedures

November 5, 2008 by  
Filed under Bit More

Interactions can be good or bad. Some heart medications work better when given together. For example, Digoxin and calcium channel blockers go together because they work on different channels. Together they are better than each would be separately. But other heart medications (phenylpropanolamine with MAO inhibitors) can result in fast pulse, increased blood pressure, and even death. This is why we’re often warned not to mix drugs without checking with our doctor.

The ability to check how variables interact is the primary advantage of complex research designs and advanced statistical techniques. A 1-Way ANOVA can test whether different levels of aspirin help relieve headaches, but a factorial ANOVA can be used to test both aspirin and gender as predictors of headaches. Or aspirin, gender, time of day, caffeine, and chicken soup. Any number of possible explanations and combinations of explanations can be tested with the techniques of multiple regression, MANOVA, factorial ANOVA and causal modeling.

 


NOW YOU CHOOSE:
    
Day10: Advanced Procedures
    Bit More About Advanced Procedures
    Even More About Advanced Procedures
    Basic Facts About Advanced Procedures
    Vocabulary
    Quiz 10
    Summary

Bit More About t-Tests

November 4, 2008 by  
Filed under Bit More

Assume that the t you calculated was a person. If that score is close to the mean of the t distribution, it is not significant; there are too many scores hanging around the mean to make it special. But if your calculated score is at one extreme of the distribution, this would be unusual (or in stats terms: “significant”); your score would sit far out in one of the tails of the t distribution.

     When subjects are randomly assigned to groups, the t-test is said to be independent. That is, it tests the impact of an independent variable on a dependent variable. The independent variable is dichotomous (yes/no; treatment/control; high/low) and the dependent variable is continuous. If significant, the independent t-test supports a strong inference of cause-effect.
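As a sketch of what the independent t-test computes, here is a pooled-variance version in Python. The two groups of scores are invented example data (say, treatment vs. control).

```python
# Independent t-test with a pooled variance estimate.
# The two groups are invented example data (treatment vs. control).
import math

def independent_t(group1, group2):
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2

    # Small-sample variances (SS divided by N-1)
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)

    # Pool the two variances, weighted by their degrees of freedom
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

print(independent_t([1, 2, 3], [4, 5, 6]))  # about -3.67
```

The sign just reflects which mean is larger; it is the distance from zero that is compared to the t distribution.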

     When subjects are given both conditions (both means are measures of the same subjects at different times), the t-test is said to be dependent or correlated. Because it uses repeated measures, the correlated-t is often replaced by using a regression (where the assumptions of covariance are more clearly stated).

 

NOW YOU CHOOSE:
    Day 8: Student’s t-Test
    Bit More About t-Test
    Even More About t-Test
    How to Calculate t-Test
    Practice Problems
    More Practice Problems
    Word Problems
        Sim1        Sim2        Sim3
        Sim4        Sim5        Sim6
        Sim7        Sim8        Sim9
    Basic Facts About t-Test
    Vocabulary
    Formulas
    Quiz 8
    Summary

Bit More About Probability

November 4, 2008 by  
Filed under Bit More

We base many of our decisions on probabilities. Is it likely to rain tomorrow? What is the probability of a new car breaking down? What are the chances our favorite team will win their next game?

We are in search of causation. We want to know if what we see is likely to be due to chance. Or are we seeing a pattern that has meaning? So we begin by calculating the likelihood of events occurring by chance. We calculate probabilities and odds.

Probabilities and odds are related but not identical. They are easy to tell apart because probabilities are stated as decimals and odds are stated as ratios. But the big difference between them is what they compare. Probabilities compare the likelihood of something occurring to the total number of possibilities. Odds compare the likelihood of something occurring to the likelihood of its not occurring.

If you roll your ordinary, friendly six-sided die with the numbers 1 through 6 (one on each side), the probability of getting a specific number is .167. This is calculated by taking how many correct answers there are (1), divided by how many total possibilities (6), and expressing it in decimal form (.167). The odds of getting a specific number are how many correct answers (1) against how many incorrect answers (5). So the odds of rolling a 4 are 1:5…or 5:1 against you.

Let’s try another example. The odds of pulling a King out of a deck of cards are the number of possible correct answers (4) against the number of incorrect answers (48). So the odds are 4:48, which can be reduced to 1:12. The probability of pulling a King is 4 divided by 52, which equals .077. Probabilities are always decimals, and odds are always ratios.
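The two calculations are easy to sketch in Python. The helper names here are made up for illustration; the die and King numbers are the examples above.

```python
from fractions import Fraction

def probability(correct, total):
    # Probability compares correct answers to all possibilities (a decimal)
    return correct / total

def odds(correct, total):
    # Odds compare correct answers to incorrect answers (a reduced ratio)
    ratio = Fraction(correct, total - correct)
    return f"{ratio.numerator}:{ratio.denominator}"

print(round(probability(1, 6), 3))   # 0.167  (rolling a specific number)
print(odds(1, 6))                    # 1:5
print(round(probability(4, 52), 3))  # 0.077  (pulling a King)
print(odds(4, 52))                   # 1:12   (4:48 reduced)
```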

To calculate the probability of two independent events occurring at the same time, we multiply the probabilities. If the probability of you eating ice cream is .30 (you really like ice cream) and the probability of your getting hit by a car is .50 (you sleep in the middle of the street), the probability that you’ll be eating ice cream when you get hit by a car is .15. Flipping a coin twice (2 independent events) is calculated by multiplying .5 times .5. So the probability of flipping 2 heads in a row is .25. Rolling snake eyes (ones) on a single roll of a pair of dice has a probability of .03 (.167 times .167).
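The multiplication rule for independent events, sketched with the examples above:

```python
import math

def joint(*probabilities):
    # Independent events: multiply the individual probabilities
    return math.prod(probabilities)

print(joint(0.30, 0.50))          # 0.15  ice cream AND hit by a car
print(joint(0.5, 0.5))            # 0.25  two heads in a row
print(round(joint(1/6, 1/6), 2))  # 0.03  snake eyes
```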

A major question in research is whether or not the data looks like chance. Does the magic drug we created really cure the common cold, or is it a random pattern that just looks like the real thing?

To answer our question, we compare things we think aren’t due to chance to those which we believe are due to chance. We know people vary greatly in performance. We all have strengths and weaknesses. So we assume that people in a control will vary because of chance, not because of what we did to them. But people in different treatment groups should vary because of the experiment, and not just because of chance. Later, we will use this procedure to compare differences between experimental groups to variation within each group. That is, we will compare between-subjects variance to error variance (within-subjects variance).

For the present, we can use the same test (Fisher’s F) to test the significance of a regression. Does the data we collected approximate a straight line? An Analysis of Regression (ANOR) tests whether data is linear. That is, it tests the fit of the data to a straight line. It, like regression, assumes the two variables being measured are both changing. It works well for testing two continuous variables (like age and height in children) but not so well when one of the variables no longer varies (like age and height in adults).

 

NOW YOU CHOOSE:
    Day 7: Probability
    Bit More About Probability
    Even More About Probability
    Even More About ANOR
    Calculate ANOR
    Practice Problems
    More Practice Problems
    Word Problems
        Sim1        Sim2        Sim3
    Basic Facts About Probability
    Vocabulary
    Formulas
    Quiz 7
    Summary

Bit More About Regression

November 4, 2008 by  
Filed under Bit More

An extension of the correlation, a regression allows you to compare your data to a specific model: a straight line. Instead of using a normal curve (bell-shaped hump) as a standard, regression draws a straight line through the data. The more linear your data, the better it will fit the regression model. Once a line of regression is drawn, it can be used to make specific predictions. You can predict how many shoes people will buy based on how many hats they buy, assuming there is a strong correlation between the two variables.

Just as a correlation can be seen in a scatterplot, a regression can be represented graphically too. A regression would look like a single straight line drawn through as many points on the scatterplot as possible. If your data points all fit on a straight line (extremely unlikely), the relationship between the two variables would be perfectly linear.

Most likely, there will be a cluster or cloud of data points. If the scatterplot is all cloud and no trend, a regression line won’t help…you wouldn’t know where to draw it: all lines would be equally bad.

But if the scatterplot reveals a general trend, some lines will obviously be better than others. In essence, you try to draw a line that follows the trend but divides or balances the data points equally.

In a positive linear trend, the regression line will start in the bottom left part of the scatterplot and go toward the top right part of the figure. It won’t hit all of the data points but it will hit most or come close to them.

You can use either variable as a predictor. The choice is yours. But the results most likely won’t be the same, unless the correlation between the two variables is perfect (either +1 or -1). So it matters which variable is selected as a predictor and which is characterized as the criterion (outcome variable).

Predicting also assumes that the relationship between the two variables is strong. A weak correlation will produce a poor line of prediction. Only strong (positive or negative) correlations will produce accurate predictions.
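The line-drawing idea can be sketched with a least-squares fit in Python. The hat and shoe numbers are invented (and perfectly linear, just to keep the arithmetic clean); real data would scatter around the line.

```python
# Least-squares regression line: y = intercept + slope * x.
# The hat/shoe numbers are invented for illustration.

def regression_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

hats = [1, 2, 3, 4]
shoes = [3, 5, 7, 9]  # a perfectly linear toy example
slope, intercept = regression_line(hats, shoes)
print(slope, intercept)   # 2.0 1.0

predicted = intercept + slope * 5
print(predicted)          # predicts 11.0 shoes for someone who buys 5 hats
```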

 

 

NOW YOU CHOOSE:
    Day 6: Regression
    Bit More About Regression
    Even More About Regression
    Calculate Regression
    Practice Problems
    More Practice Problems
    Word Problems
        Sim1        Sim2        Sim3
        Sim4        Sim5        Sim6
        Sim7        Sim8        Sim9
    Basic Facts About Regression
    Vocabulary
    Formulas
    Quiz 6
    Summary
 

Bit More About Correlation

November 3, 2008 by  
Filed under Bit More

Now that you’ve mastered one variable, let’s add another. Everything up to now has been based on observing one dependent variable (one criterion). All we have done is to observe; we haven’t manipulated, stapled or mutilated anything; just observed.

With correlations we are going to continue that practice (we’re only observing), but we’re going to look at two variables and see how they are related to each other. When one variable changes, we want to know what happens to the other variable. In a perfect correlation, the two variables will move together. When there is no correlation, the variables will act independently of each other.

To use this simple and yet powerful method of description, we must collect two pieces of information on every person. These are paired observations. They can’t be separated. If we are measuring height and weight, it’s not fair to use one person’s height and another person’s weight. The data pairs must remain linked.

That means that you can’t reorganize one variable (from highest to lowest, for example) without reorganizing the other variable. The pairs must stay together.

Sign & Magnitude

A correlation has both sign and magnitude. The sign (+ or -) tells you the direction of the relationship. If one variable is getting larger (2, 4, 5, 7, 9) and the other variable is headed in the same direction (2, 3, 6, 8, 11), the correlation’s sign is positive. In a negative correlation, while the first variable is getting larger (2, 4, 5, 7, 9), the second variable is getting smaller (11, 8, 6, 3, 2).

The magnitude of a correlation is found in the size of the number. Correlation coefficients can’t be bigger than 1. If someone says they found a correlation of 2.48, they did something wrong in the calculation. Since the sign can be positive or negative, a correlation must be between -1 and +1.

The closer the coefficient is to 1 (either + or -), the stronger the relationship. Weak correlations (such as .13 or -.08) are close to zero. Strong correlations (such as .78 or -.89) are close to 1. Consequently, a coefficient of -.92 is a very strong correlation. And +.25 indicates a fairly weak positive correlation.

Magnitude is how close the coefficient is to 1; sign is whether the relationship is positive (headed the same way) or negative (inverse).
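Sign and magnitude both fall out of the Pearson correlation formula. Here is a sketch in Python, using the example sequences above; the coefficients shown are for this toy data only.

```python
# Pearson correlation: sign gives direction, magnitude gives strength.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

up = [2, 4, 5, 7, 9]
same_direction = [2, 3, 6, 8, 11]   # heads the same way
opposite = [11, 8, 6, 3, 2]         # heads the other way

print(round(pearson_r(up, same_direction), 2))  # 0.98: strong positive
print(round(pearson_r(up, opposite), 2))        # -0.98: strong negative
```

Note that the coefficient stays between -1 and +1 no matter what data you feed it.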

 

NOW YOU CHOOSE:
    Day 5: Correlation
    Bit More About Correlation
    Even More About Correlation
    Calculate Correlation
    Practice Problems
    More Practice Problems
    Word Problems
        Sim1        Sim2        Sim3
        Sim4        Sim5        Sim6
        Sim7        Sim8        Sim9
    Basic Facts About Correlation
    Vocabulary
    Formulas
    Quiz 5
    Summary

 

Bit More About z-Scores

October 22, 2008 by  
Filed under Bit More

The z-score indicates the distance an individual score is from the mean of a distribution. If a score is at the mean, it has a z-score of 0. Scores above the mean are positive and scores that are located below the mean are negative.

In practical terms, z-scores range from -3 to +3. A z of -3 indicates that the raw score is 3 standard deviations below the mean (at the extreme left end of the distribution). A z of 3 indicates that the raw score is at the extreme right end of the distribution.

Since z-scores are expressed in units of standard deviation, they are independent of the variable being measured. A z-score of -1.5 is one and a half standard deviations below the mean, regardless of what is being measured. If z = .5, the score is located one half standard deviation above the mean.

Composed of two parts, the z-score has both magnitude and sign. The magnitude can be interpreted as the number of standard deviations the raw score is away from the mean. The sign indicates whether the score is above the mean (+) or below the mean (-). To calculate the z-score, subtract the mean from the raw score and divide that answer by the standard deviation of the distribution. In formal terms, the formula is: z = (X − M) / SD.

Using this formula, we can find z for any raw score, assuming we know the mean and standard deviation of the distribution. What is the z-score for a raw score of 110, a mean of 100 and a standard deviation of 10? First, we find the difference between the score and the mean, which in this case would be 110-100 = 10. The result is divided by the standard deviation (10 divided by 10 = 1). With a z score of 1, we know that the raw score of 110 is one standard deviation above the mean for this distribution being studied.
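The worked example above is a one-liner in Python:

```python
def z_score(raw, mean, sd):
    # Distance from the mean, expressed in standard deviations
    return (raw - mean) / sd

print(z_score(110, 100, 10))  # 1.0: one standard deviation above the mean
print(z_score(85, 100, 10))   # -1.5: one and a half below the mean
```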

Z-scores can be used to find an individual, standardize a distribution or set a cutoff. A z-score indicates a score’s distance from a mean, expressed in standard deviations. If a score is at the mean, z = 0. One standard deviation above the mean is indicated by z = 1. And one standard deviation below the mean is expressed as z = -1.

 

NOW YOU CHOOSE:
    Day 4: z-Score
    A Bit More About z-Scores
    Even More About z-Scores
    How To Calculate z-Scores
    Practice Problems
    Basic Facts About z-Scores
    Vocabulary
    Formulas For z-Scores
    Quiz 4
    Summary

Bit More About Dispersion

October 22, 2008 by  
Filed under Bit More

All measures of dispersion get larger when the distribution of scores is more widely varied. A narrow distribution (a lot of similar scores) has a small amount of dispersion. A wide distribution (lots of different scores) has a large amount of dispersion. The more dispersion, the more heterogeneous (dissimilar) the scores will be.

 
Range

Range is easy to calculate. It is the highest score minus the lowest score. If the highest score is 11 and the lowest score is 3, the range equals 8.

 
Mean Absolute Deviation (MAD)

As the name suggests, the mean absolute deviation (sometimes called mean variance) is a measure of variation from the mean. It is the average of the absolute values of the deviations from the mean. That is, the mean is subtracted from each raw score and the resulting deviations (called “little d’s”) are averaged (ignoring whether they are positive or negative).

 
Sum of Squares

Conceptually, Sum of Squares (abbreviated SS) is an extension of mean variance. Instead of taking the absolute values of the deviations, we square the critters (deviations), and add them up.

 
Variance

Variance of a population is always SS divided by N. This is true whether it is a large population or a small one. Variance of a large sample (N is larger than 30) is also calculated by Sum of Squares divided by N. If there are 40 or 400 in the sample, variance is SS divided by N.

However, if a sample is less than 30, it is easy to underestimate the variance of the population. Consequently, it is common practice to adjust the formula for a small sample variance. If N<30, variance is SS divided by N-1. Using N-1 instead of N results in a slightly larger estimate of variance and mitigates the problem of using a small sample.

 
Standard deviation

This measure of dispersion is calculated by taking the square-root of variance. Regardless of whether you used N or N-1 to calculate variance, standard deviation is the square-root of variance. If variance is 7.22, the standard deviation is 2.69. If variance is 8.67, the standard deviation equals 2.94.

Technically, the square-root of a population variance is called sigma and the square-root of a sample variance is called the standard deviation. As a general rule, population parameters use Greek symbols and sample statistics use English letters.
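The whole chain from range to standard deviation can be sketched in a few lines of Python. The scores are invented sample data chosen so the arithmetic comes out even.

```python
# Range -> MAD -> Sum of Squares -> variance -> standard deviation.
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented sample data; mean is 5

range_ = max(scores) - min(scores)
mean = sum(scores) / len(scores)

# Mean Absolute Deviation: average distance from the mean, signs ignored
mad = sum(abs(x - mean) for x in scores) / len(scores)

# Sum of Squares: square the deviations, then add them up
ss = sum((x - mean) ** 2 for x in scores)

variance = ss / len(scores)                      # population or large sample
small_sample_variance = ss / (len(scores) - 1)   # N < 30: divide by N-1
sd = math.sqrt(variance)

print(range_, mad, ss, variance, sd)  # 7 1.5 32.0 4.0 2.0
```

Notice that the N-1 version (32/7 ≈ 4.57) is slightly larger than the N version (4.0), as the text says.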

 

NOW YOU CHOOSE:
    Day 3: Dispersion
    A Bit More About Dispersion
    Even More About Dispersion
        Range
        MAD
        Sum of Squares
        Variance
        Standard Deviation
    How To Calculate
        Range
        MAD
        Sum of Squares
        Variance
        Standard Deviation
    Formulas For Dispersion
    Practice Problems
    More Practice Problems
    Basic Facts About Dispersion
    Vocabulary
    Quiz 3
    Summary

 

Bit More About Central Tendency

October 22, 2008 by  
Filed under Bit More

Although some use case studies, naturalistic observation, and single subject studies (N=1), most research is group based. Usually, there are lots of numbers from lots of subjects, all waiting to be crunched. So the first thing to do after conducting a study is to organize its data. A data matrix is a table of data. Each row holds the scores of a single subject. Each column is a different variable. The simplest data matrix has two columns: one for the ID number and one for the score. And it would have as many rows as subjects in the study.

After forming a data matrix the next step is usually to plot the data. Each variable is plotted separately: a graph for each factor being measured. Sometimes the variables are summarized in histograms (vertical bar graphs). Often the graphs are frequency distributions: overviews of the raw data. Each score is listed from lowest to highest (left to right). If more than one person has the same score, the graph points are stacked vertically.

So, if no one has the same score, the frequency distribution would look like a straight horizontal line. If everyone had the same score, it would be represented by a vertical line. If there is some variability in scores but several people with the same score, the distribution will have both width and height. The typical frequency distribution varies from left to right but most scores are in the middle. The result is a graph that looks like a mountain…or a dome…or the bottom of a bell. If frequency distributions are not “normal bell-shaped curves,” they might be positively skewed, negatively skewed, or bimodal.
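A frequency distribution is just a tally of how often each score occurs. Here is a quick text sketch in Python; the scores are invented, and the stacked asterisks stand in for the vertically stacked graph points.

```python
from collections import Counter

scores = [3, 5, 5, 6, 6, 6, 7, 7, 9]  # invented raw data

# Tally how often each score occurs, then list scores lowest to highest
freq = Counter(scores)
for score in sorted(freq):
    print(score, "*" * freq[score])
```

With this data the tallest stack sits in the middle, the rough shape of a normal distribution.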

The major challenge of descriptive statistics is finding a representative of the entire group of scores. There are three major measurements of central tendency: mean, median and mode. The mean is the hypothetical balance point. If a frequency distribution was a seesaw, the mean would be the point where it balanced. The median is the middlemost score. And the mode is the most common score (the highest point of the frequency distribution).
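Python’s standard library computes all three directly. The scores here are invented example data.

```python
import statistics

scores = [2, 3, 3, 5, 7]  # invented example data

print(statistics.mean(scores))    # 4: the balance point
print(statistics.median(scores))  # 3: the middlemost score
print(statistics.mode(scores))    # 3: the most common score
```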
 

NOW YOU CHOOSE:
    Day 2: Central Tendency
    A Bit More About Central Tendency
    Even More About Central Tendency
    More Examples
        More Mean Examples
        More Median Examples
        Median Is Middle Of Distribution
        More Mode Examples
    Impact of Outlying Scores
        On The Mean
        On The Median
        On The Mode
    How To Calculate Central Tendency
        Calculating The Mean
        Calculating The Median
        When There’s No Middle-Most Score
        Calculating The Mode
    Formulas For Central Tendency
    Basic Facts About Central Tendency
    Vocabulary
    Quiz 2
    Summary

 

 

Bit More About Measurement

October 22, 2008 by  
Filed under Bit More

There are a few pre-number crunching activities in statistics. No math is required! But to do research, you must know–at least in general–what you’re trying to prove. Let’s summarize it in five questions:

1. What Are You Trying To Prove? Research begins in your head. It starts with your ideas (constructs). A theory is a collection of ideas. Theories determine the questions you ask, how they are asked, and who is studied. A good theory has CUSSIT.

2. What’s It Like In Practice? In order to test a theory, you convert it into a model.  Models differ from theories in their nature, their scope and their use. To convert a theory to a model requires operational definitions. Research (and statistics) can be described as being either descriptive or inferential. Inferential studies have a clear hypothesis.

3. Who Is Predicting Whom? In general, we believe that most variables are continuous, though they sometimes appear to be discrete. A dependent variable depends on the performance of the subjects. It is an outcome, and is usually a continuous variable. An independent variable is independent of the subjects’ control. It is something the researcher selects, manipulates or induces, and is a discrete variable. Predictors and criteria can be either continuous or discrete.

4. Who Are You Going To Study? Studies can measure an entire population or a sample. Samples can be selected in a variety of ways, including random sampling and stratification.

5. What Do The Numbers Mean? Variables do not always use numbers in the same way. A high number on the back of a marathon runner doesn’t necessarily mean that person will run faster than one with a small number. Numbers can be nominal, ordinal, interval, or ratio.

 

NOW YOU CHOOSE:
    Day 1: Measurement
    A Bit More About Measurement
    Even More About Measurement 1
    Even More About Measurement 2
    Even More About Measurement 3
    Basic Facts About Measurement
    Vocabulary
    Quiz 1
    Summary