
Making a Plan
Exhaustively exploring or 'mining' data for features, trends and relationships
is worthwhile, but you will rarely have the time or expertise to do it (unless you
have a professional statistician to hand). Having a plan, complete with a checklist,
ensures that you collect and analyse the information that you really need. The second
part of your plan is to select the key areas from your initial analysis that look
important enough for a second-level analysis, then produce a similar plan for that
analysis and carry it out. The plans can be brief.
Levels of data: the level defines what you can do with the data
It is important to decide what 'levels' of data you are measuring because this
will affect your analysis. It is usually best to collect data at the highest practicable
level: for instance, individual ages (a high level) rather than age groups (a lower
level). You can do more with individual ages than you can with age groups. You can
always reduce the data later by grouping, but you cannot go back and explore in detail
what a grouping alone doesn’t tell you. (E.g. if you see something important in the
12 to 25 age group, you need to find which specific ages are responsible, and an
age-group figure won’t give you that.)
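The point about grouping can be illustrated with a minimal Python sketch. The ages and the age bands below are invented purely for illustration and are not taken from any real data set; the helper name band_label is likewise hypothetical.

```python
from collections import Counter

# Hypothetical individual ages, invented for illustration only.
ages = [12, 13, 13, 14, 19, 24, 25, 31, 40]

# Hypothetical age bands (inclusive lower and upper bounds).
bands = [(0, 11), (12, 25), (26, 40)]

def band_label(age):
    """Return the label of the band that contains this age."""
    for low, high in bands:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} falls outside every band")

# Reducing the data: individual ages become counts per band.
grouped = Counter(band_label(a) for a in ages)

# With the individual ages we can still look inside a single band.
inside_12_25 = Counter(a for a in ages if 12 <= a <= 25)

print(grouped)
print(inside_12_25)
```

The grouped counts can always be rebuilt from the individual ages, but the breakdown inside the 12 to 25 band cannot be rebuilt from the grouped counts alone.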
Types of data
All data can be classified as either categorical (i.e. it refers to categories,
e.g. robbery, theft, murder) or continuous (e.g. time of day, weight). Within each of
these two types there are levels of data. The type of data you have determines
which styles of analysis you can perform.
Categorical Data is divided into:
Nominal data classifies things into categories that have no inherent order (e.g.
fruit, colour, race, gender).
Ordinal data classifies and ranks (e.g. first, second and third in a race). However,
the differences between ranks may not be equal, and there is no meaningful zero
point (e.g. you can’t come zeroth in a race).
Continuous Data is divided into:
Interval data classifies and ranks, and in addition the difference between each
rank is equal. Attitude scales are commonly treated as an example of this: 5 (strongly
disagree) to 1 (strongly agree). There is no meaningful zero point.
Ratio data has everything interval data has, plus a meaningful zero point (e.g. age, weight).
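As a rough illustration of how the level limits the analysis, the Python sketch below pairs each level with a summary that is conventionally considered appropriate for it. The pairing (mode, median, mean, ratio) is a common textbook convention rather than something stated in this guide, and all values are invented for illustration.

```python
import statistics

# Hypothetical example values for each level, invented for illustration.
offences = ["theft", "robbery", "theft", "murder", "theft"]  # nominal
race_positions = [1, 2, 3, 4, 5]                             # ordinal
attitude_scores = [1, 2, 2, 4, 5]                            # treated as interval
weights_kg = [50.0, 60.0, 75.0]                              # ratio

# Nominal: categories can only be counted, so the mode is the natural summary.
most_common_offence = statistics.mode(offences)

# Ordinal: ranks can be ordered, so the median is also meaningful.
median_position = statistics.median(race_positions)

# Interval: equal spacing between values makes the mean meaningful.
mean_attitude = statistics.mean(attitude_scores)

# Ratio: a meaningful zero allows statements such as "1.5 times as heavy".
weight_ratio = weights_kg[2] / weights_kg[0]

print(most_common_offence, median_position, mean_attitude, weight_ratio)
```

Each higher level also supports the summaries of the levels below it, which is one reason to collect data at the highest practicable level.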