Systematic Observation: Examples, Strengths, Weaknesses


Systematic observation is a highly structured method of observational research in which a research phenomenon is observed and coded according to predefined rules. One or more trained observers record the target behavior(s) using a predefined coding system to minimize bias.

The behavior can be observed in real time, or recorded and coded at a later time. The key to conducting scientifically valid systematic observation is to ensure that the raters are well-trained and seek to minimize bias.

Therefore, raters often go through an extensive training process where the observation categories are clearly defined and raters engage in numerous practice sessions.

To ensure that raters are as unbiased as possible, they should not be informed as to the hypotheses or purpose of the study, if possible.

Systematic observation is used in psychology, educational research, anthropology, sociology, and business research.

Systematic Observation Examples

1. Linguistic Development Of Children

Research on language development of children is often conducted with systematic observation.

Typically, observers will be trained for several weeks on how to code the statements that parents direct towards their children.

Sometimes observations will take place in a family’s home during dinner, in a classroom to study the effects of teachers, or in the lab while the parent and child are engaged in free-play or a structured activity.

The behavior of the participants is either coded in real time as it occurs, or video or audio recorded and coded later.

After data is collected, statistical software is used to identify patterns of parental behavior that are associated with language development.

Although this type of study yields valuable findings, conducting these systematic observations is incredibly time-consuming.

2. Baumrind’s Parenting Styles Observation

The quality of the data collected is crucial to the scientific validity of the study. Therefore, great care is taken when selecting personnel to conduct the observations. There are many factors to consider which may or may not affect the ratings.

For example, when Baumrind (1971) conducted one of her early studies on parenting styles, the observers were placed in the homes of each family. Data were obtained during two home visits, each one lasting 3 hours.

“The seven observers differed considerably in age, philosophical persuasion, and professional training. All observers had previous experience in observing and rating children and at least some graduate training in the behavioral sciences. Observers were chosen for their heterodox views and life experience in order to reduce the subjectivism inherent in observational methods” (p. 5).

Because ethnicity and cultural background may affect perceptions of parenting styles, Baumrind attempted to select observers from diverse backgrounds and life experiences.

3. Playground Observation

Owens et al. (2008) evaluated the effectiveness of the Social Use of Language Programme (SULP) and LEGO® therapy for improving the social skills of children ages 6-11 years old with high functioning autism or Asperger Syndrome.

Children were randomly assigned to one of the two therapy groups or a control condition. Therapy took place for 1 hour each week over 18 weeks.

In addition to several other measures, some children were also observed on the school playground during break times using a systematic playground observation coding scheme.

“The frequency of self-initiated social contact with peers and the duration of social interactions with peers were measured to gain an overall indication of social functioning that was practical to observe in the playground” (p. 1951).

Although there were no statistically significant differences between the two therapy groups before and after therapy, the data showed:

“…a trend for the LEGO group to improve more in the mean duration of social interactions than the SULP group and…a significant increase in duration of interactions for the LEGO group but not the SULP group” (p. 1953-1954).
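To make the two playground measures concrete, here is a minimal Python sketch of how a frequency count and a mean duration could be derived from an observer's log. The `Interaction` record, its field names, and the sample session are hypothetical illustrations, not the coding scheme used by Owens et al. (2008).

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One coded peer interaction (hypothetical record layout)."""
    start_s: float        # seconds from the start of the break
    end_s: float          # seconds from the start of the break
    self_initiated: bool  # True if the target child started the contact


def summarize(interactions):
    """Return the count of self-initiated contacts and the mean duration."""
    self_initiated = sum(1 for i in interactions if i.self_initiated)
    durations = [i.end_s - i.start_s for i in interactions]
    mean_duration = sum(durations) / len(durations) if durations else 0.0
    return {
        "self_initiated_count": self_initiated,
        "mean_duration_s": mean_duration,
    }


# Invented example session for one child during one break.
session = [
    Interaction(start_s=10, end_s=40, self_initiated=True),
    Interaction(start_s=95, end_s=110, self_initiated=False),
    Interaction(start_s=200, end_s=275, self_initiated=True),
]
summary = summarize(session)
print(summary)
```

Summaries like this, computed per child before and after therapy, are the kind of values that would then be compared across groups.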

4. The LENA System Software

LENA stands for Language Environment Analysis System. It is an automated recording technology used to study language acquisition in typical or special populations. 

The system includes a portable digital language processor (DLP) that the research participant wears with a specially designed t-shirt or vest. Recordings are stored in the hardware device and then uploaded by the researcher to the LENA website.

The software categorizes the speakers’ words into: adult word count (AWC), child vocalization count (CVC), and conversational turn count (CTC).

Ganek & Eriks-Brophy (2016) state that “The LENA System is capable of providing information on a variety of different aspects of a child’s auditory environment” (p. 5).

However, there are shortcomings. For instance, the system does not provide a transcript of the speech. This means that researchers must transcribe the recordings by hand.

Unfortunately, when there is overlapping speech, the software deletes both utterances.

“In busy homes with large families, discarding overlapping speech would likely underestimate the true number of interactions that occurred. Similar issues may also impact LENA results obtained in classroom settings” (p. 5).

Despite the limitations, the LENA system offers an additional method of collecting quantitative data in a child’s natural environment and avoids the possible biases of trained observers.

5. Coding On-Task and Off-Task Classroom Behavior

The above video explains in detail how a coding scheme is implemented in a first-grade classroom setting. The narrator first points out that the name of the student being observed is not visible. Anonymity or confidentiality is a required assurance in almost all studies involving human participants.

The narrator then describes different parts of the coding scheme, identifies the time interval of each observation, and explains what different terms mean. As stated, she makes note of the target child’s behavior and the behavior of peers every 10 seconds.

She circles the term on the coding sheet that represents the child’s actions. In this example, on-task versus off-task behavior is one of the primary behaviors being observed.

It should be noted that there is a large empty space on the coding sheet for the observer to write notes. Those notes can describe the actions of the teacher, elements of instruction, or any unusual events that occurred during the observation.
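The interval-based coding described above can be summarized with a short script. The following is a rough Python sketch that assumes one code is recorded per 10-second interval; the code labels, function name, and sample data are invented for illustration and are not taken from the coding sheet in the video.

```python
from collections import Counter

INTERVAL_SECONDS = 10              # one code is recorded every 10 seconds
CODES = {"on_task", "off_task"}    # hypothetical labels for the two categories


def percent_by_code(interval_codes):
    """Return the percentage of intervals assigned to each code."""
    if not interval_codes:
        raise ValueError("no intervals were coded")
    unknown = set(interval_codes) - CODES
    if unknown:
        raise ValueError(f"unknown codes: {sorted(unknown)}")
    counts = Counter(interval_codes)
    total = len(interval_codes)
    return {code: 100.0 * counts[code] / total for code in sorted(CODES)}


# Invented example: eight consecutive intervals for the target child.
target_child = [
    "on_task", "on_task", "off_task", "on_task",
    "on_task", "off_task", "on_task", "on_task",
]
result = percent_by_code(target_child)
observed_seconds = len(target_child) * INTERVAL_SECONDS
print(result, observed_seconds)
```

The same function could be applied to the codes recorded for peers, so that the target child's on-task percentage can be compared with that of classmates.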

6. Usability Testing In Product Design

Even the most well-thought-out design can have flaws. Many of those flaws cannot be anticipated during the design stage. This is why usability testing is a crucial step in the design process.

Usability testing is the process of evaluating a product with actual users. There are numerous methods, such as testing in the field (called guerilla testing), in a laboratory setting, or remotely.

Members of the design team will observe and take notes of how users interact with the product. Sometimes those observations are informal, but sometimes they are highly structured.

There may be specific issues that the design team is concerned about and therefore they focus their observations on those issues. This sometimes involves a testing facilitator asking the participants to attempt specific tasks and taking notes regarding their experience.

Those observations, follow-up interviews, and focus groups will help the team identify any problem issues. Then, the team can implement any necessary design changes to improve the product’s usability.

7. Group Dynamics During Office Meetings

Although a lot of employees despise meetings, they can be highly effective if carried out properly. Unfortunately, a lot of time during meetings is wasted with off-task activity.

When a company is concerned about meetings being too unproductive, they may hire an outside consultant to perform a group dynamics observation. This involves trained observers watching a selected group of meetings and making systematic notes on what transpires.

Research has revealed that each person plays a distinct functional role such as: the task-master, the harmonizer, the compromiser, the help-seeker, and the dominator.

By systematically observing the functional statements that occur between team members, the consultant can inform the company of ways to improve meeting efficiency.

8. The Ainsworth Strange Situation Test

Dr. Mary Ainsworth conducted research on the attachment styles of infants and children for decades. To assess the attachment between mother and infant, she created the Strange Situation Test, which consists of 8 scenarios in a laboratory setting.

The mother and child’s behavior are carefully observed during each scenario by a trained member of the research team. The observers receive extensive training and code the child’s actions according to a detailed scoring system.

In one scenario, the mother leaves the child alone in the laboratory room and later returns. The observers are interested in how the child responds when the mother returns. Some children will engage in affection sharing, while others will avoid the mother’s display of affection altogether.

In order for the study to be valid, it is imperative that observers agree on what they see. This is referred to as inter-rater reliability.

“Inter-rater agreement for SSP is high, especially among within-laboratory researchers and in a lesser extent but still reassuring when inter-laboratory rates are examined” (Simonelli & Parolin, 2016, p. 4).
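One common way to quantify agreement between two observers who assign categorical codes is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal Python illustration; the category labels and ratings are invented, and the source does not state which statistic was used for the Strange Situation results quoted above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented codes for ten reunion episodes (labels are hypothetical).
rater_a = ["approach"] * 5 + ["avoid"] * 3 + ["resist"] * 2
rater_b = ["approach"] * 4 + ["avoid"] * 4 + ["resist", "approach"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # 0.67
```

Here the raters agree on 8 of 10 episodes (0.80), but because some of that agreement would occur by chance, kappa is lower than the raw agreement rate.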

9. Judging Synchronized Swimming

Systematic observations also occur in some athletic competitions. For example, in synchronized swimming, performance is observed and scored by a panel of highly trained judges.

In some cases, more than 20 judges can be involved. The stakes can be quite high, where winning or losing can have profound ramifications on an athlete’s future and financial well-being.

Therefore, it is essential that judges all use the same criteria when scoring an athlete’s performance. If there is substantial disagreement among the judges, then the competition will be invalid.

Research conducted by Ponciano et al. (2017) investigated the viability of using video recorded performances as a training tool for judges.

They recorded the routines of three experienced synchronized swimmers and had their performances rated by ten experienced judges on two separate occasions.

The judges’ ratings showed a high degree of similarity between T1 (0.85) and T2 (0.83). In the words of the researchers: “The content of the video was interpreted almost the same way by the 10 evaluators and allowed evaluation consistency after 7 days” (p. 185).

10. Bandura Bobo Doll Study

Perhaps one of the most famous studies in all of psychology is the Bandura Bobo Doll study. Conducted by Dr. Albert Bandura and colleagues in the 1960s, the study demonstrated the dangerous effects of televised violence and sparked a nationwide debate in the U.S.

The study involved preschool-age children watching a video of an adult playing in a room. In some conditions, the adult played aggressively with an inflatable Bobo doll. The adult hit the doll, struck it with a mallet, and drop-kicked it across the room.

In other conditions, the adult played non-aggressively with the doll.

The children were then taken to another room where there happened to be a Bobo doll. Trained members of the research team then conducted observations of each child’s behavior and counted the number of aggressive actions each took.  

The coding scheme included: physical aggression, verbal aggression, aggression toward inanimate objects, and aggressive inhibition.

The raters’ scores were compared and found to be highly consistent with each other. This is essential if the study’s results are to be considered valid.


Systematic Observation Strengths

1. Excellent for Behavior Assessment

Unlike other methods, systematic observation relies on first-hand observation by a third party rather than on participant self-reporting.

Many studies in psychology and the social sciences rely on paper-and-pencil measures. When assessing personality traits or attitudes, research participants are often asked to fill out questionnaires or surveys.

However, when conducting a study that utilizes systematic observation, the researchers are measuring actual behavior.

So, instead of an individual indicating what they think they might do in a certain situation, their behavior is actually observed in that situation.

This makes the research more scientifically sound.

2. Excellent Method for Field Research

Similar to the strength mentioned above, systematic observation is often conducted in real settings. Although it can be utilized in a laboratory setting, it can also take place in the research participants’ natural environment.

For example, studies that investigate the influence of parental statements or discipline style on children’s development can take place in the child’s home. This setting is far more natural than a laboratory setting, which may inhibit the participants’ behavior.

3. It Can be a Highly Focused Method

When conducting systematic observations, it is possible to focus on highly specific behaviors.

Researchers are given an opportunity to define their target constructs in great detail, including defining which behaviors do not meet the operational definition of the construct under study.

This degree of specificity improves the validity of the study, which allows researchers and consumers of the research to have confidence in the results.

Systematic Observation Weaknesses

1. Potential for Observer Bias

Systematic observation, in most cases, relies on human beings to conduct the observations. Whenever people are involved, there is an opportunity for personal biases and cultural dynamics to cloud judgment.

Although thorough training of observers and pilot testing can help ensure the validity of the ratings, observers can also exhibit what is called “drift.”

This is when raters gradually drift away from the defined constructs and start to rely on their own opinions. Drift can be an especially problematic matter when research takes place over many months.
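One possible way to monitor drift is to have a rater periodically re-code a recording that already has agreed-upon reference codes and to track the agreement rate over time. The Python sketch below illustrates the idea; the reference codes, session names, and the 0.8 threshold are assumptions chosen for the example, not an established standard.

```python
def percent_agreement(codes, reference):
    """Proportion of items where the rater's code matches the reference code."""
    if len(codes) != len(reference) or not codes:
        raise ValueError("codes and reference must be non-empty and equal in length")
    return sum(c == r for c, r in zip(codes, reference)) / len(codes)


def flag_drift(sessions, reference, threshold=0.8):
    """Return the names of check sessions whose agreement falls below threshold."""
    return [
        name
        for name, codes in sessions.items()
        if percent_agreement(codes, reference) < threshold
    ]


# Hypothetical reference codes for one short calibration recording.
reference = ["on", "on", "off", "on", "off"]

# The same rater re-codes that recording at three points in the study.
sessions = {
    "month_1": ["on", "on", "off", "on", "off"],
    "month_3": ["on", "on", "off", "on", "on"],
    "month_6": ["on", "off", "on", "on", "on"],
}
flagged = flag_drift(sessions, reference)
print(flagged)  # ['month_6']
```

A flagged session would typically prompt retraining before the rater codes further data.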

2. It’s Time-Consuming

Systematic observation is incredibly time-consuming. Even before one data point has been collected, the raters must go through an extensive training process.

Even before that training begins, the researchers must spend considerable time developing a detailed scoring system.

Once the training has been completed, data collection can take considerable time and effort. If conducted in the field, the raters must travel to and from the designated locations. If conducted in the lab, often only one research participant is observed at a time.

Therefore, a key disadvantage of systematic observation is the sheer amount of time and energy that must be exerted to complete a single study.

3. The Hawthorne Effect

The Hawthorne Effect refers to the tendency of research participants to change their behavior because they know they are being observed. This can seriously undermine a study’s validity.

The goal of research is to study authentic human behavior. However, if participants inhibit their actions or alter their behavior to appear more favorable, then the results of the study become invalid.


Systematic observation is an incredibly diverse method of collecting data. It can be applied in a natural setting such as a classroom, public area, or home environment. This means researchers can study human behavior as it occurs in natural settings.

Systematic observation has been utilized in psychological studies on parenting, language acquisition, attachment, and the effects of televised violence. Other meaningful applications include assessing the consistency of judges, the usability of products, and the interaction patterns in workplace meetings.


Ainsworth, M. D. S., Blehar, M., Waters, E., & Wall, S. (1978). Patterns of attachment: A psychological study of the Strange Situation. Hillsdale: Erlbaum.

Bandura, A., Ross, D., & Ross, S. A. (1961). Transmission of aggression through imitation of aggressive models. Journal of Abnormal and Social Psychology, 63, 575-82.

Baumrind, D. (1971). Current patterns of parental authority. Developmental Psychology, 4, 1–103.

Ganek, H., & Eriks-Brophy, A. (2016). The Language ENvironment Analysis (LENA) system: A literature review. In Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition (pp. 24–32). Umeå, Sweden: LiU Electronic Press.

Owens, G., Granader, Y., Humphrey, A., & Baron-Cohen, S. (2008). LEGO® therapy and the Social Use of Language Programme: An evaluation of two social skills interventions for children with high functioning autism and Asperger Syndrome. Journal of Autism and Developmental Disorders, 38, 1944-57.

Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurement (6th ed.). New York: McGraw-Hill.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.

Cronbach, L. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391-418.

Simonelli, A., & Parolin, M. (2016). Strange Situation Test. In V. Zeigler-Hill & T. K. Shackelford (Eds.), Encyclopedia of Personality and Individual Differences (pp. 1-4).

Ponciano, K., Fugita, M., Figueira, A., da Silva, C., Meira, C., & Bocalini, Do. (2017). Reliability of judge’s evaluation of the synchronized swimming technical elements by video. Revista Brasileira de Medicina do Esporte, 24, 182-185.


Dr. Cornell has worked in education for more than 20 years. His work has involved designing teacher certification for Trinity College in London and in-service training for state governments in the United States. He has trained kindergarten teachers in 8 countries and helped businessmen and women open baby centers and kindergartens in 3 countries.


This article was peer-reviewed and edited by Chris Drew (PhD). The review process on Helpful Professor involves having a PhD level expert fact check, edit, and contribute to articles. Reviewers ensure all content reflects expert academic consensus and is backed up with reference to academic studies. Dr. Drew has published over 20 academic articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education and holds a PhD in Education from ACU.
