Educational Data Analysis, or Learning Analytics, is a major initiative in Australian schools to make use of the large amounts of data collected on students, particularly through various testing programmes. Understanding how to unpack and use such data to inform teaching and learning is important for all educators. The alternative is to allow external bodies to interpret the data and apply their own perspectives on school and classroom performance. Schools managing their own data, teachers taking control of the analysis of their own students' results, and students exerting ownership of the data collected about them all have implications for future classroom practice. Information technologies are making it ever easier to collect vast quantities of data, and are also providing the tools that make interpreting and drawing meaning from the data easier. It will be the teachers and school leaders who master these technologies who gain advantage from such analysis, for their schools, their careers, and student learning.
Dr Jason Zagami
To teach effectively, you need to understand your students. While much of the richness of teaching lies in the interpersonal relationship between teacher and student, it is also necessary to understand your students academically. We do this primarily through assessment, usually the measurement of student achievement on set tasks, but these are not the only measures we can use to better understand our students and find ways to improve their learning.
One of the key criticisms of most student assessment is that it only reports after the fact, often well after, and beyond the time in which changes can be made to improve student learning. This diagram (Cromfrey, 2000) suggests the ideal timeframe for feedback and how specific this feedback should be to individual students.
Detailed feedback is, however, time consuming, and despite understanding the importance of timely and individualised feedback, providing it fully is generally beyond the capacity of even the most dedicated and experienced teachers. ICT can assist: as more student activities are conducted on computers and online, many of the time-consuming processes of testing and reporting can be automated.
Unfortunately, while a great deal of data about student activity and learning is generated by their use of computers and online systems, this data is rarely in a format that teachers and students can easily use to analyse and evaluate learning and to make improvements.
Learning Analytics (LA) and Data Mining (DM) are approaches to make data and the analysis of data more accessible. The key difference between the two is that LA looks at systems and contextual factors (such as classrooms or schools), while DM looks at specific variables and factors (such as demographics or test results).
While teachers and schools currently focus on using LA and DM to improve test scores, both can provide far more information than that. In the future, as the techniques, the methods of presentation, and educators' understanding improve, LA and DM will be used every day to provide the best possible learning experiences for students.
Learning Analytics at an academic/research level makes use of various software tools and approaches to draw understanding from sets of data:
Social network analysis (SNA) involves mapping and measuring relationships and flows between people, groups, organisations, computers, URLs, and other connected information/knowledge entities. The nodes in the network are the people and groups while the links show relationships or flows between the nodes.
Social Ecological Modelling (SEM) involves identifying and measuring influencing factors of behaviour.
Behavioral trust analysis uses instances of conversation and propagation (people communicating and using information to generate new information) as an indicator of trust.
Influence and passivity measures assess the influence of people and information by counting the number of times information is passed on, cited, or retweeted.
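As a small illustration of the first of these approaches, social network analysis of a discussion forum can start from nothing more than a log of who replied to whom. The names and reply data below are invented, and the degree count is the simplest possible centrality measure; libraries such as networkx provide richer ones (betweenness, clustering, and so on).

```python
# Social network analysis sketch: nodes are students, directed edges are
# "replied to" relationships drawn from a forum log.
# All names and data here are invented for illustration.
from collections import Counter

replies = [  # (poster, person replied to) - hypothetical forum log
    ("Amy", "Ben"), ("Ben", "Amy"), ("Cai", "Amy"),
    ("Dee", "Amy"), ("Dee", "Ben"), ("Eli", "Dee"),
]

# Degree = number of reply links a student is involved in, in either
# direction; high-degree students are the hubs of the discussion.
degree = Counter()
for poster, target in replies:
    degree[poster] += 1
    degree[target] += 1

for student, links in degree.most_common():
    print(student, links)
```

Even this crude count surfaces the pattern a teacher cares about: which students anchor the discussion and which sit at its edges.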
At an institutional (i.e. school or classroom) level, data is most often drawn from Learning Management System (LMS) usage and diagnostic tests such as the National Assessment Program — Literacy and Numeracy (NAPLAN). Such data is usually analysed using data tables, graphs and data visualisations.
Examples from your course
The Australian federal government, Queensland Government and Brisbane City Council are making some data open via various websites, but the coverage is patchy. Available Queensland school datasets include home education registrations, school disciplinary absences, attendance rates, and enrolments.
Open data is somewhat more problematic in schools, as privacy laws and policies regarding student data restrict open sharing of some data. Nevertheless, public data is available for education, particularly when aggregated and de-identified so that individual students cannot (easily) be identified. MySchool aggregates data from many sources to compare schools, and demographic data is available from the Australian Bureau of Statistics. Schools are sharing more data publicly via websites, but the bulk of educational data collected by teachers and schools remains closed and often unexamined.
While Google is providing tools to facilitate data sharing, and it is becoming common to share scientific data, standards are still being established and we have some way to go before the vision of an internet of connected raw data is realised to drive innovation, social improvement, and of course educational improvement.
National Assessment Program - Literacy and Numeracy (NAPLAN). These tests assess students’ reading, writing, language (spelling, grammar and punctuation) and numeracy, common to all states and territories.
While ostensibly to determine if students are performing above, at or below national benchmarks, the data provided can be used for a range of purposes. Individual students receive a detailed report, but teachers and schools also receive reports on how their students have performed compared to other classes in their school, state, and nationally. This can then provide information on which to make changes to teaching programs and pedagogy.
Unfortunately, NAPLAN results can take a while to be processed and returned to schools, beyond the time in which specific interventions can be made to address the misconceptions and errors made by students. This will hopefully change in the future but the data is nevertheless useful in identifying those concepts that students did not know and allows individual teachers to reflect on their teaching of these concepts in the future.
Guttman Space Patterns (adapted from Griffin, 2012) can be used with such tests to determine what actions teachers can take to address individual student learning needs.
First, order the columns of student results from easiest to hardest items, and then order the students in rows from most capable to least capable.
To do this you only need some skill with a spreadsheet: sort the data for the class according to the total number of questions each student answered correctly, and the total number of students who answered each item correctly. Sorting the data by these marginal totals forms what is known as a contingency table (or cross tabulation). In simple terms, it is a count on the right-hand side of how many questions each student answered correctly.
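The same sorting can be done in a few lines of code rather than a spreadsheet. A minimal sketch in Python, with invented student names and results (1 = correct, 0 = incorrect):

```python
# Sort test results into a Guttman pattern: rows are students, columns
# are items. All data below is invented for illustration.
results = {
    "Ana":  [1, 1, 1, 1, 0, 1, 0, 0],
    "Bob":  [1, 1, 0, 1, 0, 0, 0, 0],
    "Cleo": [1, 1, 1, 1, 1, 1, 1, 0],
    "Dan":  [1, 0, 1, 0, 0, 0, 0, 0],
}

n_items = len(next(iter(results.values())))

# Order items from easiest (most students correct) to hardest.
item_totals = [sum(row[i] for row in results.values())
               for i in range(n_items)]
item_order = sorted(range(n_items), key=lambda i: -item_totals[i])

# Order students from most capable (highest row total) to least.
student_order = sorted(results, key=lambda s: -sum(results[s]))

for student in student_order:
    row = [results[student][i] for i in item_order]
    print(f"{student:5}", row, "total =", sum(row))
```

Printed in this order, the ones cluster towards the top left and the zeroes towards the bottom right, which is exactly the diagonal pattern described below.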
This sorted skills audit of test questions, arranged both by student and by item, forms a pattern called a Guttman space pattern.
A Guttman space pattern is a collection of ones and zeroes: a one means the student answered correctly, demonstrating a particular skill, and a zero means they did not. If we sort these in both directions, we get a pattern with a diagonal: mostly ones above the diagonal and mostly zeroes below it.
So imagine a grid of ones and zeroes split across the diagonal. At the top of the spreadsheet are the students who answered most of the questions correctly, and at the bottom are the students who answered most of the questions incorrectly. On the left-hand side are the easy items that almost everybody answered correctly, and on the right-hand side are the hard items that almost no-one answered correctly.
This splits the spreadsheet not down the middle but across the diagonal: above the diagonal will be mostly ones, and below it mostly zeroes, though it is rarely a perfect diagonal (as with James in the example). Moving from left to right across the spreadsheet, a student will typically answer a run of questions all correctly, then some right and some wrong, and then mostly wrong.
The interesting thing about doing this analysis is the area where there are some right and some wrong answers. This is the area that Vygotsky called the zone of proximal development for each student, and that’s where the student is most ready to learn, and where teaching intervention will most likely succeed.
But the problem with NAPLAN data is that it arrives four or five months after students sit the test, so the data is dated. You can, however, assume that each student has only moved on a little, and so still use the Guttman space analysis to identify where each student's zone of proximal development lies. Teachers can then intervene and teach to the construct, rather than going through every question on the test and drilling students on everything they answered incorrectly.
If a question is so difficult that it lies a long way to the right of the diagonal for a student, it is beyond their current ability to learn that skill; if it lies a long way to the left of the diagonal, it is well below their ability and they will be bored.
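One rough way to locate this mixed zone programmatically: with a student's results ordered from easiest item to hardest, the span from their first wrong answer to their last right answer approximates the region worth targeting. The `mixed_zone` helper and the sample row below are illustrative sketches, not part of any standard tool.

```python
# Locate a student's mixed zone (a rough proxy for the ZPD) in a row of
# results already sorted from easiest item to hardest.
# 1 = correct, 0 = incorrect; the data is invented for illustration.

def mixed_zone(row):
    """Return (start, end) item indices spanning the first wrong answer
    through the last right answer, or None for a perfect Guttman row
    (all ones then all zeroes, with nothing mixed in between)."""
    correct = [i for i, v in enumerate(row) if v == 1]
    wrong = [i for i, v in enumerate(row) if v == 0]
    if not correct or not wrong or wrong[0] > correct[-1]:
        return None  # no overlap between mastery and the frontier
    return (wrong[0], correct[-1])

james = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
print(mixed_zone(james))  # the items in this span are the ones to teach
```

Items left of the returned span are already mastered; items right of it are, for now, out of reach.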
With this simple skill to analyse test questions, teachers can interpret and analyse test data to identify where to intervene as a result of a test, identifying the ZPD for individual students. This analysis does not point to a level of achievement that the student has reached, instead it points to where the student is most likely to benefit from instruction.
For tests where the data is delayed (such as NAPLAN results), teachers need only assume that each student will have moved a little to the right of the diagonal; they can then check whether that is true (by checking student understanding) and teach the student more effectively.
This approach of using test data to identify areas for improvement is what testing programs such as NAPLAN are intended to achieve; it is only when teachers and schools focus on average scores, rather than treating testing as part of a developmental approach, that problems arise.
“it’s not about fixing up problems, it’s about scaffolding the kids’ learning rather than going through a deficit model looking at what they get wrong and trying to fix that. That leads to teaching to the test.”
Four categories of data are commonly used when analysing educational environments: demographics, student learning, school processes, and perceptions (Technology Alliance, 2005). Used in combination, these measures can help you better understand the effectiveness of the learning environment and make informed decisions and changes.
The following collaborative process is being used in Queensland state schools to set aspirational targets, work towards continuous improvement of teaching and learning, build a culture of data inquiry, and improve teacher pedagogy.
Goals and targets are integral to setting strategic direction.
Begin the process by being explicit about the purpose of the collaborative inquiry. Examining school strategic documents guides the setting of goals and targets.
Note: Return to goals and targets after interrogating the data. This ensures decisions about actions related to the data are still aligned to goals and targets before proceeding to planning. If changes to targets and goals are required, they are made after interrogating the data and verifying the decisions to change the goals and targets.
Collect all the data sets related to the specific inquiry and the identified goals and targets.
Ensure a range and balance of data sets. It is important that patterns and trends identified in one data set are then matched to evidence across other data.
Depending on the goals and targets, data sets may include whole school information, specific learning area data, and data on student groups and student achievement.
Interrogate to ask a series of questions that delve deeper and deeper into the data.
Draw an inference by concluding or judging from the evidence.
Use the information from data interrogation and synthesise it into a statement that describes what is happening.
Verify to confirm or substantiate inferences/hypotheses by comparing with other data information.
Consult other relevant data to ensure that the information and resulting inferences made from the data analysis correlate with other sources.
Plan by designing a scheme of action to meet the needs identified in the data analysis.
Implement by putting planned action into practice.
Maintain the evidence-based focus of the teaching and learning sequences. This will ensure that intervention responds directly to the teaching and learning needs uncovered in the data analysis.
Assess by measuring and evaluating the impact of the implemented action plan.
The focus of assessment is to gather information on the impact and results of the planned whole-school or targeted intervention on student achievement. Ensure that effective assessments and assessment tools are planned and implemented. These should be in place before, during and after the implementation of the planned action.
Reflect on the results achieved by the implemented action and the implications for ongoing improvement of teaching and learning.
“Computers are good at swift, accurate computation and at storing great masses of information. The brain, on the other hand, is not as efficient a number cruncher and its memory is often highly fallible; a basic inexactness is built into its design. The brain's strong point is its flexibility. It is unsurpassed at making shrewd guesses and at grasping the total meaning of information presented to it.”
Presenting data as visualisations or Information Graphics (infographics) can make complex information easier to understand as our ability to see patterns and trends in visual information is much greater than for textual or numeric data.
Gapminder is a data visualisation tool to dynamically display changes in data, usually over time. It provides a good example of how visualisation can be used to see patterns in data that may be difficult to detect with static tables and graphs.
Many Eyes is a set of visualisation tools and examples that can display data in various dynamic and static ways.
There are many software tools that will aggregate data to automatically generate visualisations or assist in creating effective infographics.
Wolfram Alpha Personal Analytics for Facebook will produce a complex visualisation of the data contained in your Facebook account, including cluster analysis.
Timeline and Timeline.JS are tools that will create an interactive timeline of events that you can place on your own website while tools such as dipity, tiki-toki, myHistro, and timetoast will let you create and host timeline visualisations online.
Google Maps and Google Earth are commonly used for Geovisualisation.
Wordle, Tagxedo, Tagul, ABCya! and Worditout take text and, by changing the size and/or colour of words to indicate how often each occurs, produce Word Clouds (also called Tag Clouds or weighted lists) that provide a quick means of visualising the important points in a text.
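Under the hood, all of these tools do much the same thing: count word frequencies and scale the display accordingly. A minimal sketch of the counting step, using an invented sample text:

```python
# The counting behind a word cloud: word frequency determines the
# display size of each word. The sample text is invented.
from collections import Counter
import re

text = ("Data informs teaching. Teaching improves learning, "
        "and learning data informs teaching again.")

words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)

# A word cloud tool would scale each word's font size by its count.
for word, count in freq.most_common(4):
    print(word, count)
```

A real tool would also drop common "stop words" such as "and" before rendering, so that only meaningful words dominate the cloud.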
Collections of effective data visualisations can be found at datavisualization.ch and informationisbeautiful.net.
There are a range of online tools for creating infographics such as Visual.ly, Easel.ly, Infogr.am, and guides to creating effective infographics.
More detail on LMS’s can be found in the Digital Pedagogies module: Learning Management Systems.
One advantage of using an LMS is that data on student interaction can be captured easily, and if the LMS is used for many aspects of learning (content, discussion, testing, etc.), then all of this data is available from one collection.
Of course, data is available not just on students but also on teachers: usage of the system, access times and durations, discussions, and so on. These may all contribute to management analytics, but let us remain focused on learning and the data we can draw from an LMS to aid it.
Visit www.snappvis.org and place the bookmarklet into a Safari or Internet Explorer browser; then, while looking at your discussion threads in the LMS, click the bookmark and you will generate an analysis of the discussions. This will graphically show the conversational relationships between participants and provide a statistical analysis of their postings to the discussion forum.
More detail on the use of games to enhance student learning can be found in the Educational Technologies module: Educational Gaming.
simSchool is a classroom simulation that lets teachers analyse student differences, adapt instruction to individual learner needs, gather data about the impacts of instruction, and see the results of their teaching.
You can explore a limited version, simSchoolLite, at simschool.org/lite, or register at www.simschool.org/register?type=demo and download a workbook of activities from www.scribd.com/doc/3024555/simManual.
Paid accounts can give you access to deeper levels of the simulation and are available at www.simschool.org/registration_select
simSchool creates a simulation in which students respond to your actions as a teacher, creating an experimental inquiry process in which you are aware of some influencing variables but unaware of others. These you need to infer from student responses to build an understanding of what works well with particular students.
From a Learning Analytics perspective, simSchool models what may occur in a classroom environment over many weeks as you use various data collection instruments to build an understanding of your students, what works for each of them individually, and how you can structure individual and whole class activities to best suit the learning needs of your unique combination of students.