AVETMISS 7 submission

As part of the review of the AVETMISS 7 standard, the following was submitted to NCVER….

Feedback regarding the AVETMISS 7 standard

Introduction

ish is the leading provider of software for the short course adult education sector in Australia, providing the onCourse student management system. This software is free to all education providers and includes the ability to export to the plethora of AVETMISS standards.

As the product manager for this software package, I have been instrumental in developing and coding the AVETMISS export functionality of this software and working with our customers over the last 10 years, helping them to understand AVETMISS and lodge ‘correct’ data.

There are a number of fundamental problems in the AVETMISS standard as it exists today; some are the result of a poorly planned standard which seems to have evolved over time rather than been designed from the ground up, others are due to the way the standard has been allowed by NCVER to be subverted for other purposes.

As I discuss these issues below, you will note that I refer to ‘standards’ in the plural. NCVER can either hide behind the illusion that AVETMISS is a single standard or recognise that it has been adopted, extended and strangled by each collection agency in Australia. These differences cause enormous difficulties, particularly for education organisations who report in multiple states or software vendors needing to support multiple states. Please do not simply ignore these issues with “that’s not our problem”. They are NCVER’s problem because NCVER has not made sufficient effort to discipline those organisations misusing the standard (at the very least by requiring them not to call it AVETMISS), or (even better) to devise protocols for how the standard is to be extended when absolutely necessary.

If you get bored by databases and technical things, skip straight onto part 2.

1. Technical concepts

1.1 Fixed field data

When any software developer is first confronted by AVETMISS, the first thing they notice is the awkward fixed length file format. While not fatal to the implementation of the standard, it causes a wide range of problems.

a. NULL values. Sometimes fields can be blank. Sometimes filled with . Sometimes, as with ‘proficiency in spoken English’, the field is blank in some cases and ‘@@@’ in others, and if you guess the wrong one for a given value of ‘language other than English’ you get a validation error. Sometimes the same field can have both 0 and space as valid null values, depending on which state you report to.
b. Usability. Hard to read by humans. Hard to parse using standard software tools.
c. Validation. A field length error anywhere results in validation messages which make no sense, since all subsequent fields are affected.
d. Field length. Awkward length problems are everywhere. For instance, in NSW the Adult and Community Education collection artificially restricts certain fields to be two characters shorter than the AVETMISS standard.
e. No built-in validation. XSD schemas would provide self-validation for a great range of basic data rules before any external validation tools were needed.
f. No ability to extend the standard with auxiliary data models per state. This results in mutually exclusive and highly confusing ‘below the line’ fields.
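To make the validation point concrete, here is a rough Python sketch of positional parsing. The three-field layout, the field names and the sample values are made up for the example and are not taken from the AVETMISS specification. It only shows how a single field that is one character short corrupts every field after it.

```python
# Hypothetical three-field layout; names and widths are illustrative only
# and are NOT taken from the AVETMISS specification.
LAYOUT = [("client_id", 10), ("postcode", 4), ("language_code", 4)]


def parse_fixed(line, layout=LAYOUT):
    """Slice a fixed-width record into named fields purely by position."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width]
        pos += width
    return record


# A correctly padded record.
good = "0000012345" + "2021" + "1201"
# The same record with the first field accidentally one character short.
bad = "000012345" + "2021" + "1201"

good_record = parse_fixed(good)
bad_record = parse_fixed(bad)

print(good_record)
print(bad_record)
```

In the second record the postcode and language code are both wrong even though only the client ID was malformed, so any error message about those later fields would point at the wrong place.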

Recommendation: change to an XML data format

1.2 Primary keys

The next major technical issue with the data format revolves around primary keys. The standard has a confused understanding of the primary keys in each of the tables. Let’s take client ID as an example. When collecting data for one college it appears that the client ID is simply an identifier for a student, something that remains constant for a full data collection period (whatever that means). But then that data is collated at the state level and merged, either by pretending that it is possible to deduplicate student data with nothing more than a name and postcode (and sometimes with even less information) or by changing all the references to create a whole new set of unique keys.

The problems are exacerbated for a college which chooses to change software systems half way through a collection ‘period’.

Recommendation: client ID should not be unique on its own. Instead client ID and organisation ID should be a compound primary key for the Enrolment file. Similar solutions are easily applied to other keys in each file.
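A minimal Python sketch of the difference, assuming two colleges that both happen to issue the client ID "1001". The record structure, field names and values are invented for the illustration.

```python
# Hypothetical records: two colleges that both issued client ID "1001".
college_a = [{"org_id": "A", "client_id": "1001", "name": "Student One"}]
college_b = [{"org_id": "B", "client_id": "1001", "name": "Student Two"}]


def merge_by_client_id(*collections):
    """Key on client ID alone: later records silently overwrite earlier ones."""
    merged = {}
    for records in collections:
        for rec in records:
            merged[rec["client_id"]] = rec
    return merged


def merge_by_compound_key(*collections):
    """Key on (organisation ID, client ID): records from both colleges survive."""
    merged = {}
    for records in collections:
        for rec in records:
            merged[(rec["org_id"], rec["client_id"])] = rec
    return merged


naive = merge_by_client_id(college_a, college_b)
compound = merge_by_compound_key(college_a, college_b)

print(len(naive), len(compound))
```

With the compound key no renumbering or guesswork deduplication is needed at the point where state level data is merged.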

1.3 Relational data

There are quite a few places where the schema does not match the data it is trying to model. For example, the way of dealing with non-VET data (that is, without units of competency or modules) is awkward. The lack of an Enrolment table also introduces more complexity and complex validation (yes, I know there is a table called Enrolment, but actually it represents Outcomes).

1.4 Period

Next we have the data collection date range. The standard is built around the idea that data will be collected annually and certain fields have this notion embedded within them. We’ve seen the issues with client ID above, but fields such as the outcome status with its “continuing enrolment” are also inherently date based. But reporting is required in all states (where funding is involved) on a very regular basis (sometimes fortnightly). So the data which is collected becomes very much a ‘timeline’ of the enrolment process. When did the student enrol, when did they complete each unit? As it stands, the standard has nowhere to specify the date range for which the report is relevant. The enrolment and outcome timeline is extremely relevant for funding bodies who use that data to pay education organisations for delivery.

Recommendation: review each field for its temporal relevance. Are you asking for the student’s suburb when they enrolled or at the end of the year? If it is a continuing enrolment, what does that mean? Continuing at what date? The end of the year? When the report was generated? Instead, concepts like start and end date of an enrolment make ‘continuing enrolment’ a pointless outcome to collect.
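As a sketch of that last point (the class and function names are my own, not part of any standard), once start and end dates are recorded, ‘continuing’ is something the receiver can derive for any date it likes rather than something the college has to report:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Enrolment:
    """Hypothetical enrolment record carrying explicit dates."""
    start: date
    end: Optional[date] = None  # None means no end date has been recorded yet


def is_continuing(enrolment: Enrolment, as_at: date) -> bool:
    """True if the enrolment has started and has not ended before `as_at`."""
    if as_at < enrolment.start:
        return False
    return enrolment.end is None or as_at <= enrolment.end


finished = Enrolment(start=date(2012, 2, 1), end=date(2012, 6, 30))
open_ended = Enrolment(start=date(2012, 2, 1))

mid_course = is_continuing(finished, date(2012, 3, 1))
end_of_year = is_continuing(finished, date(2012, 12, 31))
still_open = is_continuing(open_ended, date(2012, 12, 31))

print(mid_course, end_of_year, still_open)
```

The same enrolment gives different answers for a fortnightly report and an end of year report, which is exactly the ambiguity a single ‘continuing enrolment’ code cannot express.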

1.5 Inappropriate ABS reference data

The use of the ABS country code data is problematic, since that data set is intended to reflect the current state of the world political boundaries. So when a student reports that they were born in Yugoslavia and that is accepted by AVETMISS for several years, it is unexpected when this year the same student re-enrolling is now rejected by AVETMISS. Sure, Yugoslavia no longer exists, but that is where this person was born! Your validation against current country codes is asking training organisations to keep up with current world affairs, to make subjective decisions about how to interpret and modify statistical data, and to risk offending the student when they see their personal details now changed to a country they have no association with. A training organisation which enrols 20,000 students a year has not the time to phone students and discuss these details, nor figure out how to interpret political changes.

The NCVER approach shows a misunderstanding of the use of the ABS data set. It is not meant to represent country of birth, but rather the current geopolitical map of the world.

Recommendation: consult the people at the ABS about a better reference database to use for data which needs to represent country of birth, or accept all historic ABS country codes.
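The second option could be as simple as the following Python sketch. The code values here are placeholders I have invented, not real ABS country codes, and the function name is hypothetical.

```python
# Placeholder code tables. These values are invented for illustration
# and are NOT real ABS country codes.
CURRENT_CODES = {"C001": "A country that exists today"}
HISTORIC_CODES = {"H001": "A country that no longer exists"}


def check_country_of_birth(code: str) -> str:
    """Accept current and historic codes; only truly unknown codes fail."""
    if code in CURRENT_CODES:
        return "valid"
    if code in HISTORIC_CODES:
        return "valid-historic"
    return "unknown"


results = [check_country_of_birth(c) for c in ("C001", "H001", "ZZZZ")]
print(results)
```

A record flagged as historic is still accepted exactly as the student reported it, and any remapping to present day boundaries can be done later by the statisticians.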

2. Validation

Where do I start? Fundamentally, the problem is that NCVER expects training organisations to fix data submitted by their customers. That is, a student writes on their enrolment form that they live in Paddington (which is a more expensive suburb) but puts in the postcode for Darlinghurst because that is their mail centre. Their mail gets delivered fine so the college is happy. The student feels good about their house value and they are happy. Apart from NCVER. Submitting the name of the suburb to NCVER is actually just a test, like typing in your password twice on Facebook. Was the student paying attention when they filled out their form? Was the college diligent enough to look up every suburb as they entered the student record and make sure the postcode matches?

Or that pesky student who is 95 years old. Their date of birth is rejected by NCVER (surely old people don’t do courses!), so NCVER happily carries on believing that a 95 year old student has never enrolled in anything ever. In reality our onCourse software now ‘cleans’ those dates of birth from the data we send, so you’ll never see that data from any of our customers.

Or a student who enrols in the same module twice in the same year. And passes both times. Yes, really it happens (some people like studying for reasons other than getting a certificate), but NCVER rejects the data since apparently it is also impossible. onCourse fudges the export to avoid the training organisation getting yet another error.

Why is it an error to specify a proficiency in spoken English when the student did not respond with an alternative language spoken at home? What if that’s what the student filled out on their enrolment form? Does NCVER want to know what the student reported or what the college ‘cleaned’ the data to show?

The problem with many of the validation rules is that they:

  • encourage training organisations to fill in @@ as soon as they hit some validation problem
  • reject potentially useful information from students
  • require training organisations to understand statistical collection techniques and ‘correct’ data in a way they guess is most appropriate. Should we move this student to Paddington or Darlinghurst? What second pretend enrolment should we give that student who did pottery twice? These people are educators, not statisticians, and asking them to correct data in this way is a waste of their time and they are likely to do it in ways which create bias in the results. If you believe that strict validation forces them to phone their students and get ‘correct’ information, then you misunderstand the way training organisations treat AVETMISS as a necessary chore with very little benefit to their organisation. NCVER may have more experience with organisations like TAFE and students who enrol in 3-12 month courses. But educational organisations who sell a $49 RSA course which runs for three hours cannot afford to contact a student to check what language they speak at home. So their action when confronted with a validation error is to delete the offending data from the export or from their database.
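One alternative is validation that warns rather than rejects. The Python sketch below assumes a made-up suburb to postcode lookup (the suburb names and postcodes are placeholders) and a hypothetical `check_address` function. The point is that the values the student reported are passed through untouched, with any mismatch attached as a warning.

```python
# Hypothetical reference data; a real lookup would come from a postcode data set.
SUBURB_POSTCODES = {"SUBURB A": {"1111"}, "SUBURB B": {"2222"}}


def check_address(suburb: str, postcode: str) -> dict:
    """Return the values exactly as reported, plus any warnings.

    Nothing is rejected, altered or blanked out.
    """
    warnings = []
    known = SUBURB_POSTCODES.get(suburb.strip().upper())
    if known is None:
        warnings.append("suburb not in reference data")
    elif postcode not in known:
        warnings.append("postcode does not match suburb")
    return {"suburb": suburb, "postcode": postcode, "warnings": warnings}


matching = check_address("Suburb A", "1111")
mismatched = check_address("Suburb A", "2222")

print(matching)
print(mismatched)
```

The collector still receives both the suburb and the postcode as entered and can decide statistically what to do with the mismatch, instead of receiving @@ in both places.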

Very often AVETMISS feels like a school test. Did we tick all the boxes? If this field is 3, then the other field must be 1201 and we can’t have a row in the 60 file. Did we pass the test? None of this produces more accurate data: it just forces us to write more rules in the code. Mostly NCVER gets worse data since it is easier for us to fill a field with @@ than try and reconcile incompatible data. You ask for answers that you already know (field of education, nominal hours, etc) as a way of checking to see if we are all paying attention.

Would you prefer to know that the student entered a particular suburb and postcode (and use that for your own statistical analysis) or would you like to see @@ in both places? Your choice.

I guess I should be somewhat thankful… the difficulty of complying with the standard you have devised is what keeps the smallest software vendors from competing with our product.

3. Submission

Documentation of the various AVETMISS standards across Australia is reasonable in some jurisdictions and terrible in others. But since almost no one accepts ‘plain’ AVETMISS, the tools created by NCVER to perform validation testing are not nearly as useful as they should be.

NCVER should commit to creating validation code for version 7 which is released as open source with a permissive license (BSD or Apache license). It should be cross platform and not include any proprietary code (eg Microsoft Access). It should be capable of incorporating plugins or extensions for state specific requirements.
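As a very rough sketch of what ‘capable of incorporating plugins’ might look like, here is a Python outline in which the core rules and the state specific rules are just functions added to a list. Every name, rule and limit below is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], List[str]]


class Validator:
    """Runs a list of rule functions; each rule returns a list of messages."""

    def __init__(self, rules=None):
        self.rules: List[Rule] = list(rules or [])

    def extended_with(self, *extra_rules: Rule) -> "Validator":
        """Return a new validator with extra (for example state specific) rules."""
        return Validator(self.rules + list(extra_rules))

    def validate(self, record: Record) -> List[str]:
        messages: List[str] = []
        for rule in self.rules:
            messages.extend(rule(record))
        return messages


def client_id_present(record: Record) -> List[str]:
    """Core rule (hypothetical): a client ID must be supplied."""
    if record.get("client_id", "").strip():
        return []
    return ["client_id is missing"]


def short_surname(record: Record) -> List[str]:
    """Extension rule (hypothetical): one agency caps surname length at 18."""
    if len(record.get("surname", "")) > 18:
        return ["surname is longer than 18 characters"]
    return []


core = Validator([client_id_present])
state_x = core.extended_with(short_surname)

record = {"client_id": "1001", "surname": "A" * 20}
core_messages = core.validate(record)
state_messages = state_x.validate(record)

print(core_messages)
print(state_messages)
```

The same record passes the core rules and fails only the extension, so a college or software vendor can see immediately whether a problem comes from the national standard or from one agency’s additions.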

Because every collection agency writes their own validation tool, the differences between lodgements in different places are huge. The NSW government even requires Internet Explorer as the only browser which works with their tool. BACE in NSW can only validate overnight, so fixing errors is very time consuming.

4. Centralised collection

Perhaps NCVER consider it outside their political remit, but centralised data collection would be the biggest single positive step NCVER could make. Although every state and agency has their own collection guidelines, a proper extensible XML data model would allow a single online system to collect data from all users directly. That data could be collected across any date range (since as noted above, the idea of a collection period needs to be adjusted). The appropriate agencies would have immediate access to that data as it is lodged. The benefits are huge:

  • single validation system
  • real time (or close to) data collection which NCVER could use to track current trends
  • central control of the AVETMISS standard and avoidance of mutually exclusive extensions
  • removal of duplicate efforts
  • single lodgement system with increased resources to give help to users, provide more helpful validation, etc
  • the education organisation can log in and set access control to allow certain agencies access to their data as required
  • SOAP and other interfaces to make lodgement simplified or automatic from many enrolment software systems

5. Software accreditation

Education organisations would benefit from having a compiled list of software which meets the AVETMISS reporting standards. Some testing process would be required and a registration issued. Having this accreditation does several things:

a. Creates a relationship between NCVER and software providers which does not exist at all now
b. Gives educational organisations a reference for determining which software meets the basic requirements
c. Allows software providers to have a process within which they can get concrete answers to how certain problems should be dealt with.

6. Statistics

I am not a statistician myself, so my views here are those of a layman and should be taken separately to the issues I’ve raised above of the technical problems in AVETMISS. However it is my opinion that AVETMISS is not an ideal sampling method for some of the data collected. What is the use of Indigenous status data if under 10% of students respond to this question one way or the other? How can you be sure you are not observing statistical bias inherent in asking a question which some people might perceive will result in discrimination against them? Or the question about ‘mental illness’. You are collecting statistics about people who believe they have something called ‘mental illness’ (which you don’t define for the student, so some might think it means being committed to an institution and others might think it means low range depression), who are prepared to put it on a form to the government/educator, who don’t skip all the AVETMISS questions entirely, and who don’t think it will result in discrimination or ridicule from their tutor. That’s a long distance from understanding how many people with mental illness undertake study.

Given that the direction of AVETMISS is toward mandatory student IDs, and matching data collected against other government databases, these personal questions are all the more intrusive and unlikely to be answered truthfully.

I am certain NCVER will respond: but we use other data collection techniques to adjust the data before we present results. That may be the case, but I’ve never seen error bars shown in any NCVER presentation which would reflect this scientific approach to the data collected.

And yet, there is no question about whether the student is in a wheelchair, requires assistance with transport to the teaching venue or uses a closed loop hearing system. These questions would help training organisations better support their students and governments better plan their resource allocation. The questions as they exist now don’t always appear to reflect the sorts of problems educators face in delivering training and the challenges students overcome to reach the point of enrolment. Is country of birth really so much more important to devising education strategies?

7. Privacy and national identifiers

In 1985 the Commonwealth government tried to get the Australia Card implemented. Two years later it was dead amid much public outcry. Since then, governments have tried to achieve the same results through other means: tax file numbers, medicare numbers and now student identifiers. NCVER is but a small cog in this wheel, but student identity is a core part of the problems with the existing standard. NCVER would do well to raise public awareness of whatever government policies are in place, and what their goals are. Is the collection of state and national student identifiers going to be a mandatory requirement? Will a student be able to receive training at all from an RTO without supplying these identifiers? If not, how is a voluntary collection policy going to be implemented at the college level?

Whatever your political views of this important issue (and I come down on the side of more privacy), NCVER are not communicating to the public or to educational organisations what privacy direction AVETMISS is taking. There was once a half-hearted attempt at obfuscating the student name as part of the standard. Not only is this completely ineffective (since it is quite reversible with all the other identifying data in place), but the standard does not stipulate who performs this obfuscation of the data.

In a limited review, I’ve tried to outline some of the basic problems we’ve seen in dealing with AVETMISS for the last 10 years. Hopefully the next revision of the standard will look not just at tweaking the edges, but at addressing the underlying issues in the standard which cause it to vary from state to state, and which cause it to absorb such a vast amount of effort from a lot of people. Any change will cause short term disruption and more work for computer programmers, education organisations and collection agencies, but with proper planning a much better long term outcome is possible.

Ari Maniatis
ish group
