In this document we’ll cover some of the ways ish works to ensure the security of your data. We’ll look at security in the broadest sense: not just “how do we stop people stealing your things”, but all the things we do to create products and services which are resilient to data loss. Resilience against ransom attacks includes good quality backups; resilience against denial of service (DoS) attacks is about the flexibility of the infrastructure to change quickly in response to an attack.
onCourse is an enterprise system designed to scale and handle millions of enrolments, and thousands of simultaneous enrolments. Because of our architectural design, the applications can grow as needed with appropriate hardware. Our existing customers range from small colleges with 5 or fewer full-time staff, who require systems that need no IT knowledge to maintain, right through to government departments, large private colleges and universities.
General security principles
All good security is defence in depth: layering different types of security over each other. We follow best practices for the type of confidential student and customer data we hold and take security extremely seriously.
Regular system audits are performed on all our security processes and we are constantly looking at ways to reduce the opportunities for intrusion.
Developers are given very limited views into production data and only for the purpose of diagnosing or repairing a bug report. Technical staff have access controlled by three factors of authentication: password, RSA keys and TOTP.
Other than for testing purposes there are no Windows installations anywhere in our offices and none at all within the data centre. All servers are FreeBSD, kept up to date with the latest security patches as needed. This reduces some vectors for compromise.
Physical access to the data centre is limited, by photo ID and fingerprint checks, to ish Sydney technical staff.
Our automated monitoring system detects packages which are out of date and have CVE advisories issued against them, so that we can review and patch accordingly.
Components and services
A key part of our architecture is the separation between the onCourse administration system designed for your office team and the website components. The administration components can be run on your own network, in a separate database, with internet outages or other downtime. These outages do not impact the public facing web assets at all, so students can continue to purchase and browse without interruption. This is an absolutely critical feature to your ability to market and sell 24 hours a day.
onCourse is developed with a modern Service Oriented Architecture as a set of microservices. This leads to some key advantages:
- simple deployment for each individual service
- different development speeds for different services
- no one point of total failure
- security between services: a breach of one service can be isolated to that one service
- each service is given only as much access to resources as it needs. This includes database access, S3 documents and system resources. For example the website has only read access to the document store.
onCourse consists of the following components.
onCourse Server is a standalone application with an integrated Jetty service for serving connections to onCourse Client. It can be clustered with a load balancer in front of the application to distribute requests.
onCourse Server requires only Java 11, plus database storage on MySQL, Aurora or MariaDB. Other databases such as PostgreSQL and MS-SQL should work but are not regularly tested. onCourse uses no database-specific features such as stored procedures or views.
onCourse Server makes heavy use of the Apache Cayenne ORM layer to provide data integrity and validation services. Jetty provides authentication and authorisation services for each onCourse Client request.
onCourse Server requires only read-only disk access other than a single directory for storing logs. Outbound connections are limited to port 443 (for secure integration with other web services over SOAP or json/xml REST), a connection to the database cluster, DNS services, and outbound TLS SMTP typically on port 465.
For API calls to onCourse from third party systems, an API token system is available. Tokens should be created one per remote service.
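The server-side half of such a token scheme can be sketched in a few lines. The service names and in-memory storage here are hypothetical, not ish’s implementation; the point is one independent token per remote system, checked with a constant-time comparison:

```python
import hmac
import secrets

# One independent token per remote service (hypothetical service names).
tokens = {
    "crm-sync": secrets.token_urlsafe(32),
    "reporting": secrets.token_urlsafe(32),
}

def authenticate(service: str, presented: str) -> bool:
    """Check a presented API token using a constant-time comparison,
    which avoids leaking information through timing side channels."""
    expected = tokens.get(service)
    return expected is not None and hmac.compare_digest(expected, presented)
```

Issuing one token per remote system means a leaked token can be revoked without disturbing the other integrations.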
For user logins, the following features are available to enhance security:
- Two-factor authentication (2FA) using a TOTP six-digit code changing every 30 seconds
- The ability to enforce 2FA for all users
- Disabling a login after a certain number of failed attempts (5 by default)
- Requiring a password change after a set number of days
- Password quality enforcement
- Automatically disabling accounts inactive for a certain period
Password resets are not available from the login screen, to reduce phishing channels; instead, an admin user can trigger a reset email for a user who has lost their password.
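The TOTP codes mentioned above follow RFC 6238: an HMAC of the current 30-second window number, truncated to six digits. A minimal sketch in Python, for illustration only; any RFC-compliant authenticator app interoperates with the same scheme:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HMAC-based one-time password."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, period: int = 30, digits: int = 6, at=None) -> str:
    """RFC 6238 TOTP: HOTP keyed on the current `period`-second window."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // period), digits)
```

Because both sides derive the code from a shared secret and the clock, no code ever travels over the network ahead of time, and each code is useless 30 seconds later.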
The onCourse Web application drives all our public facing websites with a high performance Apache Tapestry templating engine. Almost all the HTML across the entire site can be customised, as well as all the JavaScript and CSS.
onCourse Web supports multiple websites per college, so you can separate parts of your business to publish different sites on different URLs.
The entire site is served over HTTP/1.1 or HTTP/2 with TLS 1.2 and 1.3 supported, including the parts before any checkout. HSTS enforces TLS connections and helps reduce the scope of a man-in-the-middle (MitM) attack if your DNS is compromised.
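HSTS is delivered as a single response header on HTTPS responses; a typical policy (illustrative values, not necessarily ish’s exact max-age) looks like:

```
Strict-Transport-Security: max-age=31536000; includeSubDomains
```

Once a browser has seen this header, it refuses plain-HTTP connections to the site for the stated period, so a DNS hijacker cannot downgrade returning visitors to an unencrypted connection.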
The Web application is completely stateless on the server-side and any client state such as shopping basket contents is kept in cookies. This allows us to scale the web front end easily and handle very high peak loads. We also aggressively cache resources to ensure best performance to end users.
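Keeping basket state in a cookie is only safe if the server can detect tampering. One common approach, sketched here under assumed names rather than ish’s actual cookie format, signs the serialised basket with an HMAC:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side signing key"  # hypothetical; held only on the server

def encode_basket(items: list) -> str:
    """Serialise the basket and append an HMAC signature."""
    payload = base64.urlsafe_b64encode(json.dumps(items).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig

def decode_basket(cookie: str):
    """Return the basket contents, or None if the cookie was tampered with."""
    payload, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because the server holds no session, any web node can serve any request, which is what makes horizontal scaling of the front end straightforward.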
Our Document Management system is used to host large assets such as video and images. AWS S3 gives users excellent speeds for these resources.
The onCourse CMS application provides a secure means for you to update your website. You can use the web browser front end to edit pages, menus, blocks, templates and other content and settings. You can also access the content through a WebDAV interface, which makes it easy to automate updates and use your own preferred tools for editing pages.
Changes through the CMS are not reflected on your public website until you “publish” them. The publish action makes a new versioned copy of your content and pushes it through to the live onCourse Web application cluster. Static files such as JavaScript and CSS are saved to a git repository and shared with your users through GitHub. This combination of tools gives you history and the ability to audit, review and roll back changes.
Images (PNG and JPEG) are automatically processed through lossless optimisation tools to ensure the best speeds for end users.
A services application is responsible for sending SMS, performing background database checks and updating the web data stores when content changes in the onCourse Server.
A specialised USI service is responsible for integrating with the Australian USI agency. This application has very limited scope and no direct access to any database. Communication with this service is authenticated by shared security tokens.
The onCourse student and tutor Portal is a non-branded environment optimised for mobile and tablet access. It is a shared platform across our customers and accessible only via login.
Our onCourse Certificate validation service provides a gateway for employers to verify certificates issued by onCourse. It is sessionless and stateless, offering only the verification of a special code issued against each certificate.
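A verification gateway of this kind only needs to map an unguessable code to a certificate record. The sketch below (an assumed data model, not ish’s actual scheme) shows why such codes are safe to print on certificates:

```python
import secrets

issued = {}  # verification code -> certificate number

def issue_code(certificate_number: str) -> str:
    """Attach a random ~128-bit code to a certificate.
    At this length, guessing a valid code is computationally infeasible."""
    code = secrets.token_urlsafe(16)
    issued[code] = certificate_number
    return code

def verify(code: str):
    """Return the certificate number for a valid code, else None."""
    return issued.get(code)
```

Because the gateway holds no session and answers only this one lookup, its attack surface stays minimal.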
Our Search Application engine is based on Apache Solr/Lucene and indexes courses, suburbs (for typeahead) and tags. The Web application makes requests to Search through an internal load balancer to ensure that all search instances are equally loaded.
The index is refreshed in real time as data is updated, and then again at midnight to ensure the index is fresh and purged of stale data.
Solr is also configured with its faceted extensions to drive your website. This is extremely fast and can deliver webpages with dozens of facet pre-counts in place and no noticeable delay.
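A faceted Solr request is just extra parameters on the select handler. This helper (hypothetical core and field names, not ish’s actual schema) shows the shape of such a query:

```python
from urllib.parse import urlencode

def facet_query(text: str, facet_fields: list, rows: int = 20) -> str:
    """Build a Solr select URL asking for facet counts alongside results."""
    params = [("q", text or "*:*"), ("rows", str(rows)), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return "/solr/courses/select?" + urlencode(params)
```

Solr returns the matching documents plus a count per facet value in a single round trip, which is what lets a page render dozens of pre-counts without extra queries.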
The onCourse Checkout application drives the student application, waiting list, mailing list and enrolment processes. It picks up all the styling and templates from the web application and so feels like a completely integrated part of the site with no transition from one part to another.
Server-side sessions are used to track user progress through the collection of data and credit card payment information. The application makes real-time connections to our upstream credit card gateway, which then links to each merchant bank. Typically, we get responses from the bank in 2-3 seconds.
The onCourse Enrol application feeds specific ecommerce data into Google Analytics so that your website tracking is populated with dollars and specific purchasing outcomes.
Our document management store is backed by Amazon S3 using a separate versioned bucket per customer. Because the bucket is versioned, no data can be deleted from the onCourse software itself, and deletion rights are only available to senior ish technical staff using two-factor authentication, including a TOTP key.
We use the AWS Sydney data centre only to ensure that no customer data ever leaves Australia.
Infrastructure and network
onCourse Server can be hosted locally within your own network infrastructure or within the ish provided cloud environment.
If hosted within your infrastructure, ish can still provide a managed ‘onsite cloud’ type service, providing both a hardware appliance and remote management along with offsite backups.
If hosted on the ish cloud environment, our hardware and management is provided 100% by the ish team on hardware we own (other than where we’ve noted use of AWS services).
Two independent ethernet cables provide redundancy into the Equinix IBX internet service. We maintain capacity in those links with an average utilisation of 10-20% of the burstable capacity.
Equinix data centre security
HAProxy provides load balancing for the application server cluster and also offloads SSL/TLS from the application servers. Even after we moved every page on all our sites to HTTPS, CPU utilisation remains under 5% on average.
Our load balancers support IPv6 and all our sites are configured to be ready for IPv6 traffic; however, none of our customers has yet taken advantage of this service.
HAProxy also provides a security layer, since all requests are proxied rather than port-forwarded. This helps filter maliciously crafted packets or DoS traffic before it hits the application servers.
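A minimal illustration of this pattern in HAProxy configuration (illustrative names and addresses, not our production config): TLS terminates at the proxy, and only parsed, well-formed HTTP requests are forwarded to the backends:

```
frontend www-https
    bind :443 ssl crt /etc/haproxy/certs/
    http-request set-header X-Forwarded-Proto https
    default_backend app_servers

backend app_servers
    balance roundrobin
    option httpchk GET /health
    server app1 10.0.0.11:8080 check
    server app2 10.0.0.12:8080 check
```

The health checks mean a failed application node is dropped from rotation automatically, with remaining nodes absorbing its traffic.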
A pair of failover firewalls runs FreeBSD with pf firewall rules. State is constantly synced between the firewalls, so failover using CARP is instantaneous, without loss of TCP state.
All applications run within an Apache Tomcat environment except for the Search application which is standalone. Apache Zookeeper is used for shared configuration management and process locking.
Sessions are serialised and shared across the cluster for immediate failover and load balancing.
Our database is a Galera master-master cluster running on MariaDB 10.2 with three nodes.
The database disk store is running on ZFS RAIDZ2, which can survive the loss of any two disks without loss of data or significant drop in performance. ZFS also checksums every block on disk for increased reliability. With 96 GB of RAM assigned to the database on the two primary nodes, all indexes remain in memory for excellent performance.
Intrusion pen testing
We run monthly penetration testing to reveal weaknesses in our technology, software, firewalls and other infrastructure.
The key data stores are the database which backs the web applications (Web, Enrol, CMS, etc) and the database which backs the onCourse Server application. For customers where we host those services on ish infrastructure we take multiple independent backups as follows:
- Percona xtrabackup for live (hot) backups of the data nightly with 7 days of history.
- ZFS snapshots of the data store every hour. We keep hourly snapshots for a week, daily snapshots for two months and then monthly snapshots forever.
- Daily sync over an IPSec tunnel to copy those snapshots to storage offsite.
By using separate and independent backup tools and mechanisms we provide redundancy in case of failure of one of those tools.
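The snapshot retention schedule above can be expressed as a small pruning rule. This sketch assumes snapshots are taken on the hour and simplifies “two months” to 60 days:

```python
from datetime import datetime, timedelta

def keep(snapshot: datetime, now: datetime) -> bool:
    """Retention: every hourly snapshot for a week, one daily snapshot
    for two months (~60 days), then one monthly snapshot forever."""
    age = now - snapshot
    if age <= timedelta(days=7):
        return True                              # all hourly snapshots
    if age <= timedelta(days=60):
        return snapshot.hour == 0                # the midnight snapshot each day
    return snapshot.day == 1 and snapshot.hour == 0  # first-of-month snapshot
```

Tiered retention like this keeps recent history fine-grained while the long tail stays cheap to store.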
Monitoring and consistency
We use a combination of Zabbix and JMX to monitor all systems continuously. Almost 10,000 separate measurements are taken by our monitoring server.
Incident alerts are sent by email/SMS/IM to our technical team according to severity. Disks and other components are replaced before they fail as we are alerted to pending issues. Servers can be taken out of a cluster and upgraded or replaced as needed.
With SaltStack as our configuration management tool, we have extremely consistent and reliable operating system and application configuration. Each change can be reviewed first on our test servers which reproduce the production environment.
We utilise a centralised clustered log shipping and storage mechanism to ensure that even if a server is compromised or unrecoverable, logs will always be available for forensic analysis.
Our systems are clustered so that the loss of any piece of hardware will have no impact on student- or tutor-facing systems such as the website or enrolments. However, in the case of catastrophic loss of the entire Equinix Sydney data centre, we have AWS instances pre-configured and ready to start. Because all our system configuration is in SaltStack, those AWS instances can be started and deployed with current configuration in under 5 minutes.
Migrating DNS records will take a little longer as some DNS caching will mean that services might restore over a period of up to an hour.
In a complete loss of the primary data centre, the worst case is a rollback to the offsite database backups from the previous night. We will review the possibility of running one of the Galera cluster databases in async mode as a remote offsite node to mitigate this issue.
PCI DSS Compliance
onCourse never stores complete credit card numbers or CVV security values, and that data never traverses our software or infrastructure. However because we integrate with gateway providers, we must comply with the current PCI DSS standards applicable for our type of service provider. This includes monthly penetration testing, reviews and evaluations of our security processes.
Even when we make it possible for you to refund previously used cards, or process payment plans automatically, we only store special secure tokens that have no value if they were copied or stolen.
Each tenant has a separate onCourse instance running either on a separate server or an isolated BSD jail. Document storage with unique credentials and separate S3 buckets enforce separation. Where some other systems are shared (such as website hosting, web search indexing or a database cluster) numerous controls are in place to ensure data is isolated to the client.
Controlling access to customer data
Although no one can ever truly claim to (or want to) follow a 100% agile process, we adopt many of its philosophies.
onCourse is developed in two week milestones with phases as follows:
- Two weeks of product feature design and technical review. In this phase we design all UI, specify schema changes and make technical decisions on the implementation plan. The lead developer reviews and estimates tasks.
- The development team builds the code in the next two-week phase.
- QA spends another two weeks on testing. Sometimes developers are called back to make adjustments.
- Our account management and support team then spend up to two weeks in the final phase writing documentation and deploying to our staging servers before final release.
Although each new feature takes roughly 8 weeks from idea to deployment, these phases are all run in parallel resulting in a new release each fortnight. These smaller incremental changes are easier to test.
We don’t deploy every release to every customer. Depending on their needs we can configure our Saltstack configuration system to deploy updates according to different schedules. By default, we deploy every second release (roughly monthly) since this gives a good balance between deploying bug fixes and new features with not interrupting business workflow.
We have a suite of unit and integration tests that are automatically run on multiple operating system platforms and multiple database backends on our Jenkins build server.
After automated testing, our QA team runs through manual tests, both ad hoc against the features being worked on and as a more systematic review using test plans maintained in TestLink.
Our task tracking system is a key component in delivering timely feedback to customers and organising our development milestones.
We approach security as a team effort: from our development team to our support and admin staff, we are constantly learning new processes according to best practices. We regularly review our processes to ensure we are up to date and compliant.
We encourage our customers to practise good security hygiene, providing tips via regular forum posts and emails. We’ve purposefully built our software to prompt users to adopt best practices such as two-factor authentication and strong passwords.