How does online judge work ?

Have you ever heard of SPOJ, CodeForces, Hackerrank, UVa ? If you are in competitive programming line, you might have know these sites. They are called online judge. What is an online judge by the way ?

Wikipedia says:

An online judge is an online system to test programs in programming contests. They are also used to practice for such contests. Many of these systems organize their own contests.

The system can compile and execute code, and test them with pre-constructed data. Submitted code may be run with restrictions, including time limit, memory limit, security restriction and so on. The output of the code will be captured by the system, and compared with the standard output. The system will then return the result. When mistakes were found in a standard output, rejudgement using the same method must be made.

So online judge compiles and runs your code, shows some output or verdict. Is it that simple ? Nope, this whole process is quite complicated. Let me describe you the complications:


1. Time Limit Exceed: Your program needed to be exited in certain time. Can we enforce a time limit while executing a process on our OS by default ?

2. Memory Limit Exceed: Your program should not take all the memory you have of your CPU. How we can enforce that as well, any simple way ?

3. Stack Overflow/Runtime Error: After running your program, the system should understand whether the program ran successfully. How can we get that information ?


Okay, all of the above are just about the complication of system for checking a program. But there are some security issues as well. As one is allowed to submit any type of code on the online judge, it is quite obvious that some might submit some malicious code. How malicious ?


1. Some code can execute other process.

2. Some code can look into your file directory, and get some information.

3. Some code can run administrative command.

4. Some code can consume the server’s bandwidth.

And there are many facts, that a code can do.


And you obviously don’t want to let the program do that. So how can we enforce this permission rules as well ?

So how does a online judge handles above scenarios ?

Anyway, the process to limit permission and hardware for System’s safety is called Code Sand-Boxing.


So lets have a look to some Code Sand-boxing approaches-


A lame Way:

I have seen some online judge projects where the developer tried to find the malicious functions on a code before executing the code. For example, if your code is on C++, the judge will first check if there is any line of code where “system” word is used. This solution won’t work for various reason, as many programming language doesn’t work in the same way. You can generate a function dynamically and call it in some programming language, so filtering won’t work in those code. Another thing, what if you need to print “system” as output ? This solution will say that the code is malicious but it isn’t.


A long Historical Way:

In this approach, most the online judge works. And this solution is applicable to linux based system.

Stack Overflow/Runtime Error can be checked by the return value of a program. If it is 0, then you can say the program ran successfully, otherwise crashed.

Memory Limit limitation and time limit limitations can be provided using various Unix based libraries. Some programming language supports this kind of limitations flag by default, for example: Java.

Want to know about this one, check this article.

Directory permission is easy on linux, you just create a user and permit that user for some directories as well as some read/write permission. Then execute a program using that user, file directory problem can be solved. Also that code can’t run some administrative command.

Limiting network capabilities is also possible using some libraries, for example Trickle.

In this solution, one need to setup various libraries and write various codes just solve one security hole.


The modern way:

All of the above problem are solved if you use Docker or container based VM (Virtual Machine). You can create a container which will have a initial memory, with preset network permission, time limit for the container. All the hassles will be taken care of by the VM. Developer just can focus on the other things. In this solution, the whole judging system may seem to be slower than the other two methods but actually in large scale it is quite faster, and way more safer, cleaner than any other solution. You can check this GitHub Repo to know how docker can be used for judging.


This whole process of judging is one of most critical part of a online judge. To implement judging system, one need to understand how OS works, also how the programming language works. Hope you have got at least some knowledge about the whole process. If you want to look some already implemented solutions you check the links given below-


1. https://github.com/DMOJ/judge

2. https://github.com/mjnaderi/Sharif-Judge

3. http://cms-dev.github.io/

4. http://www.ucw.cz/moe/



Disclaimer: I haven’t built or related to any online judge. I worked and talked with some developer who made CodeMarshal, Toph and HackerRank. And I have interest in this topic so researched about it. :)


Thanks to Mahmud Ridwan vai for enlightening me about various aspects of a judging system.