About the Competition

Inspired by the overwhelming success of the ICFHR2018 Competition on Thai Student Signature and Name Components Recognition and Verification (TSNCRV 2018), we are proposing another novel competition on Short answer ASsessment and Thai student SIGnature and Name COMponents Recognition and Verification in conjunction with ICFHR 2020 (ICFHR2020 Competition on SASIGCOM). The proposed competition aims to automate the evaluation process of short-answer-based examinations. Such automation can benefit education not only through a significant reduction in the time spent marking short answer examination papers, but also by making it possible to identify and verify that a student's signature on an exam paper is genuine and their own.

An answer script from a short answer examination consists of three types of handwritten components: the student's name, the student's signature, and short (few-word) answers. Hence, the proposed competition contains three components: short answer assessment via word spotting (recognising and marking the answers to short-answer questions on examination papers), student name component (first and last name) recognition and verification, and signature recognition and verification.

Competition Prizes

Certificates will be awarded to the winner and to all participants.

Important Dates

Site Open: Jan 12, 2020
Registration Open: Feb 15, 2020
Datasets Release: March 1, 2020
Registration and Submission Deadline: July 10, 2020
Results Announcement: July 11, 2020


Registration

The competition registration can be done by email. To register for the competition, please send an email to us at abhijit.das@griffithuni.edu.au and art.suwanwiwat@jcu.edu.au with the subject line "ICFHR2020 Competition on SASIGCOM Registration". Your email should contain the following information:

Name:
Affiliation:
Email address:
Phone number:

If you also want to obtain the datasets, please complete the Licence and Agreement form and send it together with your registration details.

Datasets and Submission

Datasets:

There are three (3) datasets employed in this competition: a short answer dataset, a Thai student name components dataset and a Thai student signature dataset. All datasets and their ground truth are already available. The details of each dataset are described as follows:


1. The short answer dataset:

There are 104 exam papers in this dataset: 52 were written in cursive handwriting, and the remaining 52 in printed handwriting. Each exam paper contains 10 questions, and the answers were designed to be a few words per question, which suits the purpose of a short answer question assessment system. The answers to the questions are straightforward; for example, for "What does IT stand for?", the only correct answer is "Information Technology", although writers may write the words using different cases, such as "information technology" or "Information technology". Table 1 displays some examples of handwritten short answers. Competitors will use the training dataset for training, then use the testing dataset to recognise the short answers and to mark the recognised answers. The aim is for participants to spot the correct answer words.

This dataset is for a word spotting task, where known words (corresponding to the correct answers) should be spotted then marked in the handwritten answers.


Table 1: Short Answer Samples


2. The Thai student name components dataset:


There are 6,000 (100 students × 2 name components × 30 times) genuine name components in this dataset. For each genuine name component, 12 skilfully forged versions were produced; in total, there are 2,400 (100 students × 2 name components × 12 times) skilfully forged name components. Altogether there are 8,400 name components in this dataset. All samples were scanned at 300 dpi and binarised. Examples of genuine and skilfully forged Thai name components can be seen in Table 2.

Competitors will use the training dataset for training, then use the testing dataset to test their classifiers and verifiers in order to obtain the highest recognition and verification rates.

Table 2: Examples of genuine and skilfully forged Thai name components


3. The Thai student signature dataset:

Both genuine and forged Thai student signatures were obtained from 100 volunteers. In total, 3,000 (100 signers × 30 times) genuine signatures were obtained. For each genuine signer, 12 skilfully forged and 12 randomly forged signatures were produced, i.e. 24 forged signatures per genuine signer. In total, there are 1,200 (100 signers × 12 times) skilfully forged and 1,200 (100 signers × 12 times) randomly forged signatures. Altogether, there are 5,400 signatures in this dataset. Of the volunteers, 36 signed their signatures in English script, whereas the other 64 signed in Thai; among the 100 signers, five (5) used both scripts. All samples were scanned at 300 dpi and binarised. Signature characteristics and examples can be seen in Table 3.

Table 3: Signature Characteristic and Their Examples


Dataset Flavours

Each of the three datasets above is provided in two flavours (six datasets altogether): the short answer dataset (cursive and printed handwriting), the Thai student name components dataset, and the Thai student signature dataset. The flavours and their training/testing splits are described below.

1. The short answer datasets:

There are two flavours of handwritten short answers: printed and cursive handwriting. These datasets are for word spotting tasks.
1) The first flavour contains 52 cursive handwritten samples: 10 exam paper images in the training dataset and 42 exam paper images in the testing dataset.
2) The second flavour contains 52 printed handwritten samples: 10 exam paper images in the training dataset and 42 exam paper images in the testing dataset.

2. - 3. The Name Components and Signature datasets:

The first flavour contains 30 signers/writers. The numbers of samples in each category are:
1) Signature dataset: (genuine x 30, skilfully forged signatures x 12, and simple forged signatures x 12) x 30 signers.
2) Name component datasets:
   2.1) First and last names of each writer are in two separate files: ((genuine written name components x 30 and forged written name components x 12) x 2 name components per each writer) x 30 writers.
   2.2) First and last names are combined together: (genuine written name components x 30 and forged written name components x 12) x 30 writers.


The second flavour contains 100 signers/writers, with 5 samples in each category:
1) Signatures dataset: (genuine x 5, skilfully forged signatures x 5, and simple forged signatures x 5) x 100 signers
2) Name components datasets:
   2.1) First and last names of each writer are in two separate files: ((genuine written name components x 5 and forged written name components x 5) x 2 name components per each writer) x 100 writers.
   2.2) First and last names are combined together: (genuine written name components x 5 and forged written name components x 5) x 100 writers.


For all datasets, competitors will use the training datasets for training, then use the testing datasets to test their classifiers and verifiers in order to obtain the highest recognition and verification rates.

Numbers of samples for testing and training:
The first flavour:
Five (5) genuine samples of each signer/writer (samples no. 1-5) are to be used for training.
For the signature dataset: the remaining 25 genuine, 12 skilfully forged, and 12 simple forged samples of each signer are to be used for testing.
For the name component datasets:
   1) First and last names of each writer in two separate files: the remaining (25 genuine and 12 forged samples) × 2 name components of each writer are to be used for testing.
   2) First and last names combined: the remaining (25 genuine and 12 forged samples) of each writer are to be used for testing.


The second flavour:
Three (3) genuine samples of each signer/writer (samples no. 1-3) are to be used for training.
For the signature dataset: the remaining 2 genuine samples (samples no. 4-5), 5 skilfully forged, and 5 simple forged samples of each signer are to be used for testing.
For the name component datasets: the remaining (2 genuine samples (samples no. 4-5) and 5 forged samples) × 2 name components of each writer are to be used for testing.
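As an illustration, the splits above can be expressed in a short sketch (the helper name and the assumption that samples are numbered consecutively per signer are hypothetical; actual file naming may differ):

```python
# Hypothetical helper: split one signer's/writer's genuine sample ids
# into training and testing sets according to the flavour rules above.
def split_genuine(sample_ids, flavour):
    """Flavour 1: samples 1-5 train, the rest test.
    Flavour 2: samples 1-3 train, samples 4-5 test."""
    n_train = 5 if flavour == 1 else 3
    train = [s for s in sample_ids if s <= n_train]
    test = [s for s in sample_ids if s > n_train]
    return train, test

# Flavour 1: 30 genuine samples per signer -> 5 train, 25 test.
train1, test1 = split_genuine(range(1, 31), flavour=1)
# Flavour 2: 5 genuine samples per signer -> 3 train, 2 test.
train2, test2 = split_genuine(range(1, 6), flavour=2)
```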

To obtain the datasets, please first complete the registration. Once registered, you can download the dataset(s) below. By downloading the dataset(s), you agree to the licence agreement.

1. Answered short answer question exam papers - cursive handwritten exam - training and testing datasets, and cursive handwritten exam - bounding-boxed training samples.
2. Answered short answer question exam papers - printed handwritten exam - training and testing datasets, and printed handwritten exam - bounding-boxed training samples.
3. Student name component training and testing datasets.

4. Student signature training and testing datasets.

Challenges

There will be six tasks in this competition; participants can participate in a single task or in multiple tasks:

Word spotting for marking cursive and/or printed handwritten short answers:
   1. Word spotting for marking cursive handwritten answer words, where the known words (the correct answers) should be spotted in the handwritten answers and then marked.
   2. Word spotting for marking printed handwritten answer words, where the known words (the correct answers) should be spotted in the handwritten answers and then marked.

For the word spotting tasks, participants are required to spot the correct answers in the scanned exam papers provided. There are one-word and two-word answers. A two-word answer will be treated as a single one-word answer (the blank space between the words is included). This keeps the task simple; otherwise, participants would also have been required to consider the semantic context: for instance, if the answer is "information technology", then "technology information" will be considered an incorrect answer.
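A minimal sketch of this marking rule (the function name is hypothetical): the spotted answer is compared to the key case-insensitively as a single string, so two-word answers must appear in the correct order.

```python
# Hypothetical marking check: case-insensitive comparison, with a
# two-word answer treated as one token (space included), so
# "technology information" does not match "Information Technology".
def mark_answer(spotted, correct):
    norm = lambda s: " ".join(s.lower().split())
    return norm(spotted) == norm(correct)

print(mark_answer("information technology", "Information Technology"))  # True
print(mark_answer("technology information", "Information Technology"))  # False
```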

Thai Student Signatures:
   3. Student identification (employing student signatures).
   4. Student verification (employing student signatures).


Thai Student Name Components (first names, last names, and/or first and last names combined):
   5. Student identification (employing handwritten first and last names).
   6. Student verification (employing handwritten first and last names).


Instructions for creating function prototypes:
Signature verification:
score = my_function(img1, img2, img3, img4, img5, img6) where img1 to img5 are training images, and img6 is the testing/query image.


Name component verification:
First name:
score = first_my_function(img1, img2, img3, img4, img5, img6) where img1 to img5 are training images, and img6 is the testing/query image.

Last name:
score = last_my_function(img1, img2, img3, img4, img5, img6) where img1 to img5 are training images, and img6 is the testing/query image.

First and last name combined at image level (by concatenating the images side by side). Note that the images (datasets) will be provided to you:
score = firstAndLast(image level fusion)_my_function(img1, img2, img3, img4, img5, img6) where img1 to img5 are training images and img6 is the testing/query image.

First and last name combined at score level (scores generated individually by first and last name fused together):
score = firstAndLast(score level fusion)_my_function(img1FN, img2FN, img3FN, img4FN, img5FN,img1LN, img2LN, img3LN, img4LN, img5LN, img6FN, img6LN) where img1FN to img5FN and img1LN to img5LN are training images for first and last names, respectively, and img6FN and img6LN are the testing/query images for first and last names, respectively.
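As a sketch of the expected prototypes, here is a trivial pixel-distance verifier in Python; the distance measure is purely illustrative, and a real submission would substitute its own model. The `first_and_last_score_fusion` helper shows one possible score-level fusion rule (a simple average), which is an assumption, not a requirement:

```python
import numpy as np

# Illustrative verifier matching the required prototype: img1-img5 are
# training images, img6 is the testing/query image. The mean absolute
# pixel distance is only a placeholder for a real verification model.
def my_function(img1, img2, img3, img4, img5, img6):
    refs = np.stack([img1, img2, img3, img4, img5]).astype(float)
    dist = float(np.mean(np.abs(refs - img6.astype(float))))
    return 1.0 / (1.0 + dist)  # higher score = more similar

# Score-level fusion for first and last names: average of the two
# individual scores (the fusion rule is up to the participant).
def first_and_last_score_fusion(fn_imgs, ln_imgs, img6FN, img6LN):
    return (my_function(*fn_imgs, img6FN) + my_function(*ln_imgs, img6LN)) / 2.0
```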

Recognition tasks (for both signature and name component):
The participants need to submit two files:

1. Training File: A script that can read training images from a folder, i.e. it takes the samples in the training folder [path] as input. The training folder contains subfolders, say 1, 2, ..., 100 (i.e. 100 subfolders for 100 classes (names/signatures)). Each subfolder contains 5 training samples from its class. The training file should read the images from these folders and generate the training model: [model] = training [dataset_path].

2. Testing File: A script that takes as inputs 1) a testing image from a testing folder [path] and 2) the model. As output, it recognises the class label to which the given image (sample) belongs, together with its probability score (e.g. the class probability scores of an SVM). For a signature/name component, the testing file can be expressed as: [Label, Prob_scores] = testing [test_image_path, model]

Program Files Submission

Each participant has to complete the task(s) of their choice, then provide (email) any program or executable files that can read images from a directory (one for training and another for testing) to the organisers (abhijit.das@griffithuni.edu.au and art.suwanwiwat@jcu.edu.au) with the subject line: "ICFHR2020 Competition Submission".
We would also appreciate it if you submitted a full paper focused on the method used in the competition for the recognition and verification tasks to the ICFHR 2020 main track.

Evaluation:

The evaluation will be performed using the program files sent by the participants via email to the organisers. Performance accuracy and Equal Error Rate will be used as the performance measures, and an average of both measures will be considered for ranking the submissions.
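For reference, here is a sketch of how an Equal Error Rate can be computed from verification scores, assuming higher scores indicate genuine samples; this is an illustration, not the organisers' exact evaluation script.

```python
# EER sketch: the error rate at the threshold where the false rejection
# rate (genuine rejected) and false acceptance rate (forgery accepted)
# cross; approximated here as the minimum over candidate thresholds of
# max(FRR, FAR).
def equal_error_rate(genuine_scores, forged_scores):
    eer = 1.0
    for t in sorted(set(genuine_scores) | set(forged_scores)):
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in forged_scores) / len(forged_scores)
        eer = min(eer, max(frr, far))
    return eer
```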

News and Results

12/1/20 - Registration is now open.
03/2/20 - Datasets are now available.

Organisers and Contact Information

Organisers

Abhijit Das (Indian Statistical Institute, Kolkata, India)
Hemmaphan Suwanwiwat (James Cook University, Cairns, Australia)
Umapada Pal (Indian Statistical Institute, Kolkata, India)
Michael Blumenstein (University of Technology Sydney, Australia)

Contact Information

For further information please contact:
Abhijit Das at abhijit.das@griffithuni.edu.au
Hemmaphan Suwanwiwat at art.suwanwiwat@jcu.edu.au