I started coding in Feburary 2016, mainly to solve algorithmic problems appeared in various contests. My primary language is C++ (although I started picking up Python 2 recently for Datathon purposes). The collection of cool problems I have solved can be found here.
Some of the awards I have won on the online or physical programming platforms are:
On GitHub, developers can showcase their progress by commiting changes of their own files. Yet sometimes they prefer spending time to enrich themselves through reading articles related to their ongoing side projects, or merely for interest. Hence, it might be misleading for people to think that they do not contribute much, and are therefore non-productive during the period.
We developed a web application so that developers (or any users) can 'commit' their readings onto LifeLog. To be precise:
My team agreed on building a Trading Simulator that opens to everyone to compete against RBC's trading strategy for the following two objectives:
In simulating the way in which an individual attempts to beat RBC by guessing the stock price at the next minute, a teammates posed me this question:
how do you predict the stock price at the next minute given the prices in the past?
After some comptemplation I noticed the similarity of this question to the classical problem of finding the next number in a sequence,
which immediately prompted me to use the Lagrange Interpolation Polynomial to fit every point into a single polynomial.
The problems associated to this formula are:
In view of this I attempted the linear regression method: the results were getting better, but not quite there. Nonetheless, this motivated the approach of fitting a polynomial with degree depending on the number of points we had in the past. This was achieved by fitting the numbers into matrices in computing the coefficients of the resultant polynomial using numpy.linalg package. It's then found that an optimal way (using this approach and given the dataset) was to set the degree of polynomial at $O(\sqrt[3]{n})$ where $n$ is the number of points. This effectively capped my error at 4% in price prediction (around 1.5% on average).
Some other tasks I did during the Hackathon was to help debugging codes written by my teammates in Javascript, HTML and CSS (towards the deadline of the submission).
This was my first Hackathon experience. The problem that my teammates and I worked on was the fact that there were people with ideas and insights but do not have the skills to implement them. On the other hand, there were also people with advanced skills in their related fields yet do not have the insights to see which problems to solve. Hence we developed a website to connected the two types of people.
On May 13, 2017, I was selected to take part in the Waterloo Datathon (through an assessment). The task given on the day was to draw conclusion based on the data on pickups by Uber and other transportation method in different areas. My teammates and I decided to relate this data to the crime rate happening in the respective cities (using another dataset we searched on our own). Using statisical languages like R and Python, we drew a conclusion that public transportation is less perferred in areas with high crime rates.
Inspired by the cyrptography research that I am undertaking in Summer 2018, I couldn't help but wonder: how to implement a system of numbers modulo n that provides a convenient method for addition, subtraction, multiplication and even integer division (inverse)? More generally, how can I implement features in C++ that offers cool mathematical things like what SageMath.org offers? Below are the framework I needed to do:
The int type in C++ allows 32-bits of integer storage while the long long int type allows 64-bits. In order for a sagemath simulating system to be efficient, we must have a system that stores way beyond the prescribed 64 bits for integers. The features currently supported are as of below:
The arithmetic operations are the same as the BigInt system, but there are a few differences:
The main challenge of this project is to optimize my library to the maximum possible while keeping the object-oriented development principle (single-responsibility principle, minimize coupling between modules, etc). In the very beginning, it takes 5 seconds to read, output, and perform all arithmetic operations on two 300-digit integers, and modular exponentiation seemed infeasible. Below is my journey to optimization in bringing 5 seconds to 0.016 seconds (300 times!) and modular arithmetic of 60-digit numbers from 232 seconds to 0.175 seconds (1000+ times!):
As mentioned in my home page, I just finished doing a research on finding weakly connected components in a graph.
The task of finding a weakly connected component can be rephrased as below:
Given a group of people, some pairs of them are friends (and friendship is mutual).
For each pair of people A and B, do there exist a set of people from the group,
such that we can line them up with A on one end and B on the other end, and any two neighbouring people on the line are friends?
Although various algorithms exist (breadth-first search, depth-first search, quick-find, quick-union),
most (if not all) of them cannot update the set of connected components efficiently in the case of edge deletion.
A crucial observation in solving this mega-problem is that, in general, the probability of a connected component being split into two upon edge deletion is rather small. Having this in mind, and having familiarized myself with the existing algorithms in a static setting, I started off by modifying the features of the existing algorithms:
The final part of my research concerns about making these algorithms work in parallel processing (that is, multitasking). After designing the algorithm on paper, I explored two technologies: OpenMP and Message Passing Interface (MPI). Under the advice of my supervisor, I chose to move on with MPI due to the private memory assigned to each processor (which makes it easier to scale to big numbers). In the end of my research, I constructed a framework of the algorithms aforementioned that conatains the idea behind it.