My Coding Experience

Competitive Coding

I started coding in February 2016, mainly to solve algorithmic problems that appear in various contests. My primary language is C++ (although I recently started picking up Python 2 for datathon purposes). The collection of cool problems I have solved can be found here.

Reached Master (rating 2100) on Codeforces in August 2019!

Some of the awards I have won on online and in-person programming platforms are:

It's worth noting that my coding ability also won my team and me second place at the Waterloo Google Games on April 28, 2018, during which I nailed down 5 algorithmic (and mathematical) puzzles for my team, bringing our total to 11 puzzles solved!

Hackathon Experience

  1. LifeLog, the first UOttaHacks

    Date: February 17-18, 2018

    On GitHub, developers can showcase their progress by committing changes to their files. Yet sometimes they prefer to spend their time enriching themselves by reading articles related to their ongoing side projects, or simply out of interest. This can mislead people into thinking that they are not contributing much, and are therefore unproductive during that period.

    We developed a web application so that developers (or any users) can 'commit' their readings onto LifeLog. To be precise:

    • People can commit an article they find interesting onto LifeLog.
    • People can also highlight phrases they find interesting onto LifeLog.
    My contribution to this application was the Chrome extension framework. To be precise, I started off by writing the manifest.json file (required for every Chrome extension), then developed the main.js file, adding an event listener (via Chrome's addListener API): this lets the extension write the commit to the console log whenever a new article has been 'committed'.

  2. Trading Simulator, Royal Bank of Canada (RBC)'s The Next Great Innovator Hackathon

    Date: September 23-24, 2017

    My team agreed on building a Trading Simulator, open to everyone, in which participants compete against RBC's trading strategy. It served two objectives:

    • To attract top performers to work at RBC.
    • To improve RBC's trading strategies by 'learning' from the top performers.
    Our team comprised a Business Analyst, a UI Designer, and three developers (me included).

    To simulate the way an individual tries to beat RBC by guessing the stock price at the next minute, a teammate posed this question to me: how do you predict the next minute's stock price given the prices in the past? After some contemplation I noticed the similarity between this question and the classical problem of finding the next number in a sequence, which immediately prompted me to use the Lagrange interpolation polynomial to fit every point into a single polynomial. The problems with this formula are:

    • Inefficient running time, since the data must be recalculated for every single computation.
    • Integer overflow (or, in Python, impractically large intermediate values) due to multiplying a large number (say, hundreds or thousands) of terms.
    • Huge deviation of the predicted value from the numbers in the original sequence (due to the polynomial's high degree), which renders the prediction inaccurate.
    I later learned that this phenomenon is called overfitting.

    In view of this I tried linear regression: the results were better, but not quite there. Nonetheless, this motivated the approach of fitting a polynomial whose degree depends on the number of past points. I achieved this by setting up the numbers as matrices and computing the coefficients of the resulting polynomial with the numpy.linalg package. I then found that an optimal choice (for this approach and dataset) was to set the degree of the polynomial at $O(\sqrt[3]{n})$, where $n$ is the number of points. This effectively capped my error at 4% in price prediction (around 1.5% on average).

    Code Snippet
    A snippet of my Python code for the regression calculation, which stores the values of $\sum x^k$ and $\sum y x^m$.
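The snippet itself is not reproduced here; below is a minimal sketch of the idea (the function names and the exact degree rule are my own illustration, not the original code): accumulate the power sums $\sum x^k$ and the moments $\sum y x^m$, build the normal equations, and solve them with numpy.linalg.

```python
import numpy as np

def fit_poly(xs, ys):
    """Least-squares polynomial fit via the normal equations,
    with degree chosen around the cube root of n as described above."""
    n = len(xs)
    d = max(1, round(n ** (1.0 / 3.0)))  # degree ~ n^(1/3)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # power_sums[k] = sum of x^k, for k = 0 .. 2d
    power_sums = [np.sum(xs ** k) for k in range(2 * d + 1)]
    # moments[m] = sum of y * x^m, for m = 0 .. d
    moments = [np.sum(ys * xs ** m) for m in range(d + 1)]
    # Normal equations A c = b, where A[i][j] = sum of x^(i+j)
    A = np.array([[power_sums[i + j] for j in range(d + 1)]
                  for i in range(d + 1)])
    b = np.array(moments)
    return np.linalg.solve(A, b)  # coefficients c[0] + c[1] x + ...

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x (e.g. the next minute)."""
    return sum(c * x ** k for k, c in enumerate(coeffs))
```

The next minute's price is then predicted by evaluating the fitted polynomial one step past the last known point.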

    Other tasks I took on during the hackathon included helping to debug my teammates' JavaScript, HTML, and CSS code as the submission deadline approached.

    Languages: Python (numpy.linalg library), HTML, CSS, JavaScript.

    Skills gained: a preliminary idea of how machine learning works.

  3. Idea Connector, Waterloo Equithon

    Date: May 5-7, 2017

    This was my first hackathon experience. The problem my teammates and I worked on: some people have ideas and insights but lack the skills to implement them, while others have advanced skills in their fields but lack the insight to see which problems are worth solving. Hence we developed a website to connect the two groups of people.

    Languages: HTML, CSS, JavaScript.

    Benefit gained: a motivation to develop my own website here.

Datathon Experience: Waterloo Datathon 2017

On May 13, 2017, I was selected (through an assessment) to take part in the Waterloo Datathon. The task given on the day was to draw conclusions from data on pickups by Uber and other transportation methods in different areas. My teammates and I decided to relate this data to the crime rates in the respective cities (using another dataset we found on our own). Using statistical languages like R and Python, we concluded that public transportation is less preferred in areas with high crime rates.
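As an illustration only (the real datasets and column names differed), a pandas analysis of this kind might look like the following: merge the pickup data with the crime data by area, then check the correlation between the two.

```python
import pandas as pd

# Hypothetical toy data -- the actual datasets and columns differed.
pickups = pd.DataFrame({
    "area": ["A", "B", "C", "D"],
    "pickups_per_capita": [1.2, 0.8, 2.1, 0.5],
})
crime = pd.DataFrame({
    "area": ["A", "B", "C", "D"],
    "crime_rate": [3.0, 5.5, 1.8, 7.2],
})

# Join the two datasets on the shared area identifier.
merged = pickups.merge(crime, on="area")

# A negative correlation supports the conclusion that pickups
# are lower in areas with higher crime rates.
corr = merged["pickups_per_capita"].corr(merged["crime_rate"])
print(corr)
```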

Languages: Python (pandas package), R.

Projects

  1. The C++ Extended Integer System (Ongoing)

    Inspired by the cryptography research that I undertook in Summer 2018, I couldn't help but wonder: how do I implement a system of numbers modulo n that provides a convenient method for addition, subtraction, multiplication, and even integer division (via inverses)? More generally, how can I implement features in C++ that offer cool mathematical tools like those SageMath.org offers? Below is the framework I needed to build:

    1. The BigInt system in C++.

      The int type in C++ allows 32 bits of integer storage, while the long long int type allows 64. For a SageMath-simulating system to be practical, we need a type that stores integers far beyond the prescribed 64 bits. The features currently supported are:

      • Basic arithmetic (addition, subtraction, multiplication, integer division and remainder).
      • Comparison (greater or less than, equality).
      • Bitwise shift (left shift << and right shift >>).

    2. Numbers modulo n.

      The arithmetic operations are the same as in the BigInt system, but with a few differences:

      • Ordering and comparison no longer make sense here.
      • Division of integers can still return an integer without remainder, provided that the divisor is relatively prime to n (for example, if n = 11 then 1/5 = 9, since 5 × 9 ≡ 1 mod 11).
      • Fast exponentiation: $k^m \bmod n$ can be computed in time polynomial in $\log k$, $\log m$, and $\log n$.
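The C++ library itself is not reproduced here; as an illustration, a minimal Python sketch of the two operations just described (modular inverse via the extended Euclidean algorithm, and square-and-multiply fast exponentiation) might look like this:

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def mod_inverse(a, n):
    """Inverse of a modulo n; exists only when gcd(a, n) == 1."""
    g, x, _ = ext_gcd(a % n, n)
    if g != 1:
        raise ValueError("divisor is not coprime to the modulus")
    return x % n

def mod_pow(k, m, n):
    """Square-and-multiply: k^m mod n in O(log m) multiplications."""
    result = 1
    base = k % n
    while m > 0:
        if m & 1:                      # current bit of the exponent set
            result = result * base % n
        base = base * base % n         # square for the next bit
        m >>= 1
    return result
```

For instance, mod_inverse(5, 11) gives 9, matching the example above.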

    The main challenge of this project is to optimize my library as much as possible while keeping to object-oriented design principles (the single-responsibility principle, minimal coupling between modules, etc.). In the very beginning, it took 5 seconds to read, output, and perform all arithmetic operations on two 300-digit integers, and modular exponentiation seemed infeasible. Below is my journey of optimization, bringing those 5 seconds down to 0.016 seconds (300 times faster!) and modular arithmetic on 60-digit numbers from 232 seconds to 0.175 seconds (1000+ times faster!):

    1. Avoiding unnecessary recursion and recalculation when reading input (conversion from base 10 to base 2) brought the running time down from 5 seconds to 2.
    2. De-encapsulating my objects during arithmetic (i.e. performing arithmetic on the underlying vectors rather than repeatedly on BigInt objects, avoiding unnecessary constructor and destructor calls) brought it from 2 seconds down to 0.37.
    3. Modular exponentiation was still very slow: two 10-digit numbers took more than 15 seconds. I brought this down to around 2 seconds simply by de-encapsulating the division and remainder routines as well.
    4. Compiling with the -O2 flag (optimization) made every process above finish in one-third of its original time.
    5. Previously, my Karatsuba multiplication created new, smaller vectors and recursed on them. By recursing on segments of the original vector instead of copying, I gained a further 1/3 speedup overall.
    6. Finally, switching from Karatsuba's algorithm to the Fast Fourier Transform gave a whopping 5-6x speedup on modular arithmetic with two 300-digit numbers modulo another 300-digit number, and an almost 3x speedup on ordinary arithmetic.
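For illustration, here is a minimal sketch of Karatsuba's algorithm in Python, working on integers rather than on digit vectors (the C++ version described in step 5 instead recurses on segments of the original digit vector to avoid the copying):

```python
def karatsuba(x, y):
    """Karatsuba multiplication: 3 recursive products instead of 4."""
    if x < 10 or y < 10:
        return x * y                 # small enough: multiply directly
    n = max(len(str(x)), len(str(y)))
    half = n // 2
    p = 10 ** half
    a, b = divmod(x, p)              # x = a*p + b
    c, d = divmod(y, p)              # y = c*p + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # (a+b)(c+d) - ac - bd == ad + bc, saving one recursive product
    mid = karatsuba(a + b, c + d) - ac - bd
    return ac * p * p + mid * p + bd
```

The trick is that x*y naively needs the four products ac, ad, bc, bd, but ad + bc can be recovered from a single extra product, giving the subquadratic running time.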

Research Experience

As mentioned on my home page, I just finished a research project on finding weakly connected components in a graph. The task of finding a weakly connected component can be rephrased as follows: given a group of people, some pairs of whom are friends (and friendship is mutual), then for each pair of people A and B, does there exist a set of people from the group whom we can line up with A on one end and B on the other, such that any two neighbouring people in the line are friends? Although various algorithms exist (breadth-first search, depth-first search, quick-find, quick-union), most (if not all) of them cannot efficiently update the set of connected components when an edge is deleted.
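Of the static algorithms mentioned, quick-union (union-find) is the standard way to process edge insertions incrementally; a minimal Python sketch with union by size and path compression:

```python
class QuickUnion:
    """Union-find with path compression and union by size:
    near-constant-time connectivity queries in the static setting."""

    def __init__(self, n):
        self.parent = list(range(n))  # each vertex starts as its own root
        self.size = [1] * n

    def find(self, v):
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:          # path compression
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                   # attach smaller tree under larger
        self.size[ra] += self.size[rb]

    def connected(self, a, b):
        return self.find(a) == self.find(b)
```

The catch is precisely the one described above: a union cannot be efficiently undone, which is why edge deletion is the hard case.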

A crucial observation for solving this mega-problem is that, in general, the probability of a connected component being split in two upon edge deletion is rather small. With this in mind, and having familiarized myself with the existing algorithms in the static setting, I started off by modifying those existing algorithms.

Compared to the naive approach of re-running the static algorithm after every update, these modified algorithms reduced the running time by a factor of 5000.

The final part of my research concerned making these algorithms work under parallel processing (that is, multitasking). After designing the algorithms on paper, I explored two technologies: OpenMP and the Message Passing Interface (MPI). On my supervisor's advice, I chose to move forward with MPI, because the private memory assigned to each processor makes it easier to scale to large inputs. By the end of my research, I had constructed a framework of the aforementioned algorithms that captures the ideas behind them.

Dataset used: 5 graphs with up to 105 million vertices and up to 2.6 billion edges.

Language: C++.

Proof of correctness: by comparing output with that produced by Stanford Network Analysis Project (SNAP).

Skills gained: Linux, debugging, parallel processing.