In recent years, a novel neural network architecture called the *Transformer*, first introduced in the ground-breaking paper *Attention Is All You Need* [arXiv:1706.03762], has revolutionized the analysis of sequential data, with a particular focus on Natural Language Processing tasks such as machine translation or generating text from a human prompt. Generalizations of the Transformer architecture have led researchers to applications in other fields such as image generation (an image is a sequence of pixels, after all).

However, the main issue with this marvellous kind of neural network is the appalling number of parameters (on the order of hundreds of *billions*…
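At the heart of the Transformer sits the attention mechanism. As a rough illustration only (a minimal NumPy sketch of scaled dot-product attention, not the full multi-head architecture from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """The core Transformer operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted average of values

# Toy example: 3 tokens, embedding dimension 4, self-attention (Q = K = V)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one attended vector per token
```

In a real model the parameter count explodes because `Q`, `K` and `V` are each produced by learned projection matrices, repeated across many heads and many layers.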

In recent years, the Toronto-based startup Xanadu has introduced a Python framework called PennyLane, which allows users to create hybrid Quantum Machine Learning (QML) models. While it’s still too early to claim that quantum computing has taken over, there are areas where it can offer an *advantage*, for example in drug discovery or finance. One field that has so far been poorly explored in QML is Natural Language Processing (NLP), the sub-field of Artificial Intelligence that gives computers the ability to read, write and, to some extent, comprehend written text.

A graph is a data structure composed of a set of nodes connected by edges. Graphs are everywhere: they can represent a network of friendships, the connections between factories and stores, airports, and so on. Among the many operations one can apply to a graph to extract useful information (in itself a giant rabbit hole), probably the most obvious is partitioning, *i.e.* the separation of *N* nodes into *K* groups based on some similarity or distance criterion. …
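As a toy illustration of partitioning by distance (my own minimal sketch, not a production algorithm): assign each node to the nearest of *K* seed nodes, measured in graph distance, via a multi-source breadth-first search.

```python
from collections import deque

def partition(adj, seeds):
    """adj: {node: [neighbours]}; seeds: list of K starting nodes.
    Returns {node: group index}, where each node joins the seed
    that reaches it first (i.e. the closest one in hop distance)."""
    label = {s: k for k, s in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in label:          # first seed to reach v claims it
                label[v] = label[u]
                queue.append(v)
    return label

# A 6-node path graph, split into K=2 groups around the two endpoints
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(partition(adj, seeds=[0, 5]))  # nodes 0-2 join group 0, nodes 3-5 join group 1
```

Real partitioning methods (spectral clustering, METIS-style multilevel schemes) are far more sophisticated, but the distance-based idea is the same.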

What if Unidentified Aerial Phenomena (UAPs) are not only real, but far from ordinary objects, yet not necessarily artifacts such as spacecraft, nor otherwise requiring an otherworldly explanation? Can we make any sense of those sightings? As is often the case in the news, a good starting point is to frame the conversation using what former intelligence officer Luis Elizondo calls the “five observables”:

- Anti-gravity lift
- Sudden and instantaneous acceleration
- Hypersonic velocities without signatures (e.g. vapour trails and sonic booms)
- Low observability, or cloaking
- Trans-medium travel (e.g. from air to water)

Although this subject is very poorly studied in academic settings…

One of the most prominent tasks at which Machine Learning has historically excelled is classifying items (*e.g.* images, documents, sounds) into different categories. Especially in recent years, advancements in the hardware that executes mathematical models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs, LSTMs) have made it possible to achieve a quantum leap (sometimes literally) in performance. However, defining a model is only half of the story. …

One of the most common applications of Machine Learning is classifying entities into two distinct, non-overlapping categories. Over the years, several methods have been devised, ranging from the very simple to the more complex to the almost black-box. One common question that comes up with almost any kind of model is how to compare the performance of different methods, or of different tunings of the same model’s parameters. Luckily, in the case of binary classifiers, there is a simple metric that captures the essence of the problem: the Area Under the Curve (*i.e.* …
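A useful way to build intuition for the AUC is its probabilistic interpretation: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half (the Mann-Whitney U statistic). A self-contained sketch:

```python
def auc(labels, scores):
    """AUC from its pairwise definition: P(score_pos > score_neg),
    with ties counted as 0.5. labels are 0/1, scores are real numbers."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc(y, s))  # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
```

The O(n²) pairwise loop is fine for illustration; libraries such as scikit-learn compute the same quantity from the ROC curve via sorting.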

A matrix is a table of numbers. A number in a matrix is usually called an *element* and is indexed by two integers representing its row (vertical) and column (horizontal) position. Matrices whose elements are mostly zeros are called *sparse*, otherwise *dense*. It turns out that most large matrices are sparse: *e.g.* those appearing in Natural Language Processing (NLP) or recommendation systems, because they represent interactions between entities that are rarely in contact with each other. In most applications, matrices are not just used to *hold* information but are also *processed* to extract meaningful results. If most of…
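The simplest sparse format stores only the nonzero elements together with their coordinates. A minimal sketch (a bare-bones coordinate, or COO, representation; real libraries such as SciPy offer this and more compact formats like CSR):

```python
class SparseCOO:
    """A matrix stored as a list of (row, col, value) triples —
    only the nonzero elements are kept in memory."""

    def __init__(self, shape, entries):
        self.shape = shape
        self.entries = entries          # [(row, col, value), ...]

    def matvec(self, x):
        """Matrix-vector product in O(nnz) time: zeros are never touched."""
        y = [0.0] * self.shape[0]
        for i, j, v in self.entries:
            y[i] += v * x[j]
        return y

# A 3x3 matrix with only two nonzero elements
A = SparseCOO((3, 3), [(0, 1, 2.0), (2, 0, 5.0)])
print(A.matvec([1.0, 1.0, 1.0]))  # [2.0, 0.0, 5.0]
```

Processing only the stored triples is exactly why sparse algorithms scale: the cost tracks the number of nonzeros, not the full row-by-column size.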

Dynamic programming is a general method to solve optimization problems (“*find the maximum/minimum/shortest/longest…*”) by breaking them down into smaller ones (“*divide and conquer*”) and keeping track of their solutions (“*memoization*”) to make more efficient use of information and resources. Memoization usually goes hand in hand with recursion. If you find the term confusing, you’re in good company: it’s not obvious why it’s called “dynamic” in the first place. It turns out that the name is deliberately misleading, for a historical reason.

Let’s say you want to calculate 6 × 2. …
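Since the worked example is cut off here, a minimal sketch of memoization using the textbook Fibonacci illustration (not the article’s own example): naive recursion recomputes the same subproblems exponentially many times, while a cache makes each one cost O(1) after its first computation.

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # memoization: each fib(n) is computed exactly once
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)   # overlapping subproblems, solved once

print(fib(50))  # 12586269025 — instant, vs. billions of calls without the cache
```

This is dynamic programming in its “top-down” form; the “bottom-up” form fills a table from the base cases upward and reaches the same answer.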

A graph is a data structure composed of a set of objects (*nodes*) equipped with connections (*edges*) among them. Graphs can be **directed** if the connections are oriented from one node to another (*e.g.* Alice owes money to Bob), or **undirected** if the orientation is irrelevant and the connections just represent relationships (*e.g.* Alice and Bob are friends). A graph is said to be *complete* if all nodes are connected to each other. A directed graph with no cycles is said to be *acyclic*. A *tree* is an undirected graph in which any two nodes are connected by *exactly one…*
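These definitions translate directly into code. A small sketch of my own (an undirected graph as an adjacency set, plus a check of the completeness definition above):

```python
from itertools import combinations

def is_complete(adj):
    """adj: {node: set of neighbours}, undirected.
    Complete = every pair of distinct nodes shares an edge."""
    return all(v in adj[u] for u, v in combinations(adj, 2))

# A triangle (complete on 3 nodes) vs. a 3-node path (not complete)
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(is_complete(triangle), is_complete(path))  # True False
```

The same adjacency-set representation supports the other definitions too, *e.g.* a tree can be recognized as a connected undirected graph with exactly N − 1 edges.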

Humans like looking for patterns, be it shapes in the clouds or relationships among numbers. We probably evolved to do so for survival, and we just can’t help it. In science, pattern recognition helps researchers tell the shape of galaxies or identify the decays of short-lived fundamental particles such as top quarks. In physics especially, scientists have been scratching their heads for years trying to come up with an explanation for the apparently random distribution of the masses of fundamental particles, or at least of those that belong to the Standard Model of Particle Physics. Is there…

NLP Machine Learning engineer at Ceridian. Previously smashing protons at the CERN LHC. Views are my own.