## Machine Learning

## Teaching

### Teaching performed by the BIIT group (Relevant to Bioinformatics and Algorithmic Data Analysis students.)

Institute of Computer Science Courses page.

### BSc, MSc thesis topics offered by BIIT members

## General Scientific Interest

InnoCentive - where the world innovates - www.innocentive.com

## K-means & correlation distance

Here's a useful observation related to the use of K-means together with the Pearson correlation distance (© Alex).

The standard K-means update step, where you update the cluster centers by taking the means of the corresponding points, is technically not appropriate for the correlation distance *d(x, y) = 1 - corr(x, y)*. The proper step is to take the *sum of the normalized points* as the new cluster center, where each point is taken to be centered to zero mean before it is divided by its norm:

$$c = \sum_{i} \frac{x_{i}}{|x_{i}|}$$
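A minimal sketch of this update step in Python with NumPy. It assumes that "normalized" means centered to zero mean and scaled to unit norm, which is what turns Pearson correlation into a plain dot product. The function names `standardize`, `correlation_center` and `mean_correlation`, as well as the synthetic data, are made up for illustration.

```python
import numpy as np

def standardize(X):
    """Center each row to zero mean and scale it to unit Euclidean norm.

    Assumption: no row is constant (a constant row has zero norm after
    centering, and its correlation with anything is undefined).
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc / np.linalg.norm(Xc, axis=1, keepdims=True)

def correlation_center(X):
    """Cluster center for d(x, y) = 1 - corr(x, y): sum of standardized rows."""
    return standardize(X).sum(axis=0)

def mean_correlation(X, c):
    """Average Pearson correlation between each row of X and the center c."""
    return float(np.mean([np.corrcoef(x, c)[0, 1] for x in X]))

# Hypothetical cluster: one shared pattern at very different scales, plus noise.
rng = np.random.default_rng(0)
pattern = np.sin(np.linspace(0, 3, 20))
scales = rng.uniform(0.1, 10.0, size=(30, 1))
X = scales * pattern + rng.normal(size=(30, 20))

plain_mean = X.mean(axis=0)        # standard K-means update
corr_center = correlation_center(X)  # update suited to correlation distance

print("mean corr to plain mean:       ", mean_correlation(X, plain_mean))
print("mean corr to correlation center:", mean_correlation(X, corr_center))
```

The plain mean is dominated by the rows with the largest scale, whereas the sum of standardized rows weighs every point equally, so its average correlation with the cluster members is never lower than that of the plain mean.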

## Latent Process Decomposition

Latent Process Decomposition might be an interesting unsupervised analysis approach to try on the FunGenES data. It is something like clustering, with a model that is slightly more sophisticated than the traditional "mixture". The authors kindly provide the code, and their paper gives some impressive examples of successful application of the method, so although the conceptual part of the algorithm is heavily mathematical, it might be possible to just try running it on the data with reasonably little effort.

## ROC Area-Under-Curve Explained

Some things can take years to figure out. This happens when someone shows you the definition of some "basic" mathematical object but does not say why it is defined this way or how it should be interpreted. Moreover, you won't easily find the answer to your "why" and "how" questions, either because they are "so simple" that no one cares to tell, or simply because no one cares. Some time passes, you forget your desire to find out the meaning, and you just get used to the definition.

For example, it took me some months after I first heard the definition of matrix multiplication to understand why it was defined precisely like that. Same with the notion of a "determinant". Same with pretty much any other first-year university mathematical object. The problem is probably that many of our math courses are "definition-based" rather than "intuition-based", but anyway, that's not the subject of this post.

## The True Value of P-value

Last year, at the end of October, we attended the school Analysis of patterns in Sicily. There were several interesting new things that I personally took home from this event, but the one that probably gave me the most food for thought was the notion of "multiple hypothesis testing". I have long wanted to write something down on this topic, but only now have I found the time and desire to do it. Note that I haven't yet taken the trouble to read any more in-depth literature about this, so what follows is not necessarily flawless, but here we go.