Art of Cosine Similarity

Amartya Nambiar
2 min read · May 30, 2022

Photo by Pierre Bamin on Unsplash

Cosine Similarity is a mathematical measure of how similar two entities are. It represents each entity as a vector and computes the cosine of the angle between the two vectors.

Love is the power to see the similarity in the dissimilar. — Theodor W. Adorno

The Math

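Cosine Similarity:

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$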

Here, A and B are two vectors in a multi-dimensional space, and the value of the equation above always lies in the range [-1, 1]:

  • 1 : Same direction (high similarity)
  • 0 : Perpendicular (no similarity)
  • -1 : Opposite directions (completely dissimilar)

In simple words, the smaller the angle θ between A and B, the closer the cosine similarity gets to 1.
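To see the three cases above concretely, here is a minimal from-scratch sketch of the formula in plain Python. `cosine_similarity` is an illustrative helper written for this example, not a library function, and the sample vectors are chosen only to hit each case.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))        # A · B
    norm_a = math.sqrt(sum(x * x for x in a))     # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))     # ||B||
    if norm_a == 0 or norm_b == 0:
        # The angle to a zero vector is undefined, so refuse rather than divide by zero
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([3, 4], [6, 8]))    # same direction (B = 2A)  → 1.0
print(cosine_similarity([1, 0], [0, 1]))    # perpendicular            → 0.0
print(cosine_similarity([1, 0], [-1, 0]))   # opposite directions      → -1.0
```

Because the dot product is divided by both vector lengths, only direction matters: [3, 4] and [6, 8] score 1.0 even though one vector is twice as long as the other.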

That series you never planned on binging, that song you never planned on listening to…

Before we put Cosine Similarity to work in a computation, let's look at some of the places where it is applied.

The Applications

  1. Document similarity and plagiarism detection.
  2. Recommendation systems.
  3. Data mining and information retrieval in ML.
  4. Bioinformatics, for finding similar DNA sequences.
  5. Image similarity in computer vision.

The Art

A good example is the best sermon. So I will walk you through a very simple demo. Our goal will be to calculate the cosine similarity of the two sentences below.

  1. ‘Sachin played cricket’
  2. ‘Sachin is the God of cricket’
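Assuming each sentence is turned into a word-count vector over the combined vocabulary (Sachin, played, cricket, is, the, God, of), the calculation works out as follows:

$$A = (1, 1, 1, 0, 0, 0, 0), \qquad B = (1, 0, 1, 1, 1, 1, 1)$$

$$A \cdot B = (1)(1) + (1)(0) + (1)(1) + (0)(1) + (0)(1) + (0)(1) + (0)(1) = 2$$

$$\|A\| = \sqrt{1^2 + 1^2 + 1^2} = \sqrt{3}, \qquad \|B\| = \sqrt{1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 1^2} = \sqrt{6}$$

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{2}{\sqrt{3}\,\sqrt{6}} = \frac{2}{\sqrt{18}} = \frac{\sqrt{2}}{3} \approx 0.4714$$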

Worked out with pen and paper, the cosine similarity of these two sentences comes to about 47% (0.4714). The similarity comes entirely from the words ‘Sachin’ and ‘cricket’, which appear in both sentences.

Now let's dive into how you could use Python and its libraries to compute cosine similarity.
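A minimal from-scratch sketch in plain Python, assuming a simple bag-of-words representation: each sentence is lower-cased, split on whitespace, and counted against a shared vocabulary. The helper names `cosine_similarity` and `to_count_vector` are illustrative, not taken from any library.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def to_count_vector(sentence, vocabulary):
    """Bag-of-words: how often each vocabulary word appears in the sentence."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

s1 = "Sachin played cricket"
s2 = "Sachin is the God of cricket"

# Shared vocabulary: every distinct lower-cased word from both sentences, sorted for a stable order
vocabulary = sorted(set(s1.lower().split()) | set(s2.lower().split()))

a = to_count_vector(s1, vocabulary)
b = to_count_vector(s2, vocabulary)

print("Cosine Similarity:", cosine_similarity(a, b))
```

Lower-casing and whitespace splitting are simplifying assumptions; a real tokenizer would also deal with punctuation.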

Output : Cosine Similarity: 0.47140452079103173

We get the same result as the pen-and-paper calculation: 0.4714.
You can also use ready-made functions from packages such as scikit-learn (sklearn) to compute cosine similarity without any hassle (escape the math!).

from sklearn.metrics.pairwise import cosine_similarity
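A minimal sketch of the sklearn route, assuming the sentences are first turned into word-count vectors with scikit-learn's `CountVectorizer`:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = ["Sachin played cricket", "Sachin is the God of cricket"]

# Turn each sentence into a word-count vector (one row per sentence, sparse matrix)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)

# Compares every row with every row; entry [0, 1] is sentence 1 vs sentence 2
sim_matrix = cosine_similarity(X)
print("Cosine Similarity:", sim_matrix[0, 1])
```

By default, `CountVectorizer` lower-cases the text and ignores single-character tokens; neither makes a difference for these two sentences, so the result matches the hand calculation.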

Conclusion

In this article, we started with the math behind Cosine Similarity and its applications, then computed it both with pen and paper and in Python. There are many other ways to implement this metric, in Python and in other languages.

Remember, too, that Machine Learning and AI rely heavily on mathematics, and a true ML enthusiast should never shy away from exploring it.

You may be interested in:

  1. The Data Science Process
  2. 5 Important PyTorch functions
  3. Connecting with me on LinkedIn
