Language similarity metric

The following table shows the degree of similarity between any two languages in our dataset. The dataset was created from the Wiktionary website and can be downloaded here. It lists corresponding words across languages and contains a total of 10,556 words in 28 languages.

The similarity between two languages is computed from the average standardized Levenshtein distance over all pairs of corresponding words in the two languages. The Levenshtein distance between two strings is the minimum number of edits needed to transform one string into the other, where the allowed edit operations are insertion, deletion, or substitution of a single character. We standardize the Levenshtein distance by dividing it by its maximum possible value, which is the length of the longer of the two strings. In the table, values range up to 100, the score of a language compared with itself, so higher values indicate greater similarity.
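The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the table: the function names are hypothetical, and the conversion to a 0 to 100 score as 100 × (1 − average standardized distance) is an assumption inferred from the 100s on the diagonal of the table.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def standardized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(a, b) / longest


def language_similarity(word_pairs: Iterable[Tuple[str, str]]) -> float:
    """Similarity on a 0-100 scale for a list of corresponding words.

    ASSUMPTION: the score is 100 * (1 - mean standardized distance);
    the source only states that the standardized distance is averaged.
    """
    pairs = list(word_pairs)
    if not pairs:
        raise ValueError("at least one word pair is required")
    mean = sum(standardized_distance(a, b) for a, b in pairs) / len(pairs)
    return 100.0 * (1.0 - mean)
```

For example, turning "kitten" into "sitting" takes three edits (two substitutions and one insertion), so the standardized distance is 3/7, since the longer word has seven characters.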

Results

Language English Finnish German French Dutch Spanish Italian Swedish Czech Portuguese Polish Hungarian Danish Norwegian Slovene Croatian Romanian Icelandic Esperanto Catalan Latin Turkish Bosnian Estonian Slovak Interlingua Indonesian Lithuanian
English 100 23 35 42 39 37 39 39 24 37 27 21 40 42 22 32 35 24 36 34 36 25 32 30 24 46 25 26
Finnish 23 100 22 21 22 21 23 28 20 21 22 20 27 29 20 27 22 20 26 22 22 23 29 49 21 28 22 26
German 35 22 100 29 44 25 27 40 23 24 24 19 42 43 21 27 27 23 29 25 26 23 28 29 23 33 22 22
French 42 21 29 100 31 44 46 30 22 44 25 20 32 33 21 27 40 18 38 47 40 23 28 26 22 56 22 24
Dutch 39 22 44 31 100 27 28 41 23 26 25 20 43 44 23 29 28 26 32 27 27 24 31 31 24 34 24 25
Spanish 37 21 25 44 27 100 56 29 23 66 27 20 28 30 24 30 41 19 41 61 42 24 31 26 24 65 25 26
Italian 39 23 27 46 28 56 100 30 24 55 28 20 30 32 25 31 44 19 44 52 47 24 32 26 25 69 25 27
Swedish 39 28 40 30 41 29 30 100 26 28 27 22 62 68 24 33 30 39 33 29 27 25 35 34 27 38 26 27
Czech 24 20 23 22 23 23 24 26 100 23 40 22 27 28 46 48 25 18 28 24 22 23 51 25 70 30 19 28
Portuguese 37 21 24 44 26 66 55 28 23 100 28 20 27 29 23 31 41 18 40 57 42 23 31 25 25 65 24 26
Polish 27 22 24 25 25 27 28 27 40 28 100 21 26 28 38 43 29 18 30 27 24 24 45 27 44 33 23 30
Hungarian 21 20 19 20 20 20 20 22 22 20 21 100 23 26 22 25 22 18 23 20 18 23 30 26 26 28 19 22
Danish 40 27 42 32 43 28 30 62 27 27 26 23 100 77 24 31 29 39 33 28 28 25 33 32 27 37 26 27
Norwegian 42 29 43 33 44 30 32 68 28 29 28 26 77 100 27 32 31 41 35 31 29 26 35 35 28 38 28 28
Slovene 22 20 21 21 23 24 25 24 46 23 38 22 24 27 100 66 26 19 29 24 22 21 66 25 51 28 22 32
Croatian 32 27 27 27 29 30 31 33 48 31 43 25 31 32 66 100 32 21 34 26 24 27 91 29 50 35 25 35
Romanian 35 22 27 40 28 41 44 30 25 41 29 22 29 31 26 32 100 20 38 42 38 27 35 28 25 53 25 28
Icelandic 24 20 23 18 26 19 19 39 18 18 18 18 39 41 19 21 20 100 24 21 20 18 24 25 21 25 20 20
Esperanto 36 26 29 38 32 41 44 33 28 40 30 23 33 35 29 34 38 24 100 37 39 26 39 29 28 51 27 30
Catalan 34 22 25 47 27 61 52 29 24 57 27 20 28 31 24 26 42 21 37 100 42 22 30 25 25 60 25 26
Latin 36 22 26 40 27 42 47 27 22 42 24 18 28 29 22 24 38 20 39 42 100 19 27 25 23 58 23 28
Turkish 25 23 23 23 24 24 24 25 23 23 24 23 25 26 21 27 27 18 26 22 19 100 31 26 23 29 25 24
Bosnian 32 29 28 28 31 31 32 35 51 31 45 30 33 35 66 91 35 24 39 30 27 31 100 33 54 40 27 37
Estonian 30 49 29 26 31 26 26 34 25 25 27 26 32 35 25 29 28 25 29 25 25 26 33 100 26 33 26 28
Slovak 24 21 23 22 24 24 25 27 70 25 44 26 27 28 51 50 25 21 28 25 23 23 54 26 100 29 21 29
Interlingua 46 28 33 56 34 65 69 38 30 65 33 28 37 38 28 35 53 25 51 60 58 29 40 33 29 100 29 29
Indonesian 25 22 22 22 24 25 25 26 19 24 23 19 26 28 22 25 25 20 27 25 23 25 27 26 21 29 100 25
Lithuanian 26 26 22 24 25 26 27 27 28 26 30 22 27 28 32 35 28 20 30 26 28 24 37 28 29 29 25 100