How to calculate the percentage similarity between two strings in Python

3 Answers

0 votes
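# difflib - quick, built-in, character-based similarity

from difflib import SequenceMatcher

def similarity(a, b):
    # ratio() returns 2*M/T in [0, 1]: M is the number of matched characters,
    # T is the combined length of both strings
    return SequenceMatcher(None, a, b).ratio() * 100
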
s1 = "The cat sat on the sofa"
s2 = "The dog sat on the carpet"

print(similarity(s1, s2))



'''
run:

70.83333333333334

'''
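To make the number above less of a black box, here is a minimal sketch that rebuilds the same percentage from the matching blocks difflib reports. The helper names `matched_chars` and `manual_ratio_percent` are made up for this illustration; only `SequenceMatcher`, `get_matching_blocks()` and `ratio()` come from the standard library.

```python
from difflib import SequenceMatcher

def matched_chars(a, b):
    # Sum the sizes of all matching blocks found by SequenceMatcher
    # (the last block is a zero-length sentinel, so it adds nothing)
    sm = SequenceMatcher(None, a, b)
    return sum(block.size for block in sm.get_matching_blocks())

def manual_ratio_percent(a, b):
    # ratio() is documented as 2*M / T: M matched elements, T combined length
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # difflib's ratio() gives 1.0 for two empty sequences
    return 2.0 * matched_chars(a, b) / total * 100

s1 = "The cat sat on the sofa"
s2 = "The dog sat on the carpet"

print(manual_ratio_percent(s1, s2))
print(SequenceMatcher(None, s1, s2).ratio() * 100)
```

Both lines should print the same value. Because the score depends on combined length, two long strings that differ by a few characters score much higher than two short ones that differ by the same number.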

 



answered 3 hours ago by avibootz
edited 3 hours ago by avibootz
0 votes
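# Cosine similarity over TF-IDF vectors - word-level (lexical) similarity.
# Words that appear in only one string get a higher weight than shared words.
# It compares vocabulary, not meaning, so synonyms count as different words.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def similarity_percent(s1, s2):
    # Fit the vocabulary on both strings; each row is one string's TF-IDF vector
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform([s1, s2])
    # Two single-row inputs give a 1x1 result matrix
    sim = cosine_similarity(tfidf[0:1], tfidf[1:2])[0][0]
    return sim * 100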

print(similarity_percent("The cat sat on the sofa",
                         "The dog sat on the carpet"))



'''
run:

60.297481603805714

'''
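For readers who want to see where that score comes from without installing scikit-learn, below is a rough standard-library sketch. It assumes the vectorizer's documented defaults: lowercasing, tokens of two or more word characters, raw term counts, and smoothed idf `ln((1 + n) / (1 + df)) + 1`. The function names `tokenize` and `tfidf_cosine_percent` are hypothetical, and this is an approximation for illustration, not a replacement for the library.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed to mirror TfidfVectorizer's defaults: lowercase, then keep
    # runs of two or more word characters
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf_cosine_percent(s1, s2):
    docs = [Counter(tokenize(s1)), Counter(tokenize(s2))]
    vocab = set(docs[0]) | set(docs[1])
    n = len(docs)
    # Smoothed idf: terms found in both strings get 1.0, terms found in
    # only one string get ln(3/2) + 1
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1
           for t in vocab}
    v1 = [docs[0][t] * idf[t] for t in vocab]
    v2 = [docs[1][t] * idf[t] for t in vocab]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    # Returning 0.0 for an empty vector is a choice made in this sketch
    return dot / norm * 100 if norm else 0.0

print(tfidf_cosine_percent("The cat sat on the sofa",
                           "The dog sat on the carpet"))
```

For the two example sentences this should land on roughly the same 60.3 as the scikit-learn version. With only two strings the idf part has little to work with; the weighting becomes more informative when the vectorizer is fitted on a larger collection of texts.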

 



answered 3 hours ago by avibootz
0 votes
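# Jaccard - keyword overlap, ignores word order

def jaccard_similarity(a, b):
    # Tokens come from a whitespace split and are case-sensitive,
    # so "The" and "the" count as different words
    set1 = set(a.split())
    set2 = set(b.split())
    # |intersection| / |union|; raises ZeroDivisionError if both strings are empty
    return len(set1 & set2) / len(set1 | set2) * 100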


print(jaccard_similarity("The cat sat on the sofa",
                         "The dog sat on the carpet"))




'''
run:

50.0

'''
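The 50.0 above depends on "The" and "the" being treated as two different tokens. As a possible variant, the sketch below lowercases the input and drops punctuation before comparing. The name `jaccard_normalized` and the choice to return 0.0 for two empty inputs are assumptions of this example.

```python
import re

def jaccard_normalized(a, b):
    # Lowercase and keep only runs of word characters, so "The" == "the"
    # and trailing punctuation does not create distinct tokens
    set1 = set(re.findall(r"\w+", a.lower()))
    set2 = set(re.findall(r"\w+", b.lower()))
    union = set1 | set2
    if not union:
        return 0.0  # arbitrary choice of this sketch for two empty inputs
    return len(set1 & set2) / len(union) * 100

print(jaccard_normalized("The cat sat on the sofa",
                         "The dog sat on the carpet"))
```

With normalization the two sentences share 3 of 7 distinct words (the, sat, on), so the score drops to about 42.86. Which version is preferable depends on whether case should count as a difference in your data.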

 



answered 3 hours ago by avibootz
...