Finding a record in a database with Python
Finding a record within a database can be automated. You can use such a script to automate many different routine tasks. It can for example be used to get information out of the database with a voice-assistant or do multiple searches in parallel for completing financial administrative tasks.
In this tutorial we are gonna keep it simple and make a basic script that takes as an input a name and birthdate and as an output gives you a record. We are gonna start with choosing from a list through searching on the basis of the birthdate. After confirming the name is correctly identified we are going to choose the person from the list. Here is the workflow:
The reason we search for the birthdate is that the birthdate is often a more reliable way of communicating information about a person. The spoken name can often be spelt in many different ways, while a birthdate can not. Of course you have the issue of twins, but that’s not something we are going to look into right now. The first step we are going to focus on here is applying a for loop to finding a birthdate in a list of birthdates. We first load the dummy database csv file with pandas:
import pandas as pd
database = pd.read_csv("https://medicalprogress.dev/dummy_data/dummy_db.csv")
We make a basic input prompt to ask for the input name and birthdate and we search the second column of the dummy database for a matching birthdate.
# Get input from user
name = input("Provide name: \n")
birthdate = input("Provide birthdate in format dd/mm/yyyy: \n")# Search birthdate in database
name_database = None
for index, row in database.iterrows():
if row == birthdate:
name_database = row
address = row
When we get to a matching birthdate we check the name. We can do this with the Levenshtein distance. In short, the Levenshtein distance is calculated by finding the easiest way to transform one string into another. The Levenshtein distance can be calculated with different protocols. For our purpose, we need the partial ratio. To illustrate why, let’s take names from three different cultures (Dutch/German, Spanish/Portuguese and Japanese-English, see table). Often you will find yourself only able to write down only part of the names accurately in a multicultural community. At other times, names consist of two combined last names of which only one is used for verbally communicating information about a person. If we calculate the outcome of the different protocols, complete string ratio, partial string ratio, sorted token ratio and token set ratio, we get the following outcomes:
Focusing on only a part of the string with the partial ratio protocol seems to eliminate the parts of names that can be confusing, such as second names or other additional words. So the partial ratio seems to be the most helpful. We import fuzzy-wuzzy as fuzz and incorporate the partial ratio of the fuzzy-wuzzy package to do name matching:
# Do fuzzywuzzy comparison
from fuzzywuzzy import fuzz
Str1 = name
Str2 = name_database
partial_ratio = fuzz.partial_ratio(Str1.lower(), Str2.lower())
Finally, we use a cutoff score to minimise the chance of getting the wrong person. Depending on the number of people in your registry you should adjust the cut-off for your application. We incorporate the cut-off in an if statement and return the personal information:
if partial_ratio > 80:
print("During search based on birthdate a name match was found...")
print("Partial string ratio =", partial_ratio)
print("Was", name_database, "with birthdate", birthdate, "living on", address, "the person that you were looking for?")
print("Person not found with birthdate", birthdate, "and name", name)
Testing provides the following conversation:
Provide birthdate in format dd/mm/yyyy:
First name in database that corresponds with input birthdate = Ms. Q. Brown
During search based on birthdate a name match was found...
Partial string ratio = 100
Was Ms. Q. Brown with birthdate 09/08/1939 living on 86 Bankside Drive, Thames Ditton,KT7 0AH the person that you were looking for?
You can find the full code in a notebook on my Github page. Fun things you can try to do are trying to give a more elaborate summary of person his/her characteristics (calculated age or country) to the user or provide an option to the user to choose from matches that have a partial_ratio > 75%.