Estimated Time: 20 minutes
Create a list of candidate strings by applying an edit operation
# data
word = 'dearz' # 🦌
Find all the ways you can split a word into 2 parts !
# splits with a loop
splits_a = []
for i in range(len(word)+1):
splits_a.append([word[:i],word[i:]])
for i in splits_a:
print(i)
['', 'dearz']
['d', 'earz']
['de', 'arz']
['dea', 'rz']
['dear', 'z']
['dearz', '']
# same splits, done using a list comprehension
splits_b = [(word[:i], word[i:]) for i in range(len(word) + 1)]
for i in splits_b:
print(i)
('', 'dearz')
('d', 'earz')
('de', 'arz')
('dea', 'rz')
('dear', 'z')
('dearz', '')
Delete a letter from each string in the splits
list.
What this does is effectively delete each possible letter from the original word being edited.
# deletes with a loop
splits = splits_a
deletes = []
print('word : ', word)
for L,R in splits:
if R:
print(L + R[1:], ' <-- delete ', R[0])
word : dearz
earz <-- delete d
darz <-- delete e
derz <-- delete a
deaz <-- delete r
dear <-- delete z
It’s worth taking a closer look at how this is excecuting a ‘delete’.
Taking the first item from the splits
list :
# breaking it down
print('word : ', word)
one_split = splits[0]
print('first item from the splits list : ', one_split)
L = one_split[0]
R = one_split[1]
print('L : ', L)
print('R : ', R)
print('*** now implicit delete by excluding the leading letter ***')
print('L + R[1:] : ',L + R[1:], ' <-- delete ', R[0])
word : dearz
first item from the splits list : ['', 'dearz']
L :
R : dearz
*** now implicit delete by excluding the leading letter ***
L + R[1:] : earz <-- delete d
So the end result transforms ‘dearz’ to ‘earz’ by deleting the first character.
And you use a loop (code block above) or a list comprehension (code block below) to do
this for the entire splits
list.
# deletes with a list comprehension
splits = splits_a
deletes = [L + R[1:] for L, R in splits if R]
print(deletes)
print('*** which is the same as ***')
for i in deletes:
print(i)
['earz', 'darz', 'derz', 'deaz', 'dear']
*** which is the same as ***
earz
darz
derz
deaz
dear
You now have a list of candidate strings created after performing a delete edit.
Next step will be to filter this list for candidate words found in a vocabulary.
Given the example vocab below, can you think of a way to create a list of candidate words ?
Remember, you already have a list of candidate strings, some of which are certainly not actual words you might find in your vocabulary !
So from the above list earz, darz, derz, deaz, dear.
You’re really only interested in dear.
vocab = ['dean','deer','dear','fries','and','coke']
edits = list(deletes)
print('vocab : ', vocab)
print('edits : ', edits)
candidates=[]
### START CODE HERE ###
for token in edits:
if token in vocab:
candidates.append(token)
### END CODE HERE ###
print('candidate words : ', candidates)
vocab : ['dean', 'deer', 'dear', 'fries', 'and', 'coke']
edits : ['earz', 'darz', 'derz', 'deaz', 'dear']
candidate words : ['dear']
Expected Outcome:
vocab : [‘dean’, ‘deer’, ‘dear’, ‘fries’, ‘and’, ‘coke’]
edits : [‘earz’, ‘darz’, ‘derz’, ‘deaz’, ‘dear’]
candidate words : {‘dear’}
You’ve unpacked an integral part of the assignment by breaking down splits and edits, specifically looking at deletes here.
Implementation of the other edit types (insert, replace, switch) follows a similar methodology and should now feel somewhat familiar when you see them.
This bit of the code isn’t as intuitive as other sections, so well done!
You should now feel confident facing some of the more technical parts of the assignment at the end of the week.