NOTE this was originally posted on an older version of this blog, in 2007, the language then known as Perl 6 has been renamed to Raku.
(See part 1 and part 2 for explanations of the following code).
Now, this code should work. If I copy and paste it into pugs it works as expected. If I run it it has some failures in the final tests for correct() that I can’t explain.
The part in the beginning shows how to count different words in a file, it depends on proper handling of unicode in pugs, so it may or may not work at the moment,
sub words($file) { slurp($file).lc.comb(/<alpha>+/) }
sub train(@words) {
my %res;
for @words -> $w { %res{$w}++ }
%res
}
#my %NWORDS = train(words('/home/rff/Desktop/big.txt'));
my %NWORDS={'ciao'=>4,'c'=>3,'cibo'=>1,'ciaao'=>1,'ccc'=>1,'cia'=>1};
my @ALPHA = 'a'..'z';
# 'abc' -> 'ac'
sub deletion($word) {
(^$word.chars).map: {substr(my $tmp = $word,$_,1)='';$tmp};
}
# 'abc' -> 'adc'
sub substitution($word) {
gather {
for (0..$word.chars-1) X @ALPHA {
substr(my $tmp = $word,$_[0],1)=$_[1];
take $tmp;
}
}
}
# 'abc' -> 'abbc'
sub insertion($word) {
gather {
for (0..$word.chars) X @ALPHA {
substr(my $tmp = $word,$_[0],0)=$_[1];
take $tmp;
}
}
}
# 'abc' -> 'acb'
sub transposition($w) {
gather for ^$w.chars {
my $tmp=$w;
my $removed =(substr($tmp,$_,1)='');
substr($tmp,$_+1,0)=$removed;
take $tmp;
}
}
sub edits1($w) {
# all these are different, no need to use a set
transposition($w),insertion($w),substitution($w),deletion($w)
}
sub known_edits2($words) {
my @ary = gather {
for edits1($words) -> $e1 {
for edits1($e1) -> $e2 {
take $e2 if %NWORDS{$e2}
}
}
}
any(@ary).values
}
sub known(@words) {
gather for @words {take $_ if %NWORDS{$_}} ;
}
sub correct($w) {
my @values = known([$w]) or known(edits1($w)) or known_edits2($w) or [$w];
# single argument max() doesn't work yet
say @values.perl;
@values.max: {%NWORDS{$^a} <=> %NWORDS{$^b}}
}Code language: Perl (perl)