A spell corrector in perl6 part 3

NOTE this was originally posted on an older version of this blog, in 2007, the language then known as Perl 6 has been renamed to Raku.

(See part 1 and part 2 for explanations of the following code).

Now, this code should work. If I copy and paste it into pugs it works as expected. If I run it it has some failures in the final tests for correct() that I can’t explain.

The part in the beginning shows how to count different words in a file, it depends on proper handling of unicode in pugs, so it may or may not work at the moment,


sub words($file) { slurp($file).lc.comb(/<alpha>+/) }

sub train(@words) {
  my %res;
  for @words -> $w { %res{$w}++ }
  %res
}


#my %NWORDS = train(words('/home/rff/Desktop/big.txt'));
my %NWORDS={'ciao'=>4,'c'=>3,'cibo'=>1,'ciaao'=>1,'ccc'=>1,'cia'=>1};

my @ALPHA = 'a'..'z';


# 'abc' -> 'ac'
sub deletion($word) {
  (^$word.chars).map: {substr(my $tmp = $word,$_,1)='';$tmp};
}

# 'abc' -> 'adc'
sub substitution($word) {
  gather {
    for (0..$word.chars-1) X @ALPHA {
      substr(my $tmp = $word,$_[0],1)=$_[1];
      take $tmp;
    }
  }
}

# 'abc' -> 'abbc'
sub insertion($word) {
  gather {
    for (0..$word.chars) X @ALPHA {
      substr(my $tmp = $word,$_[0],0)=$_[1];
      take $tmp;
     }
  } 
}

# 'abc' -> 'acb'
sub transposition($w) {
  gather for ^$w.chars {
    my $tmp=$w;
    my $removed =(substr($tmp,$_,1)='');
    substr($tmp,$_+1,0)=$removed;
    take $tmp;
  }
}

sub edits1($w) {
  # all these are different, no need to use a set
  transposition($w),insertion($w),substitution($w),deletion($w)
}

    
sub known_edits2($words) { 
  my @ary = gather {
    for edits1($words) -> $e1 {
      for edits1($e1) -> $e2 {
        take $e2 if %NWORDS{$e2} 
      }
    }
  }
  any(@ary).values
}

sub known(@words) { 
  gather for @words {take $_ if %NWORDS{$_}} ;
}

sub correct($w) {
  my @values = known([$w]) or known(edits1($w)) or known_edits2($w) or [$w];
  # single argument max() doesn't work yet
  say @values.perl;
  @values.max: {%NWORDS{$^a} <=> %NWORDS{$^b}}

}Code language: Perl (perl)

Leave a Reply

Your email address will not be published. Required fields are marked *

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Find out more about Webmentions.)