What is belfry.io
User-generated online discussions can add a real value
to published content. However it's not always easy to enforce
a peaceful, constructive atmosphere. We are working hard on
a system that uses machine learning to identify inappropriate
comments and helps the work of moderators.
Dirty talk, how can I fight thee?
Dirty words. We all know them. They are inappropriate.
They are coarse. They are offensive. We were taught not
to use them, especially before people we don't know, but
we just can't help it. In this blog post we concentrate
on the issues that one should address before building
a system which automatically identifies and categorizes
vulgar language in texts.
Building a training corpus of comments
Machine learning is a powerful method for solving various nlp tasks,
but collecting the training corpus can be a difficult job.
Here's some idea to collect comments when we have a preference
for their topic.