
Improve spam detection

Machine learning to detect anomalies - date: 01/01/2014



The story tale

In 2012, I researched machine learning, looking for ways to detect spam in my private projects. I looked at k-nearest neighbors (KNN), random decision forests, and Naive Bayes. I chose Naive Bayes because it is one of the simplest classifiers: it is based on Bayes' theorem with the naïve assumption that features are completely independent of each other. It is also one of the most basic text classification techniques, with applications in email spam detection, document categorization, sexually explicit content detection, personal email sorting, language detection, and sentiment detection.

Despite its naïve design and oversimplified assumptions, Naive Bayes performs well on many complex real-world problems. It is also well suited to limited CPU and memory resources, since training amounts to counting word occurrences per class and classifying a message amounts to summing log probabilities.
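The decision rule behind that independence assumption can be written out as a short derivation. Here C is the set of classes (for example spam and ham) and w_1 to w_n are the words of a message; this is the standard textbook form, not necessarily the exact notation used in the author's slides. Dropping the denominator is exact because it is the same for every class; the approximation sign marks the naïve independence assumption, and the last form uses logarithms to avoid floating-point underflow.

```latex
\begin{aligned}
\hat{c} &= \arg\max_{c \in C} P(c \mid w_1, \dots, w_n)
         = \arg\max_{c \in C} \frac{P(c)\, P(w_1, \dots, w_n \mid c)}{P(w_1, \dots, w_n)} \\
        &\approx \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(w_i \mid c)
         = \arg\max_{c \in C} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big]
\end{aligned}
```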

Motivation open-source

The motivation for writing an open-source software program to detect spam using the classification of text inputs is to create a tool that can be used and improved by anyone. By making the software open-source, you can allow anyone to access the source code, use it, modify it, and distribute it freely. This can encourage collaboration and innovation, and it can help you build a community of users and contributors who can help improve the software and make it more valuable and practical.

The benefits of open-source spam detection software include improved reliability, security, and flexibility. Because anyone can access and modify the source code, a wide range of users can test and improve the software continuously. This helps identify and fix bugs and security vulnerabilities more quickly, and it makes new features and improvements easier to incorporate. Additionally, letting users customize the software to their specific needs and preferences makes it more valuable and effective for a broader range of applications.

From scratch

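The goal of writing a program that classifies text inputs using Naive Bayes with maximum-likelihood estimation (MLE), Laplace smoothing, and a TF-IDF vectorized matrix is to create an accurate and robust spam filter. Each technique handles a different part of the problem: MLE estimates a word's probability within a class from its relative frequency in the training data; Laplace smoothing adds one to every count, so a word never seen in a class does not drive that class's whole probability to zero; and TF-IDF weights each term by how often it appears in a message and how rare it is across all messages, so very common words carry less weight. Together, these let the model classify emails as spam or not spam across different types of data and different types of spam.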

The benefits of such a program are numerous. By using a highly accurate and robust spam filter, you can help protect your email inbox from spam, phishing attacks, and other types of unwanted or malicious messages. This can save you time, money, and frustration, and it can help you avoid the negative consequences of spam, such as lost productivity and security breaches.

Additionally, a program that supports assisted training makes it easy for users to train the model and improve its accuracy. This helps build a large and diverse training dataset that better represents the emails users actually receive, and it produces a spam filter that can adapt to new and changing threats.

Overall, the benefits of classifying text inputs using Naive Bayes with MLE, Laplace smoothing, and TF-IDF vectorization include improved accuracy, performance, and user experience, which can be valuable in a variety of applications and contexts.

My new library to work with ML + NLP in C++

Presentation

Slide 12 of the presentation explains my point of view on using a ranking to optimize the classifier's accuracy.

References

  • John, G. H. and Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. Montreal, Quebec, Canada.

  • Svore, K. M., Wu, Q., and Burges, C. J. (2007). Improving web spam classification using rank-time features. Banff, Alberta, Canada.

Thank you for reading this! Cheers!

Consequently, I wrote some slides for a presentation, which you can view at the end of this blog post. To optimize detection accuracy, I match patterns and add each match to a ranking; the resulting rank maps to a single classification. We can view this in the following code. Another way to build the automaton is with Flex and Bison.

So, this is a very cool trick to gain accuracy. No more words, folks.

Antonio Costa aka Cooler_