
Confused About AI? My Notes Can Help!

Artificial Intelligence (AI) is transforming the world around us, influencing industries from healthcare to finance. Recently, I had the opportunity to dive into an AI class, which provided a foundational overview of the core concepts driving this innovative field. Here, I’m excited to share my class notes.


The Birth of AI and Early Challenges

The term “Artificial Intelligence” (AI) first appeared in 1955, coined by American computer scientist John McCarthy. Just a year later, McCarthy played a pivotal role in organizing the Dartmouth Summer Research Project on Artificial Intelligence. This landmark conference brought together researchers from various disciplines and laid the groundwork for the development of related fields like data science, machine learning, and deep learning.

However, these early efforts in AI faced significant hurdles. The computers of the 1950s lacked the capacity to store complex instructions, hindering their ability to perform intricate tasks. Additionally, the exorbitant cost (leasing a computer back then could run a staggering $200,000 per month!) severely limited access to this technology. Fortunately, advancements in computer technology over the following decades led to significant improvements in processing power, efficiency, and affordability, paving the way for wider adoption of AI.

As AI systems become more complex and play larger roles in our lives, understanding how they make decisions is just as important as what those decisions are. To help make sense of this, three key concepts often come up: Explainable AI, Transparency, and Interpretability. The table below breaks down these terms in simple language to clarify what they mean and why they matter.

| Term | What It Means in Simple Terms | What It Focuses On | When You See It | Example |
| --- | --- | --- | --- | --- |
| Explainable AI (XAI) | AI that can tell you why/how it made a decision in a way you can understand | Giving clear reasons or justifications for specific AI outputs | Usually used when AI is complex and needs extra help explaining its decisions | A tool that explains why a loan was denied by highlighting key factors |
| Transparency | Being open about how the AI system works overall: its data, methods, and design. Transparency can answer the question of "what happened" in the system | Sharing details about the AI's structure and training process, but not explaining individual decisions | When you want to understand the general workings of the AI, not specific outcomes | Publishing the training data sources and model type publicly |
| Interpretability | How easy it is for a person to see and follow how the AI made a decision | The simplicity and clarity of the model's decision-making process itself | Often refers to models that are simple enough to understand directly | A decision tree that shows step-by-step how it classified an input |
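To make the interpretability row concrete, here is a minimal sketch of a model simple enough to read directly. The use of scikit-learn and the classic iris dataset is my own choice for illustration; neither appears in the class notes:

```python
# A shallow decision tree is interpretable: its learned rules can be
# printed and followed by a person, step by step.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the decision rules as readable if/else steps
print(export_text(tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))
```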

What is AI?

Artificial Intelligence is a branch of computer science that deals with the creation of intelligent agents, which are systems that can reason, learn, and act autonomously.

Types of AI: Weak & Strong

When people talk about AI, they usually mean one of two kinds.

Weak AI, sometimes called Narrow AI, is designed to do just one thing well. Think of a GPS app like Google Maps that finds the best route for you or voice assistants like Siri and Alexa that understand simple commands. These systems are really good at their specific tasks but can’t do anything outside of them.

Strong AI, also known as Artificial General Intelligence or AGI, is different. This type would be able to learn and think across many different areas, kind of like a human. It would understand new situations and make decisions on its own, not just follow pre-set instructions. We don’t have strong AI yet, but it’s what many researchers are aiming for, something like the sci-fi idea of a truly intelligent robot or assistant that can help with anything you ask.

AI Breakdown

  1. Artificial Intelligence
    • Machine Learning
      • Deep Learning: Deep Learning is a type of machine learning that uses artificial neural networks, inspired by the structure of the human brain. These networks can learn complex patterns from large amounts of data and achieve high accuracy.
        • Deep Neural Networks (DNNs)
          • Inspired by the human brain, they learn from data (like a baby learning a language) to recognize patterns and make predictions.
          • Need lots of data. The more data they see, the better they perform.
          • Highly accurate. Great for tasks like image recognition and speech recognition.
            • Neural Network Layers/Architectures
              • The different ways that DNNs can be constructed
              • Finding the right layer/architecture combination is a creative and challenging process (a minimal code sketch follows this breakdown).
        • Generative Adversarial Networks (GANs)
          • GANs are a type of deep learning system using two neural networks: a generator and a discriminator.
          • Imagine two art students competing. The generator keeps creating new art pieces, while the discriminator tries to identify if a piece is real or a forgery.
          • Through this adversarial training, both networks improve. The generator creates more realistic forgeries, and the discriminator gets better at spotting them.
            • Have the potential to be used in defensive & offensive cybersecurity.
          • Example: Apple’s Generative Emojis (Genmoji) https://9to5mac.com/2024/06/10/theres-an-emoji-for-that-meet-genmoji/
        • Diffusion Models
          • Diffusion models are a recent advancement in generative AI specifically focused on creating high-quality, realistic images. They work by learning to progressively remove noise, starting from pure random noise, essentially reversing a noise-addition process.
        • Analogy for understanding DNNs, GANs, & Diffusion models:
          • Think of DNNs as the general tools in a workshop. They provide the foundational capabilities for various tasks.
          • GANs are like specialized sculpting tools. They excel at creating new and interesting shapes (images) but might require more effort to refine the final product.
          • Diffusion models are like high-precision restoration tools. They meticulously remove noise and imperfections to create a clear and detailed image, but the process might take longer.
      • Reinforcement Learning (RL)
      • Natural Language Processing (NLP)
      • Computer Vision (CV)
      • Handwriting Recognition (HWR)
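Here is the promised sketch of a deep neural network, showing the "layers/architectures" idea in code. PyTorch is my choice of framework (the notes don't prescribe one), and the layer sizes are arbitrary:

```python
# A tiny feed-forward deep neural network: layers stacked in sequence.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),  # input layer: e.g., a flattened 28x28 image
    nn.ReLU(),            # non-linearity lets the network learn complex patterns
    nn.Linear(128, 64),   # hidden layer: intermediate representations
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class
)

x = torch.randn(1, 784)   # one fake "image" made of random numbers
print(model(x).shape)     # torch.Size([1, 10])
```

Changing the number, size, and type of these layers is exactly the architecture search described above.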


Footnotes

  • *RL, NLP, CV don’t always require Deep Learning to function.
    • But when Deep Learning is used, the power of Deep Neural Networks is applied, which improves accuracy but requires more data and computing power.
      • When Deep Learning is applied, the word "Deep" is prepended (e.g., Deep Computer Vision).
  • Adding DNNs isn’t a silver bullet to solving all use cases.

Details

Machine Learning

Machine Learning is a sub-field of AI that focuses on teaching computers to make predictions based on data.

There are three key aspects to designing a Machine Learning solution:

  1. Objectives: What do you want the program to achieve? (e.g., spam detection, weather forecasting)
  2. Data: The information the program will learn from. This data can be labeled (supervised learning) or unlabeled (unsupervised learning).
  3. Algorithms: The method the program uses to learn from the data.

Data Types:

  1. Structured Data:
    • This type of data is organized and follows a predefined format, like a spreadsheet with clear headings and rows/columns. It's easy for computers to search and analyze.
  2. Unstructured Data:
    • This data doesn’t have a fixed format and can be messy or complex. Examples include emails, social media posts, images, and videos.
    • While it requires additional processing, unstructured data can be incredibly valuable.
      • Humans primarily communicate using unstructured data, like natural language.
      • Unstructured data is vast and growing rapidly, exceeding the amount of structured data in the world.

AI Features

Before teaching a machine learning model, it's important to pay attention to the data it learns from: the features. How you choose, prepare, and check these features can make a big difference in how well the model works and how fair its decisions are. Here's a simple breakdown of some key steps involving features and how they can help reduce bias.

| Term | Simple Explanation | What It Focuses On | When in ML Pipeline | Relation to Bias Mitigation | Example / Note |
| --- | --- | --- | --- | --- | --- |
| Feature Validation | Ensuring features are accurate and consistent | Data quality checks | Early and ongoing data processing | Important for data quality; indirect impact on bias | Checking for missing or incorrect values |
| Feature Selection | Choosing which data inputs to use in the model | Picking relevant, useful, and fair features | Before or during model training | Helps reduce bias by excluding problematic features | Removing sensitive features like race or gender |
| Feature Transformation | Changing features into suitable formats or scales | Data preparation like normalization or encoding | Before or during training | No direct bias mitigation; just data formatting | Scaling age values to a 0-1 range |
| Feature Engineering | Creating or modifying features to improve the model | Combining selection, transformation, creation | During feature preparation | Can reduce or introduce bias depending on design | Creating an "income-to-debt" ratio feature |
| Feature Importance | Measuring which features impact model predictions most | Understanding feature influence after training | After training, for interpretation | Does not fix bias; just shows what matters most | Income strongly influences loan approval |
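To ground the transformation and engineering rows, here is a minimal sketch. The column names and values are invented, and pandas plus scikit-learn are my choice of tools:

```python
# Feature transformation (scaling) and feature engineering (a new ratio column).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "age":    [25, 40, 67],
    "income": [40_000, 85_000, 120_000],
    "debt":   [10_000, 20_000, 15_000],
})

# Feature transformation: scale age into a 0-1 range
df["age_scaled"] = MinMaxScaler().fit_transform(df[["age"]])

# Feature engineering: create an income-to-debt ratio feature
df["income_to_debt"] = df["income"] / df["debt"]

print(df)
```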


Why Data Is the Most Important Factor for Fairness in AI

The biggest factor in making AI fair is the data it’s trained on. If the training data doesn’t include enough variety (especially from different groups of people or situations), the AI will likely pick up and even amplify those biases. No matter how advanced the model, how carefully the data is labeled, or how accurate the final predictions are, none of that can fix problems caused by limited or unrepresentative training data.

Think of it like this: AI learns patterns from the data it sees. If that data doesn’t show the full diversity of the real world, the AI will have blind spots and make biased decisions. Other factors can help improve a model, but they can’t make up for training data that doesn’t reflect the real-world variety it needs to understand.

Machine Learning Types

  • Supervised Learning is like being taught in school. The data you train the model on has labels or pre-defined categories. The model learns the relationship between these labels and the data to make future predictions.
    • Example: Classifying emails as spam or important (see the code sketch after this list).
  • Unsupervised Learning is more like exploring on your own. The data you provide has no labels. The model finds hidden patterns or groups within the data.
    • Example: Grouping customers based on their shopping habits.
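Here is that sketch of supervised learning: labeled examples go in, and a classifier that can label new examples comes out. The tiny "emails" and labels are invented, and scikit-learn is my choice of library:

```python
# Supervised learning: the labels are the "teacher".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting notes attached",
          "free money click here", "lunch tomorrow?"]
labels = ["spam", "important", "spam", "important"]  # the supervision

vec = CountVectorizer()               # turn text into word-count features
X = vec.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)  # learn label <-> word associations

print(clf.predict(vec.transform(["claim your free prize"])))  # ['spam']
```

Unsupervised learning uses the same kind of data but no labels at all; a clustering example appears under Common Machine Learning Techniques below.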

Deterministic AI vs. Non-Deterministic AI

| Aspect | Deterministic AI | Non-Deterministic AI |
| --- | --- | --- |
| Output | Always the same for the same input | Can vary for the same input |
| Decision-making | Follows fixed, predefined rules or algorithms | Involves randomness, probabilities, or learning |
| Examples | Rule-based systems (e.g., chess engines, SPSS for statistical analysis) | Machine learning models (e.g., ChatGPT, TensorFlow, Scikit-learn) |
| Predictability | Highly predictable and consistent | Less predictable; can change with the same inputs |
| Complexity Handling | Best for structured, well-defined tasks | Better at handling ambiguous, complex tasks |
| Debugging & Explanation | Easier to debug and explain due to clear logic | Harder to debug and explain due to randomness or learning |
| Learning | Does not adapt or learn unless explicitly reprogrammed | Learns and adapts over time (e.g., through training) |
| Example Tools | Prolog (for logic-based AI), calculators, regex engines | PyTorch, Keras, OpenAI GPT models, AlphaGo |
| Hallucinations | Unlikely, as outputs are strictly defined by rules and logic | More likely due to probabilistic nature, especially in language models (e.g., ChatGPT) |
| Strengths | Reliable, consistent, easier to validate | Flexible, adaptable, handles complex and dynamic environments |
| Weaknesses | Limited by rigid decision-making and lack of flexibility | Can be inconsistent, prone to hallucinations, harder to explain |

Common Machine Learning Techniques

  • Regression: Used for predicting continuous values, like house prices or temperature.
    • Real Life Example: Uber uses regression to create a dynamic pricing model. This model considers factors like time of day, demand, and location to predict the optimal price for a ride. This balances customer retention (not setting prices too high) with price maximization (earning as much revenue as possible).
  • Classification: Sorting things into categories, like spam detection or identifying fraudulent credit card activity.
    • Real Life Example: American Express uses classification algorithms to identify potentially fraudulent activities on their credit cards. The algorithm is trained on historical data of fraudulent and legitimate transactions. It analyzes factors like purchase location, amount, and spending habits to flag suspicious activity in real-time.
  • Clustering: Finding natural groups within unlabeled data, like grouping customers with similar shopping habits (a code sketch follows this list).
    • Real Life Example: Spotify uses clustering for collaborative and content-based filtering to personalize user experience. Collaborative filtering groups users with similar listening habits and recommends music enjoyed by similar users. Content-based filtering clusters songs based on audio features and recommends songs similar to what a user already enjoys.
  • Association Rule Learning: Discovering hidden relationships between things in unlabeled data, like recommending movies based on what other viewers with similar tastes watched.
    • Real Life Example: Bali Tourism Board uses association rule learning to determine which attraction combinations tourists visit most often and when. By analyzing tourist data, they can uncover patterns like “tourists visiting temples often visit beaches afterward.” This allows for better optimization of infrastructure, staffing, and accommodation availability at different attractions throughout the day.
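And here is the clustering sketch referenced above: no labels are provided, and the algorithm discovers the groups itself. The customer features (visits per month, average spend) and the use of scikit-learn's KMeans are my own invention:

```python
# Unsupervised learning: KMeans groups similar customers with no labels given.
import numpy as np
from sklearn.cluster import KMeans

# rows = customers, columns = [visits per month, average spend]
customers = np.array([[2, 15], [3, 20], [30, 200], [28, 180], [15, 90]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)  # cluster assignments discovered from the data, e.g. [0 0 1 1 0]
```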

Reinforcement Learning (RL)

  • A computer (the agent) is given a problem. It is rewarded (+1) for finding a solution or punished (-1) for failing to find one.
  • Unlike supervised learning, the agent is NOT given instructions on how to complete the task. Instead, it learns through trial and error which actions are good and which are bad.
  • Similar to how humans learn.
    • Babies learn to walk this way: when a baby falls over, it feels pain and learns not to repeat the same action.
  • Reinforcement learning is therefore arguably the closest we have come to true artificial intelligence.
  • Because they learn from their own interactions rather than from labeled historical data, reinforcement learning algorithms can be better suited to finding solutions free from the bias and discrimination embedded in existing datasets.
  • It is adaptable and doesn't require a separate retraining cycle.
  • Can learn live, online (e.g., Spotify and e-commerce recommendations).
  • Difference between Reinforcement Learning and Unsupervised Learning:
    • Reinforcement learning is about learning through rewards and punishments to make decisions, while unsupervised learning is about finding hidden patterns or structures in data without any rewards or feedback.
      • Analogy:
        • Imagine you’re observing a dog. In reinforcement learning, you are training the dog by giving it a treat when it sits and saying “no” when it misbehaves. The dog learns which actions lead to rewards.
        • In unsupervised learning, you are not interacting with the dog at all. Instead, you’re just watching a group of dogs and trying to figure out patterns on your own, like which ones behave similarly or belong to the same breed.
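To ground the reward/punishment loop in code, here is a tiny tabular Q-learning sketch. The environment (a five-cell corridor with a goal on the right and a pit on the left), the rewards, and the hyperparameters are all invented for illustration:

```python
# Tabular Q-learning on a 5-cell corridor: +1 at the right end, -1 at the left.
import random

n_states, actions = 5, [-1, +1]          # actions: step left or step right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

for _ in range(500):                     # 500 episodes of trial and error
    s = 2                                # start in the middle cell
    while 0 < s < n_states - 1:
        # explore occasionally, otherwise take the best-known action
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s2 = s + a
        r = 1 if s2 == n_states - 1 else (-1 if s2 == 0 else 0)
        best_next = 0.0 if s2 in (0, n_states - 1) \
            else max(Q[(s2, act)] for act in actions)
        # reward/punishment feedback updates the value of the action taken
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# learned policy for the middle cells: always +1 (head for the goal)
print({s: max(actions, key=lambda act: Q[(s, act)]) for s in range(1, n_states - 1)})
```

No one told the agent to go right; it learned that from the rewards alone.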

Natural Language Processing (NLP)

  • Field of AI concerned with the interactions between computers and human natural languages.
  • Focuses on programming computers to process and analyze large amounts of natural language data.
  • The most complex part of NLP is extracting accurate context and meaning from natural language.
  • Involves two main tasks (some applications may require both, while others may only need one):
    • Natural Language Understanding (NLU):
      • Maps the given input from natural language into a formal representation and analyzes it.
        • Example: If the input is audio, then speech recognition is applied first. This converts the audio to text, and then the hard part of interpreting the meaning of the text is performed.
    • Natural Language Generation (NLG):
      • Process of producing meaningful phrases and sentences in the form of natural language from some internal representation.
      • NLG is generally considered much easier than NLU.
      • It can also convert generated text into speech (e.g., Siri and Alexa).

Generative AI: a broad term for AI that generates data, such as text, images, audio, video, or code.

Natural Language Processing (NLP) Use Cases

  1. ChatBots
  2. Analyze survey results
  3. Document review for compliance, legality, typo, etc.
  4. Online & Social Media Content Moderating
  5. Customer online sentiment analysis
  6. Virtual personal assistant (Siri, Alexa)
  7. Review and analyze customer feedback
  8. Language translations
  9. Sentiment analysis & market research
  10. Content creation and modification
  11. Knowledge base creation and easy retrieval

Large Language Models (e.g., ChatGPT)

  • Large Language Models are algorithms designed to understand, generate, and interact with human language.
  • Their key advantage is that they excel at understanding context over long stretches of unstructured text data.

How do LLMs work?

  • LLMs are built on layers of neural networks, specifically designed to mimic how humans process and generate language.
  • They can’t think like humans, but they leverage human language patterns to simulate human-like text generation.
  • One way they do this is by predicting the next most likely word in a sequence.
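To make "predicting the next most likely word" concrete, here is a toy bigram model over an invented ten-word corpus. Real LLMs predict over tokens with deep neural networks, but the core idea (learn what tends to follow what, then pick the most likely continuation) is the same:

```python
# A toy next-word predictor: count which word follows which.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1           # tally: nxt appeared after word

print(following["the"].most_common(1))  # [('cat', 2)]: most likely word after "the"
```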

How are LLMs trained?

  • Pre-Training:
    • LLMs are exposed to massive amounts of text data from various sources like wikis, blogs, social media, news articles, and books.
    • During this process, they learn and practice predicting the next word in sentences.
  • Fine-Tuning:
    • The model is then trained on datasets specific to a particular task, allowing it to apply its capabilities to solve specific business challenges.
    • The versatility of LLMs lies in their ability to be customized for a wide range of tasks, from general to highly specific.

Examples of Large Language Models

| Model | Advantages | Category |
| --- | --- | --- |
| GPT-1 | First publicly available GPT (Generative Pre-Trained) model | Text |
| GPT-2 | Significantly increased performance over GPT-1 | Text |
| GPT-3 | State-of-the-art performance on many NLP tasks | Text |
| Jurassic-1 Jumbo | Large and powerful LLM, excels in code generation | Text |
| GPT-J | Open-source GPT-style model from EleutherAI | Text |
| DALL-E 2 | Generates high-quality and creative images | Image |
| Midjourney | Generates high-quality and creative images; uses Discord as its interface for generating AI art | Image |
| Stable Diffusion | Generates high-quality and creative images | Image |
| BERT | Excellent for text understanding and question answering | Text |
| Bard | Large language model from Google AI, similar to LaMDA | Text |
| LaMDA | Focuses on dialogue applications; can be informative and comprehensive | Text |

Prompt Engineering: crafting ideal inputs in order to get the most out of large models.

Token: A unit easily understood by a language model. One word can be made up of multiple tokens. Example:

| Words | Tokens |
| --- | --- |
| Everyone | [Every, one] |
| I'd love | [I, 'd, love] |

Tokenization: The mechanism by which a model splits its inputs into tokens. A given method can greatly affect a model's output. Large Language Models take an input and produce tokens as output. The model uses its training on vast text sources to predict which words likely follow the input tokens.
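Here is a quick way to see tokenization in action. The tiktoken library (OpenAI's tokenizer) is my choice for illustration; the notes don't name one, and exact splits vary by encoding:

```python
# Inspecting how a real tokenizer splits text into tokens.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("I'd love")

print(ids)                              # the token ids the model actually sees
print([enc.decode([i]) for i in ids])   # the text fragment behind each id
```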

Retrieval-Augmented Generation (e.g., Perplexity AI)

  • Improves the accuracy and reliability of the traditional large language models (LLMs) by incorporating information from external sources.
  • It is ideal for situations where the underlying data changes frequently and the system needs to generate tailored outputs based on the latest information.
    • Traditional LLMs generate text based solely on the input they receive and their learned parameters. They do not directly retrieve external information during the generation process.
  • External Knowledge Base: RAG integrates the LLM with an external source of reliable information, like a specialized database or knowledge base. This allows the LLM to access and reference factual information when responding to prompts or questions.
    • In a way, RAG attempts to combine LLM capabilities with a traditional search engine (as if ChatGPT and Google had a child).
  • Improve Response Generation: When you ask a question, RAG first uses the LLM to understand your intent. Then, it retrieves relevant information from the external knowledge base and feeds it back into the LLM. Finally, the LLM uses this combined knowledge to generate a response that is both comprehensive and accurate.
  • Multimodal Generative AI: Involves multiple data types (text, images, audio).
  • Expert System: Rule-based systems with static logic; less flexible for dynamic, frequently changing information.

RAG Helps LLMs To:

  • Be more factually accurate
  • Stay up-to-date with current information
  • Provide users with a better sense of where their answers are coming from
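A minimal sketch of that retrieve-then-generate flow is below. The knowledge base, the keyword-overlap scoring, and the generate() stub are all invented placeholders; a real system would use embeddings, a vector database, and an actual LLM call:

```python
# RAG in miniature: retrieve a relevant fact, then ground the answer in it.
knowledge_base = {
    "returns":  "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(question: str) -> str:
    # Toy retrieval: score each document by word overlap with the question.
    words = question.lower().split()
    scores = {key: sum(w in doc.lower() for w in words)
              for key, doc in knowledge_base.items()}
    return knowledge_base[max(scores, key=scores.get)]

def generate(prompt: str) -> str:
    # Placeholder: a real RAG system would call an LLM here.
    return f"[LLM answer grounded in: {prompt!r}]"

question = "How long does standard shipping take?"
context = retrieve(question)
print(generate(f"Context: {context}\nQuestion: {question}"))
```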

Challenges:

  • LLMs: LLMs generate text based on the patterns and associations learned from vast amounts of text data. Therefore, the main challenge with LLMs lies in their potential to generate incorrect or misleading information, especially in scenarios where the training data is biased or incomplete.
  • RAGs: While RAG addresses some of these issues by leveraging external knowledge, it introduces challenges related to the retrieval process itself (e.g., ethical concerns, terms-of-service violations), such as ensuring the retrieved information is accurate, up-to-date, and relevant to the context of the generated text.

Computer Vision (CV)

There are 2 types of Computer Vision algorithms:

  • Classical CV: excels at specific tasks like object detection (identifying cats or dogs) with high speed and accuracy (a code sketch follows this list).
  • Deep Learning CV: used when classical methods don’t provide enough power for complex tasks. Deep Learning algorithms can learn intricate patterns from vast amounts of data, enabling them to tackle more challenging computer vision problems. Examples:
    • Facial Recognition: Deep learning algorithms can analyze facial features with high accuracy, enabling applications like unlocking smartphones with your face, identifying individuals in security footage, or even personalizing advertising based on demographics.
    • Self-Driving Cars: Deep learning is crucial for self-driving cars to navigate complex environments. These algorithms can process visual data from cameras in real-time, allowing the car to identify objects like pedestrians, vehicles, and traffic signals, and make decisions accordingly.
    • Medical Image Analysis: Deep learning can analyze medical images (X-rays, MRIs) with impressive accuracy, assisting doctors in tasks like cancer detection, anomaly identification, and treatment planning.
    • Object Detection and Tracking: While classical CV can handle basic object detection, deep learning excels at identifying and tracking a wider range of objects in real-time. This is valuable for applications like surveillance systems, sports analytics, and autonomous robots.
    • Image Segmentation: Deep learning can segment images into specific regions, allowing for a more granular understanding of the content. This is useful in applications like autonomous farming (identifying crops vs weeds), self-checkout systems (differentiating between items), and augmented reality (overlaying virtual objects on real-world scenes).
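As a contrast to the deep learning examples above, here is the promised sketch of the classical side: OpenCV's bundled Haar-cascade face detector, a pre-deep-learning technique. The image path is a placeholder:

```python
# Classical CV: Haar-cascade face detection, no neural network involved.
import cv2  # pip install opencv-python

img = cv2.imread("photo.jpg")  # placeholder path: supply your own image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

print(f"Found {len(faces)} face(s)")  # each detection is an (x, y, w, h) box
```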

Handwriting Recognition (HWR)

Imagine you have a letter written by your friend, but instead of typing, they used pen and paper. Handwriting Recognition (HWR) is like a magic decoder for that letter. So, HWR takes your friend’s written letter (handwriting) and turns it into something a computer can understand (recognition). It basically translates the scribbles into text you can read on a screen!

Example: Apple Math Notes (https://mashable.com/article/ipados-18-math-notes)

What is AI Architecture?

AI architecture is basically how an AI system is built—how it learns, thinks, and makes decisions. Think of it like the blueprint or design plan of a machine. Different architectures use different methods to understand and work with data.

Core Types of AI Architectures

| Category | Examples | Description |
| --- | --- | --- |
| Symbolic AI | Expert Systems, Logic Rules | Uses predefined logic and rules; no learning. Good for explainability. |
| Statistical / Classical ML | Decision Trees, Random Forests, SVM | Learns from data using traditional algorithms. Suited to structured, tabular data. |
| Neural Networks (Deep Learning) | CNNs, RNNs, Transformers, LLMs | Learns patterns from unstructured data (images, text). Highly scalable. |
| Generative Models | GANs, VAEs, Diffusion Models, LLMs | Learns to generate new data similar to what it was trained on. |
| Hybrid / Neuro-symbolic | Symbolic + Neural (e.g., Logic + LLM) | Combines rule-based reasoning with learning-based perception. |
| Multimodal AI | Vision-Language Models (e.g., GPT-4V, CLIP) | Integrates multiple input types (text, image, audio). |
| Retrieval-Augmented Generation (RAG) | LLM + Knowledge Base | Uses search to retrieve facts, then generates answers. Reduces hallucinations. |
| General-Purpose AI (GPAI) | Not yet fully realized (AGI) | Hypothetical AI that can perform any cognitive task. LLMs are precursors. |

What About Auxiliary Tools?

These are tools and software that help you build, run, and manage AI systems. They’re not AI themselves but are essential to making AI work in the real world. Some examples:

| Tool Type | Examples | Purpose |
| --- | --- | --- |
| Frameworks | TensorFlow, PyTorch, Scikit-learn | Build and train models |
| Data Processing | Pandas, NumPy, Apache Spark | Prepare data for AI |
| Model Ops / Deployment | MLflow, Kubernetes, SageMaker | Manage the lifecycle of models |
| Monitoring | WhyLabs, Evidently AI, Arize | Track model performance |
| Explainability | SHAP, LIME | Make AI decisions interpretable |
| Data Labeling | Labelbox, Prodigy | Annotate training data |
| Embedding Stores / Vector DBs | Pinecone, FAISS, Weaviate | Power retrieval in RAG systems |
| Prompt Engineering / LLM Toolkits | LangChain, LlamaIndex | Build apps on top of LLMs |

This is just the tip of the iceberg when it comes to AI. There’s a whole world out there, and I’m just getting started exploring it myself. I hope to share future notes on any additional learning. Feel free to reach out if you have any questions or comments.

AI Architecture = the structural design of how an AI system thinks and learns.

Auxiliary Tools = supporting tech used to build, deploy, interpret, and scale AI systems.

AI Stack Layers (Simplified)

Platforms and Applications (Top Layer)
This is where people interact with AI. It includes things like:

  • Cloud services (AWS SageMaker, Google AI Platform)
  • Apps powered by AI (chatbots, recommendation systems)
  • APIs that let developers plug AI into their own software

Think of this as the user-friendly interface and infrastructure that make AI accessible and useful. It sits above the actual AI models and tools, wrapping everything so users can easily use the AI.

Model Types (Middle Layer)
These are the actual AI architectures: the brains behind the scenes. Examples:

  • Neural networks (including transformers)
  • Decision trees and random forests
  • Expert systems (rule-based AI)
  • Generative models

This layer does the learning and decision-making. It takes data and figures out patterns or generates new content.

Auxiliary Tools (Supporting Layer)
These help build, train, deploy, monitor, and explain the AI models. Examples:

  • Frameworks like PyTorch or TensorFlow
  • Data processing libraries
  • Monitoring and explainability tools
  • Vector databases for retrieval

What Does It Mean for AI to Be Robust?

A robust AI system holds up well when things get messy in the real world. It keeps doing its job correctly even when faced with noisy data, weird inputs, or deliberate attempts to throw it off course.

Example: Think of a robust self-driving car that stays safe on the road despite sudden downpours, partially covered street signs, or unexpected construction work. It handles these curveballs without missing a beat.

This isn’t the same as:

Reliable: A reliable system is consistent but only under normal conditions. It’s like that friend who’s always on time when everything goes according to plan, but falls apart when complications arise. Example: A reliable car performs perfectly on a clear day with perfect road markings, but might get confused during a heavy fog or when facing unusual traffic patterns.

Resilient: A resilient system bounces back after problems but doesn’t necessarily stay steady during the rough patch. Example: A resilient car might temporarily shut down certain functions when it detects a problem, then restore normal operation once conditions improve. It recovers well, but doesn’t necessarily power through difficulties.

If you’re interested in developing an AI Policy for your organization, then check out my guide: https://github.com/azeemnow/Artificial-intelligence/blob/main/AI-Policy-Development-Guide/AI-Policy-Development-Guide.md

Disclaimer: Various AI tools were used to assist with research and content writing for this blog. While every effort has been made to verify the accuracy of the information presented, some details may evolve as AI technology advances. Readers are encouraged to consult additional sources for the most up-to-date information.



Free web application vulnerability software

The goal of this post is to provide an overview of an awesome OWASP project designed to find vulnerabilities in web applications: Zed Attack Proxy (ZAP). I have known about ZAP for a while, and I thought I'd do a quick write-up.

ZAP was selected as the second top security tool of 2014 by ToolsWatch.org. The project is extremely well documented with a user guide, FAQs, tutorials, etc., all conveniently located on its wiki. Also, since there is already so much professional documentation available for this project, this post will not pay too much attention to its features and functionality, but rather on my experience with the tool and how I got it up and running.

ZAP can run on Windows, Linux, and OS X, and it can be downloaded from here. I downloaded ZAP on my Ubuntu 13 Desktop instance. Note that Java version 7 is required for both Windows and Linux. Also, ZAP comes included in several security distributions — a list can be found here.

After you have extracted ZAP_2.3.1_Linux.tar.gz, you just need to run the zap.sh script:

[Screenshot: running zap.sh]

Soon after that, the application will auto-start. You may be prompted to generate an SSL certificate — which you will need in order to test secure applications — however, I skipped that initially since you can always come back to it.

The last step in the installation process is similar to BURP and that is to configure your browser to use ZAP as a proxy. The ZAP team has a nice guide here on how to do this for the most common browsers. I set Firefox with ZAP proxy:

[Screenshot: Firefox proxy settings pointing to ZAP]
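If you want to verify that traffic is really flowing through ZAP before you start testing, a quick check from Python works too. This is my own sketch and assumes ZAP's default listener address of 127.0.0.1:8080; adjust it if you changed the port:

```python
# Send one request through the ZAP proxy and confirm it shows up in ZAP.
import requests  # pip install requests

proxies = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}

# verify=False because ZAP re-signs HTTPS traffic with its own certificate
resp = requests.get("http://example.com", proxies=proxies, verify=False)
print(resp.status_code)  # the request should now appear in ZAP's Sites panel
```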

After completing the step above, you are done with the installation process and are ready to kick off a scan. Here is how the home page should look:

[Screenshot: ZAP home page]

The first thing I would like to call your attention to before setting up a scan: please make sure you have explicit permission before you scan any site. It is best to deploy a dummy web application on your local machine and use that to scan and learn.

If you have questions about where to start in ZAP, the perfect place to start would be the awesome user guide that comes with the installation. It can be accessed from Help > OWASP ZAP User Guide:

[Screenshot: OWASP ZAP User Guide]

I believe everything that is found on ZAP’s online wiki can be located in this user guide, if not more. I think that is great because as you look through the home page and menu options, it can be a bit overwhelming. But you can find answers to what all of the buttons do from the user guide as well as from here and here.

Going back to the homepage, you will see the following option:

[Screenshot: ZAP quick-start scan option]

This is probably the best place to start off with your first scan. Alternatively, you could visit your demo site using the browser on which you configured ZAP proxy, and as you navigate through the site, ZAP will begin to populate the structure on the left home-page panel:

[Screenshot: ZAP Sites panel]

After you have a site structure similar to the above, you can take your test in several different directions — most of which can be viewed by simply right-clicking on any of the site's pages:

[Screenshot: right-click attack options]

If you are fairly new to web application security (like I am), chances are that whichever direction you choose to take, you will have questions. Fortunately, there are YouTube videos that you can refer to here. One video in particular that you should check out is this one, as it can come in handy when you want ZAP to auto-authenticate to your site's login fields.

This concludes the introduction of a feature-packed tool from a long list of tools that I plan to explore. This already looks to be the best of the bunch. Even if you have only just heard of web application security and are looking to try a tool, this is a must-have, and it's free! I am really glad that I got the chance to play with this tool, and it is now part of my toolkit. I recommend that you check it out to begin rockin' your Web Application Security game!

Follow me on Twitter: @azeemnow 


How Free Web Filtering Software Can Protect Your System

Update

On August 1, 2016, Blue Coat, Inc. (K9's parent company) was acquired by Symantec™. As you might imagine, Blue Coat and Symantec had a handful of overlapping products, and unfortunately it didn't make sense to maintain two competing ones, so it was decided to "end-of-life" K9 Web Protection.
Effective immediately, K9 Web Protection is no longer available for purchase or download. Technical support for K9 will end on June 30, 2019.

It is unfortunate to see K9 Web Protection go. I am not aware of other free software that provides the same level of protection at the same quality. However, for those interested in alternatives to K9 Web Protection, I would recommend starting with Quad9 and OpenDNS Home. Neither of them provides everything that K9 did, but they still protect your system against the most common online threats.


“We may think one layer of security will protect us – for example, antivirus. Unfortunately for that approach, history has proven that, although single-focus solutions are useful in stopping specific attacks, the capabilities of advanced malware are so broad that such protections inevitably fail.” – Jerry Shenk, Layered Security: Why It Works.

Making use of layered security for personal use is of the utmost importance as I have covered a couple of times in the past: here, here, and here. Just as I have done in the past, I will use this post to share another tool that you can explore to support your personal layered security strategy.

My never-ending curiosity to explore and test new technologies can sometimes lead me to stumble upon genuinely impressive solutions. Fortunately for you, I believe this tool falls into that category.

K9 Web Protection is the software that I have been testing for some months now, and I must say, I've been truly pleased with its results. The software falls under the web filter category, which restricts the websites you can visit. Web filtering is used in two major cases. The first is to permit parents to control the sort of content accessible to their children, offering their kids a safe environment to learn and explore online. The second is for businesses that wish to prevent their employees from accessing websites that do not pertain to their jobs.

In addition to the above, from my experience using this software on a daily basis, I have come across other benefits:

  • Real-time malware protection: "helps identify and block illegal or undesirable content in real time, including malware-infected sites. You also benefit from the WebPulse cloud service, a growing community of more than 62 million users who provide more than six billion real-time Web content ratings per day."
    • You can learn more about web filtering and intelligence here.
  • Automatic content ratings: "New websites and web pages are created every minute, and no one person can possibly rate or categorize all of them. To ensure protection against new or previously unrated websites, Blue Coat's patent-pending Dynamic Real-Time Rating™ (DRTR) technology automatically determines the category of an unrated web page, and allows or blocks it according to your specifications."

Another advantage of K9 Web Protection is that it is backed by Blue Coat (acquired by Symantec in 2016), the leader in Web Security "with an impressive portfolio of integrated technologies serving as a trusted platform to deliver Cloud Generation Security to more than 15,000 customers worldwide."

This solution is truly an “enterprise-class security software designed for home computers.” Also, did I mention that it’s free! “As part of the Blue Coat Community Outreach Program, K9 Web Protection is free for home use. You can also purchase a license to use K9 Web Protection for business, government, non-profit, or other use.”

I will do a quick overview of the installation and usage of the software, but you can find a well-documented quick start guide and user manual here:

Installation and Usage Overview:

[Screenshot: K9 installation]

  • The installation process should take a couple of minutes to complete as it is self-explanatory.
  • Upon completion, the application’s interface will open in your browser:

[Screenshot: K9 browser admin page]

  • To view or modify any of the configurations, you will be prompted to enter the password you created during installation.
  • Here are some of the options and details you can access from the Setup page:

[Screenshot: K9 "Web Categories to Block" settings]

  • Web Categories to Block: choosing one of the available levels allows you to block selected categories of websites.
  • Time Restrictions: 3 options are available to block web access depending on the time of day. Unrestricted places no restrictions on web access. NightGuard blocks all web access during contiguous blocks of time every day. Custom enables you to choose days of the week and time periods to block all web access.
  • Web Site Exceptions: Allows you to create lists of websites to "always block" or "always allow."
  • Blocking Effects: "Bark When Blocked" plays a barking sound when a web page is blocked (make sure the sound is enabled and not muted). "Show Admin Options" displays options on blocked web pages that enable administrators to view the blocked page. "Enable Time Out" allows you to block all web access if too many web pages are blocked in a given period of time.
  • URL Keywords: Allows you to enter keywords which, if found in a URL, cause a "block page" to display.
  • Safe Search: "Redirect to K9 Safe Search" redirects searches on various search engines through K9's Safe Search, which provides a safer search experience than the search engines themselves. "Force Safe Search" prevents users from disabling the Safe Search functionality provided by various websites.
  • Other Settings: "Update to Beta" enables you to get advance copies of new K9 Web Protection software undergoing development. Blue Coat distributes Beta versions so that K9 gets used in "real world" environments before being released as a final version. Please note that Beta versions might be incomplete and less stable than final versions. "Filter Secure Traffic" enables K9 to block secure websites (i.e., sites that use the HTTPS protocol).
  • Password/Email: Allows you to change your K9 administrator password or e-mail address.
  • K9 Update: Installs software updates if available.
  • View Activity Summary: This tab shows a summary of all "Web Activity" on your computer. To view more details, click the "Category" or "Requests" links. On these pages, you have the option of grouping the data by month or by day. To view Administrative Events details, click the "View All" link. (Some of these activities are the result of automatic browser and toolbar updates, for example, and might display URL formats with which you are not familiar.) Selecting "Clear Logs" clears all your activity data; however, three days' worth of administrative events will be retained.

[Screenshot: K9 activity summary]

As you can see from the above, the information provided is extremely granular: it gives you an easy view not only of your browsing behavior but also of the behavior of various system and application components. I have been using this solution in conjunction with other traditional protective mechanisms, such as anti-virus, and the benefits have been massive.

For instance, while surfing the internet, I would sometimes see a URL get blocked, or see visit history for a website category I had no recollection of visiting. After investigating, I found that a component of some software installed on my computer, or an extension in my browser, was behind that activity.

"The malware ecosystem has changed drastically in the past 10 years, to the point that the old precautions are just no longer enough" – Malwarebytes LABS. I have been using K9 Web Protection on many of my personal computers because I have been impressed with it, so I thought I would share it here. I believe it provides that extra layer of protection that we can all appreciate in a world where cyber threats are on the rise. In addition, I believe this solution is a wonderful option for those who are less familiar with common cyber threat vectors (e.g., parents) and can easily fall for phishing emails or click on adware as they browse the internet.

As we have known for some time, “there is no single solution for the information security problems we face today. A combination of many different kinds of security tools is required to protect you from modern threats…” and I believe K9 Web Protection is among the best tools we have today, so you should definitely equip yourself with it if you are going to create a safe web environment for yourself, your kids, your employees, and everyone around you!

 



Start-up Security Guide – DIY Style


Inspired by this blog by Isaiah Sarju and this presentation given during the 2017 Denver Startup Week, I am sharing my own version: A DIY (do it yourself) Cybersecurity Guide for Startups!

This guide includes some of my favorite resources that I believe can serve as a great starting point for founders to use and build a strong security foundation for their startups.

Please make sure you check out Isaiah's post and the Denver presentation above; both are extremely thoughtful and valuable pieces!

| Category | Resources |
| --- | --- |
| Start Here | Security Planner, DIY Cybersecurity, Take-Five (financial fraud focus), APWG, SSD |
| Multi-Factor Authentication Availability | TwoFactorAuth |
| Password Manager | Quick Guide, Password Strength Test, Identify Compromised Account |
| Browser Extensions | Privacy Badger, HTTPS Everywhere |
| Application Security | OWASP, Checklist/EBooks, Secure Coding Course, DIY Hack |
| Sensitive Info Sharing | Wire, Wire's Audit, Signal, Signal's Audit |
| System Encryption | PC, Mac: Src1, Src2; Portable Media: Src1, Src2 |
| OS Update | PC, Mac |
| VPN | Background, Comparison |
| Separate Work & Personal on a Budget | VirtualBox, VMware Player, Workstation Pro, Mac Fusion, Trial Virtual Machines, Live OS |
| Principle of Least Privilege | Windows 10, Windows 7, Mac OS |
| Backup Everything | PC, Mac |
| Who's Watching | Privacy Screens, Webcam Covers |
| Prevent Accidental Data Exchange | SyncStop |
| Report Abuse / Take-Down Request | AWS, Azure, Google Cloud, Salesforce, Cloudflare |
| Check/Request Domain Category | Google, Windows Defender, Norton, Symantec, McAfee, Palo Alto, Web of Trust, Easily Report Phishing and Malware |
| Internet Crime Complaint Center | IC3 |
| Public Security Page | Security Page |
| Phishing Report | APWG |
| Security Education/Awareness | Stop.Think.Connect, Interactive Game, Safe Online |
| Sector-based Information Sharing and Analysis Centers | ISACs |
| Cyber Readiness Index by Country | CRI |

Report to Google: to report a page incorrectly marked as phishing: https://safebrowsing.google.com/safebrowsing/report_error/?hl=en

FREE Cyber Security for Small Business Owners Training by Heimdal

If you found this helpful, please let me know by sending your comments and feedback below!

I plan to keep this a living list, so if you know of a resource that is not already listed but would benefit others, feel free to share it and I will make sure to include it!

Also, as you may know, phishing remains the most common tactic used by attackers to compromise both companies and individuals.
“Three out of ten people will open a phishing email while one of those will proceed to click on the link, possible infecting not only their own computer but the whole firm”. – Ref.

As part of this post, I am offering a practical, hands-on training on how you can triage and respond to Phishing attacks to protect yourself, your employees and ultimately your company.

Complete the form below and let me know if you would like to learn more!

