Python Basics for Pentesters
The entirety of this guide was written by Martian Defense, LLC
Introduction to Python
Python is a high-level, interpreted, general-purpose programming language with dynamic typing. It is often used as a scripting language because of its forgiving syntax and its compatibility with a wide variety of systems and libraries. In this section, we will review the basic building blocks of Python, from the interpreter and its syntax to simple functions, classes, and modules. These fundamentals will serve as your foundation for using Python in penetration testing.
Python is an interpreted language, which means that it is processed at runtime by the interpreter. You can use the Python interpreter in two ways:
Interactive mode
Script mode
Interactive mode
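In interactive mode, which you start by running the interpreter (for example, python3) with no arguments, you type Python code at the >>> prompt and the interpreter displays the result immediately.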
Script mode
In script mode, you store code in a file and use the interpreter to execute the contents of the file:
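As a minimal sketch, the file below is a complete script. The file name hello.py is hypothetical. Save the code under that name and run it with python3 hello.py.

```python
# hello.py (hypothetical file name): run it with "python3 hello.py"
import sys


def main() -> str:
    """Build and print a greeting that includes the interpreter version."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    message = f"Hello from script mode (Python {version})"
    print(message)
    return message


# This block runs only when the file is executed directly, not when it is imported
if __name__ == "__main__":
    main()
```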
Syntax
Python was designed to be easy to read and fun to use. Its syntax is clear: code blocks are defined by indentation rather than braces, and the official style guide, PEP 8, promotes readable, consistent code.
Variables and Data Types
Python has several fundamental data types:
Numeric Types: int, float, complex
Boolean Type: bool
Text Type: str
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
You can assign any data type to any variable:
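The sketch below shows dynamic typing in practice. All values are hypothetical, and the IP address comes from the 192.0.2.0/24 documentation range.

```python
# Hypothetical values; Python infers each type from the assigned value
target = "192.0.2.10"               # str
port = 443                          # int
timeout = 2.5                       # float
is_alive = True                     # bool
open_ports = [22, 80, 443]          # list
services = {22: "ssh", 80: "http"}  # dict

for value in (target, port, timeout, is_alive, open_ports, services):
    print(type(value).__name__, value)

# Because typing is dynamic, the same name can later hold a different type
port = "https"
print(type(port).__name__)          # str
```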
Nonprimitive Data Types
List: Any data that is enclosed within square brackets ([ ]) and separated by commas is a list. The objects in a list are indexed: the first object has index 0, the next object has index 1, and so on.
Tuple: Any data that is enclosed within parentheses, ( ), and separated by commas is a tuple. Tuples are immutable, meaning that data stored in a tuple cannot be modified at run time. For example, a tuple can hold a fixed set of IP addresses.
Dictionary: Any key-value pairs that are enclosed in curly brackets ({ }) and separated by commas form a dictionary. For example, a dictionary can map interface names (keys) to the desired state of each interface (values).
Set: A collection of unique objects that is enclosed in curly brackets ({ }) and separated by commas is a set. Note that empty curly brackets create an empty dictionary. Use set() to create an empty set.
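Here is one hypothetical value of each nonprimitive type. The interface names and vendor names are illustrative only.

```python
ports = [22, 80, 443]                          # list: ordered, mutable
print(ports[0])                                # 22 (indexing starts at 0)

ip_addresses = ("192.0.2.1", "192.0.2.2")      # tuple: ordered, immutable

interfaces = {                                 # dict: key-value pairs
    "GigabitEthernet1": "up",
    "GigabitEthernet2": "shutdown",
}
print(interfaces["GigabitEthernet2"])          # shutdown

vendors = {"cisco", "juniper", "cisco"}        # set: the duplicate is dropped
print(len(vendors))                            # 2

empty_dict = {}                                # empty braces create a dict...
empty_set = set()                              # ...so use set() for an empty set
print(type(empty_dict).__name__, type(empty_set).__name__)  # dict set
```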
Primitive data types can be converted to other primitive types using built-in functions, assuming that the converted value is valid.
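As a short sketch, the built-in int(), float(), str(), and bool() functions convert between primitive types. A value that is not valid for the target type raises a ValueError.

```python
port = int("8080")        # str -> int
ratio = float("0.75")     # str -> float
label = str(443)          # int -> str
flag = bool(0)            # int -> bool (0 is False, any other number is True)
print(port + 1, ratio, label + "/tcp", flag)   # 8081 0.75 443/tcp False

# An invalid value cannot be converted
try:
    int("eighty")
except ValueError as err:
    print("conversion failed:", err)
```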
Some nonprimitive data types can also be converted to other, similar data types. For example, a list can be converted to a set or a tuple. A flat list of values cannot be converted to a dictionary, although a list of two-item key-value pairs can.
For example, if a devices list contains "NEXUS" twice, the duplicate is removed when the list is converted to a set. However, set comparisons are case-sensitive, so 'ASA' and 'asa' both remain in the set because they are different values.
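The conversion described above can be sketched with a hypothetical devices list. "ISR" is an illustrative extra entry. Normalizing case first is one way to merge 'ASA' and 'asa'.

```python
devices = ["ASA", "asa", "NEXUS", "NEXUS", "ISR"]

unique_devices = set(devices)
print(unique_devices)                     # order may vary
print(len(devices), len(unique_devices))  # 5 4

# Lower-casing each name first collapses 'ASA' and 'asa' into one entry
normalized = {name.lower() for name in devices}
print(sorted(normalized))                 # ['asa', 'isr', 'nexus']
```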
It is possible to convert a list to a tuple, but the resulting tuple cannot be modified. Attempting to change one of its items raises a TypeError.
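In this sketch, with hypothetical device names, assigning to an item of the converted tuple fails. The original list stays mutable.

```python
devices = ["ASA", "NEXUS", "ISR"]
devices_tuple = tuple(devices)

try:
    devices_tuple[0] = "FTD"   # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)   # 'tuple' object does not support item assignment

devices.append("FTD")          # the original list can still be changed
print(devices_tuple)           # ('ASA', 'NEXUS', 'ISR')
```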
It is not possible to convert one data type to another if the value is not valid for the target type. For example, dict(["ASA", "NEXUS"]) raises an error, because dict() expects every element of the list to be a two-item key-value pair.
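This sketch catches the error and then shows a list of pairs that does convert. Both TypeError and ValueError are caught because dict() raises either one, depending on what the invalid elements look like. The device roles are hypothetical.

```python
devices = ["ASA", "NEXUS", "ISR"]

try:
    dict(devices)   # each element would need to be a (key, value) pair
except (TypeError, ValueError) as err:
    print(type(err).__name__, err)

# A list of two-item pairs, on the other hand, converts cleanly
pairs = [("ASA", "firewall"), ("NEXUS", "switch")]
roles = dict(pairs)
print(roles["NEXUS"])   # switch
```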
Each data type mentioned previously supports different built-in methods and attributes. To list the methods available for a particular data type, pass a value of that type, or the type itself, to the built-in dir() function. For example, dir(str) lists the methods that can be used on strings.
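This sketch filters dir() down to public method names, dropping "dunder" names such as __len__.

```python
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods[:5])
print("upper" in string_methods, "split" in string_methods)   # True True
```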
If you want to convert a string to all capital letters, you need to use the upper() method.
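A quick sketch with a hypothetical hostname follows. String methods return a new string and leave the original unchanged.

```python
hostname = "edge-router-01"
print(hostname.upper())   # EDGE-ROUTER-01
print(hostname)           # edge-router-01 (unchanged)
```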
To learn how to use each method, pass it to the built-in help() function, for example help(str.upper).
Now you have an understanding of the different data types that Python supports. You can expand that knowledge with the following slightly more advanced topic: nested nonprimitive data types.
Nested data structures are only applicable to nonprimitive types. Each nonprimitive type can contain the same or other nonprimitive data types as nested entries.
For example, a list can contain a nested list that holds a dictionary, such as [{"state": "shutdown"}].
To access the value inside the nested list without looping through the list, you first need to identify its position. Lists are ordered, and positions start at 0 and increase by 1 from left to right. If the nested item is the second element of the list, its position is 1.
To refer to that position, put the integer within square brackets after the variable name.
However, that only gives you the [{"state": "shutdown"}] item. Because it is also a list and has only one value, you reference that value with its position, 0, in a second pair of square brackets.
At this point, what remains is a dictionary. To print the value, append the name of the key in square brackets after the positional indexes.
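The steps above can be sketched as follows. The interface name at position 0 is hypothetical. The nested [{"state": "shutdown"}] item comes from the text.

```python
interface = ["GigabitEthernet1", [{"state": "shutdown"}]]

nested_list = interface[1]          # [{'state': 'shutdown'}]
nested_dict = interface[1][0]       # {'state': 'shutdown'}
state = interface[1][0]["state"]    # 'shutdown'

print(nested_list)
print(nested_dict)
print(state)                        # shutdown
```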
Now, consider a nested dictionary. Dictionaries are accessed by key rather than by position. Since Python 3.7 they preserve insertion order, but they still cannot be indexed by position. Instead, you reference a value directly by enclosing its key name in square brackets.
You can obtain the nested values that are stored under each root key. For a root key named csr1kv1, for example, enclose the key's name in square brackets and append it to the variable to return the value stored under it.
If the returned value is another dictionary, you can return a nested value by appending the nested key's name in a second pair of square brackets after the root key.
If the value stored under another root key is a list, you need to act accordingly. Obtain the value, pick the position within the list, and then use the key name to return the value.
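Both access patterns can be sketched on one dictionary. The root key csr1kv1 comes from the text. The second root key csr1kv2 and all nested keys and values are hypothetical.

```python
routers = {
    "csr1kv1": {"hostname": "csr1kv1", "state": "up"},
    "csr1kv2": [
        {"interface": "GigabitEthernet1", "state": "up"},
        {"interface": "GigabitEthernet2", "state": "shutdown"},
    ],
}

print(routers["csr1kv1"])              # the nested dictionary
print(routers["csr1kv1"]["state"])     # up

# List value: take the value, pick a position, then use the key name
print(routers["csr1kv2"][1]["state"])  # shutdown
```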
Understanding how to find the position in nested lists is crucial in day-to-day programming, especially when dealing with API calls.
Control Structures
Control structures determine the flow of your program. They include conditionals (if, elif, else) and loops (for, while).
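Here is a small sketch combining a for loop, an if/elif/else chain, and a while loop. The scan results are hypothetical.

```python
# Hypothetical scan results: port -> banner (None means no banner was received)
results = {22: "SSH-2.0-OpenSSH_8.9", 80: None, 3389: "RDP"}

labels = []
for port, banner in results.items():      # for loop over key-value pairs
    if banner is None:
        label = "no banner"
    elif banner.startswith("SSH"):
        label = "SSH service"
    else:
        label = banner
    labels.append(label)
    print(f"{port}: {label}")

attempts = 0
while attempts < 3:                       # while loop with a counter
    attempts += 1
print("attempts:", attempts)              # attempts: 3
```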
Functions
A function is a block of code that runs only when it is called. Functions make your application more modular and let you reuse code. You define functions with the def keyword.
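This sketch defines two small functions. The names is_valid_port and build_target are hypothetical helpers. The second function uses a default parameter value and calls the first.

```python
def is_valid_port(port: int) -> bool:
    """Return True if port is in the TCP/UDP range 1-65535."""
    return 1 <= port <= 65535


def build_target(host: str, port: int = 80) -> str:
    """Join a host and port into "host:port"; port defaults to 80."""
    if not is_valid_port(port):
        raise ValueError(f"invalid port: {port}")
    return f"{host}:{port}"


print(build_target("192.0.2.10"))        # 192.0.2.10:80
print(build_target("192.0.2.10", 8443))  # 192.0.2.10:8443
```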
Classes
Python is an object-oriented language and classes provide a means of bundling data and functionality together. Creating a new class creates a new type of object, allowing new instances of that type to be made:
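As an illustration, here is a hypothetical Host class that bundles an address (data) with a method that records open ports (functionality).

```python
class Host:
    """A hypothetical host record that tracks the open ports found on it."""

    def __init__(self, address: str) -> None:
        self.address = address       # instance attribute
        self.open_ports = []         # list of int port numbers

    def add_port(self, port: int) -> None:
        if port not in self.open_ports:
            self.open_ports.append(port)

    def __repr__(self) -> str:
        return f"Host({self.address!r}, open_ports={self.open_ports})"


web = Host("192.0.2.20")             # each call creates a new instance
web.add_port(80)
web.add_port(443)
web.add_port(80)                     # ignored: already recorded
print(web)                           # Host('192.0.2.20', open_ports=[80, 443])
```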
Modules
A module allows you to logically organize your Python code. Grouping related code into a module makes the code easier to understand and use:
You can then import this module in another script:
Network Programming
In penetration testing, network programming is a crucial skill. With Python, you can write scripts that can sniff network packets, perform network scans, and execute other related tasks. In this section, we will review the fundamentals of network programming in Python.
Socket Programming
A socket is one endpoint of a two-way communication link between two programs running on a network. Python's standard library includes the socket module, which provides access to the operating system's socket interface.
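A minimal server-client sketch follows. To keep it in one runnable file, the server runs in a thread on the loopback address, and the operating system picks a free port. The echo behavior is an illustrative choice.

```python
import socket
import threading

HOST = "127.0.0.1"   # loopback, so the example never leaves this machine


def run_server(server_sock: socket.socket) -> None:
    """Accept one client, echo back what it sends, then close the connection."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)   # one recv() suffices for this short loopback message
        conn.sendall(b"echo: " + data)


# Server: create a TCP socket, bind to an OS-chosen free port (0), and listen
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind((HOST, 0))
server_sock.listen(1)
port = server_sock.getsockname()[1]
server_thread = threading.Thread(target=run_server, args=(server_sock,))
server_thread.start()

# Client: connect to the server, send a message, and read the reply
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect((HOST, port))
    client.sendall(b"hello")
    reply = client.recv(1024)

server_thread.join()
server_sock.close()
print(reply.decode())   # echo: hello
```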
TCP and UDP Connections
There are two main transport protocols for Internet Protocol (IP) traffic: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). In Python, you can set up both types of connections with the socket module.
TCP Connection
TCP is a reliable, connection-oriented protocol. It retransmits lost segments and delivers data in order, or reports an error if delivery fails.
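TCP delivers an ordered byte stream rather than discrete messages. In the sketch below, the client sends three separate chunks, and the server reads in a loop until the client signals the end of its data with shutdown(). The data sent is illustrative. socket.create_server() requires Python 3.8 or later.

```python
import socket
import threading


def recv_all(conn: socket.socket) -> bytes:
    """Read from a TCP connection until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:            # empty bytes: the peer has finished sending
            break
        chunks.append(chunk)
    return b"".join(chunks)


def tcp_server(listener: socket.socket, result: list) -> None:
    conn, _ = listener.accept()
    with conn:
        data = recv_all(conn)
        result.append(data)
        conn.sendall(str(len(data)).encode())   # reply with the byte count


listener = socket.create_server(("127.0.0.1", 0))   # TCP listener on a free port
port = listener.getsockname()[1]
received = []
thread = threading.Thread(target=tcp_server, args=(listener, received))
thread.start()

with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    for part in (b"GET ", b"/ ", b"HTTP/1.0"):   # three separate sends...
        client.sendall(part)
    client.shutdown(socket.SHUT_WR)              # ...then signal end of data
    byte_count = recv_all(client)

thread.join()
listener.close()
print(received[0])           # b'GET / HTTP/1.0' arrives as one ordered stream
print(byte_count.decode())   # 14
```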
UDP Connection
UDP is a connectionless protocol. Unlike TCP, it doesn't confirm whether the data reached the receiver, and datagrams can be lost, duplicated, or arrive out of order.
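Here is a minimal UDP server-client sketch on the loopback address. It uses a server thread and a port chosen by the operating system. Upper-casing the reply is an illustrative choice. Note the absence of listen(), accept(), and connect().

```python
import socket
import threading

HOST = "127.0.0.1"

# Server: a UDP socket only needs bind(); there is no listen() or accept()
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind((HOST, 0))
port = server.getsockname()[1]


def udp_server() -> None:
    data, client_addr = server.recvfrom(1024)   # one datagram plus the sender's address
    server.sendto(data.upper(), client_addr)    # reply directly to the sender


thread = threading.Thread(target=udp_server)
thread.start()

# Client: each sendto() names the destination explicitly
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
    client.settimeout(5)                        # UDP gives no delivery guarantee
    client.sendto(b"ping", (HOST, port))
    reply, _ = client.recvfrom(1024)

thread.join()
server.close()
print(reply)   # b'PING'
```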
Network Scanning
Network scanning is the process of identifying active hosts (clients and servers) on a network and the ports they expose. In Python, you can use the socket module for this task. For instance, a script can check for open TCP ports on a target host by attempting a TCP connection to each port.
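This is a minimal TCP connect-scan sketch. The function name scan_ports is hypothetical. connect_ex() returns 0 when the connection succeeds, so each port that accepts a connection is recorded as open. To stay self-contained, the sketch opens a listener on loopback and scans it.

```python
import socket


def scan_ports(host: str, ports, timeout: float = 0.5) -> list:
    """Return the ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex() returns an error code instead of raising on refusal
                if sock.connect_ex((host, port)) == 0:
                    open_ports.append(port)
            except OSError:
                pass   # e.g. an unresolvable host name; treat the port as not open
    return open_ports


# Self-test on loopback: open one listener, then scan its port
listener = socket.create_server(("127.0.0.1", 0))
open_port = listener.getsockname()[1]
found = scan_ports("127.0.0.1", [open_port])
listener.close()
print(found == [open_port])   # True
```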
Web Scraping
Web scraping is a technique to extract data from websites. It involves making HTTP requests to the URLs of specific websites and then parsing the response (HTML) to pull out the information you need. Python provides several libraries to simplify web scraping, including requests for making HTTP requests and BeautifulSoup for parsing HTML.
HTTP Requests
The first step in web scraping is to send an HTTP request to the URL of the webpage you want to access. When a browser sends a request to a server, it's basically asking that server to send it a webpage. In Python, the requests library is commonly used for making HTTP requests.
To make a GET request with requests, call requests.get() with the URL. The get() function sends a GET request to the specified URL and returns a Response object, which contains the server's response to your request. You can get the content of the response with response.text, and the HTTP status code with response.status_code.
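requests is a third-party package. This sketch therefore uses the standard library's urllib.request to show the same request/response cycle against a throwaway local http.server, so it runs without internet access or extra installs. The handler class and page content are hypothetical. Here response.status and the decoded body play the roles of response.status_code and response.text.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Throwaway handler that answers every GET with a small HTML page."""

    def do_GET(self):
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# An empty ProxyHandler keeps the request from being routed through a proxy
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, timeout=5) as response:
    status = response.status                  # like response.status_code in requests
    text = response.read().decode("utf-8")    # like response.text in requests

server.shutdown()
server.server_close()
print(status, text)
```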
HTML Parsing
Once you have accessed the HTML content of the webpage, you can use it to extract the data you need. This is known as parsing. Python has several libraries for parsing HTML, including BeautifulSoup and lxml.
BeautifulSoup is a Python library for parsing HTML and XML documents. It transforms a complex HTML document into a tree of Python objects, such as tags, navigable strings, or comments.
To parse HTML content with BeautifulSoup, pass the markup and the name of a parser to its constructor. BeautifulSoup(response.text, 'html.parser') creates a BeautifulSoup object that uses Python's built-in HTML parser, and soup.find('h1') returns the first <h1> tag in the HTML.
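BeautifulSoup is a third-party package. As a dependency-free illustration of what soup.find('h1') does, the sketch below uses the standard library's html.parser.HTMLParser instead. The class name FirstH1Parser and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class FirstH1Parser(HTMLParser):
    """Collect the text of the first <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.done = False
        self.h1_text = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.done:
            self.in_h1 = True
            self.h1_text = ""

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            self.done = True      # ignore any later <h1> tags

    def handle_data(self, data):
        if self.in_h1:
            self.h1_text += data


html = "<html><body><h1>Admin Portal</h1><h1>Second</h1></body></html>"
parser = FirstH1Parser()
parser.feed(html)
print(parser.h1_text)   # Admin Portal
```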
Handling Cookies and Sessions
In some cases, you may need to maintain a session between multiple requests to the same website. For example, you might need to log in to a website and then access a specific page that requires authentication. The requests library provides a Session object to handle this.
A Session object allows you to persist certain parameters across requests, and it persists cookies across all requests made from the Session instance. To log in and then access a protected page, you can send the login request with session.post() and then request the protected page with session.get(). The session cookie set by the login response is sent automatically with the second request.
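The sketch below reproduces the same idea with the standard library. An http.cookiejar.CookieJar attached to a urllib opener stores the cookie from a login response and sends it with later requests, much as a Session does. The toy server, its /login and /protected paths, and the cookie value are hypothetical. The login uses GET only to keep the example short; a real login form would normally be submitted with POST.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoginHandler(BaseHTTPRequestHandler):
    """Toy site: /login sets a session cookie, /protected requires it."""

    def do_GET(self):
        if self.path == "/login":
            self.reply(200, b"logged in", cookie="session=demo123; Path=/")
        elif self.path == "/protected":
            if "session=demo123" in (self.headers.get("Cookie") or ""):
                self.reply(200, b"secret data")
            else:
                self.reply(403, b"forbidden")
        else:
            self.reply(404, b"not found")

    def reply(self, code, body, cookie=None):
        self.send_response(code)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# The CookieJar plays the role of a requests.Session: cookies persist across calls
jar = CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),           # never route through a proxy
    urllib.request.HTTPCookieProcessor(jar),
)
with opener.open(base + "/login", timeout=5) as resp:
    resp.read()
with opener.open(base + "/protected", timeout=5) as resp:
    page = resp.read().decode()

server.shutdown()
server.server_close()
print(page)   # secret data
```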
Working with Web APIs
Web APIs (Application Programming Interfaces) provide a way for applications to interact with each other. They expose parts of their service over the network, allowing other software to request data or perform actions.
You can interact with a Web API by sending HTTP requests, just like you do when you're scraping a webpage. The only difference is that, instead of getting HTML content in response, you get data in a machine-readable format, like JSON.
Here is an example of how to send a GET request to a Web API and parse the JSON response:
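As a sketch of the parsing step, the snippet below decodes a hypothetical JSON response body with the standard json module. With requests, calling response.json() on the Response object performs the same decoding.

```python
import json

# Hypothetical body of a JSON API response
body = '{"host": "192.0.2.30", "open_ports": [22, 443], "tls": {"version": "TLSv1.3"}}'

data = json.loads(body)          # JSON text -> Python dict
print(data["open_ports"])        # [22, 443]
print(data["tls"]["version"])    # TLSv1.3
```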
File Handling
In the process of penetration testing, you often need to read from files (like configurations, wordlists, etc.) or write results to files. Python has powerful built-in features for file handling, which include methods for creating, reading, updating, and deleting files.
Opening Files
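In Python, you use the built-in open function to open a file. Its two most commonly used parameters are the name of the file and the mode for opening it: 'r' for reading (the default), 'w' for writing, 'a' for appending, and 'x' for creating a new file. Add 'b' for binary data, as in 'rb'.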
Reading Files
Once you have opened a file for reading, you can use the read method to get the whole content as one string, the readline method to get a single line, or the readlines method to get a list of all lines.
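Note: Close the file when you're done with it, either by calling its close() method or, preferably, by opening it in a with statement, which closes the file automatically.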
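This sketch first writes a small hypothetical wordlist to a temporary directory. It then reads the file back with read(), readline(), and readlines(), using with statements so each file is closed automatically.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wordlist.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("admin\npassword\nletmein\n")

with open(path, "r", encoding="utf-8") as f:   # closed automatically at block end
    everything = f.read()                       # whole file as one string

with open(path, encoding="utf-8") as f:         # "r" is the default mode
    first = f.readline()                        # one line, including the "\n"
    rest = f.readlines()                        # remaining lines as a list

words = [line.strip() for line in everything.splitlines()]
print(first, rest, words)
```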
Writing Files
To write to a file, you open the file in write ('w') or append ('a') mode, and then use the write method.
Keep in mind that opening a file in write mode will overwrite the existing content of the file. If you want to add to the existing content instead, open the file in append mode.
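The difference between the two modes shows up in the sketch below, which uses a hypothetical results file in a temporary directory. The second 'w' open discards the first line, and the 'a' open adds to what remains.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.txt")

with open(path, "w", encoding="utf-8") as f:   # "w" creates or truncates the file
    f.write("192.0.2.10:22 open\n")

with open(path, "w", encoding="utf-8") as f:   # writing again overwrites it
    f.write("192.0.2.10:80 open\n")

with open(path, "a", encoding="utf-8") as f:   # "a" appends to the end
    f.write("192.0.2.10:443 open\n")

with open(path, encoding="utf-8") as f:
    content = f.read()
print(content)
```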
Working with JSON Files
JSON (JavaScript Object Notation) is a popular data format that is often used for communication between a server and a client, or between different parts of a single application. Python includes the json module, which allows you to read and write JSON data.
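Here is a round-trip sketch with json.dump() and json.load(). The scan data and file name are hypothetical, and the file is written to a temporary directory.

```python
import json
import os
import tempfile

data = {"target": "192.0.2.10", "open_ports": [22, 80], "notes": None}
path = os.path.join(tempfile.mkdtemp(), "scan.json")

with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)   # Python object -> JSON text in the file

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)          # JSON text in the file -> Python object

print(loaded == data)              # True (None round-trips as JSON null)
```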
json.dump(data, f) writes data to an open file as JSON, and json.load(f) reads JSON data from an open file back into Python objects.
Error Handling
When working with files, errors can occur for many reasons, such as the file not existing, the user not having enough permissions, etc. It's important to handle these errors in your code to prevent your program from crashing.
Python provides the try/except statement to catch and handle exceptions.
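This sketch wraps file reading in try/except. The function name read_config and the missing path are hypothetical. A separate except clause handles permission errors.

```python
def read_config(path: str) -> str:
    """Return the file's text, or an empty string if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"{path} does not exist")
    except PermissionError:
        print(f"no permission to read {path}")
    return ""


content = read_config("/nonexistent/config.ini")
print(repr(content))   # ''
```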
For example, if opening a file fails because the file does not exist, Python raises a FileNotFoundError exception, which can then be caught and handled by an except block.
Cryptography and Hashing
Cryptography is a fundamental part of cybersecurity and is essential for maintaining the confidentiality, integrity, and authenticity of data. It involves encryption, decryption, and hashing. Together with related techniques such as encoding and password cracking, these are key aspects of penetration testing.
Understanding Hashing
Hashing is a technique that converts data of any size into a fixed-size value, called a hash value or simply a hash. Different inputs can in principle produce the same hash (a collision), but a good cryptographic hash function makes collisions infeasible to find. With such a function, a change of even a single bit of input results in a significant change in the output.
Generating Hashes with Python
Python's hashlib module provides a variety of hashing algorithms, including md5, sha1, sha256, and more. sha256 is a common choice for generating a hash.
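This sketch hashes the standard test input "abc". The expected SHA-256 and SHA-1 digests are the published test vectors for that input. It also shows that changing one character changes the digest completely.

```python
import hashlib

# hashlib works on bytes, so encode strings first
digest = hashlib.sha256("abc".encode("utf-8")).hexdigest()
print(digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# Changing one character produces a completely different digest
print(hashlib.sha256(b"abd").hexdigest() == digest)   # False

# Other algorithms follow the same pattern (SHA-1 is no longer collision-resistant)
print(hashlib.sha1(b"abc").hexdigest())   # a9993e364706816aba3e25717850c26c9cd0d89d
```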
Working with Password Hashes
In many cases, especially during penetration testing, we come across hashed passwords. Python can be used to generate and compare password hashes. The third-party bcrypt library is a popular choice for this. It implements the bcrypt algorithm, which combines a random salt with a configurable work factor to slow down brute-force attacks.
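The same generate-and-compare workflow can be sketched without third-party packages. Note that this swaps in the standard library's PBKDF2 (hashlib.pbkdf2_hmac) rather than bcrypt. The function names, the sample password, and the iteration count are hypothetical choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000   # work factor; raise it as your hardware allows


def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256 with a random 16-byte salt."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected)


salt, stored = hash_password("Summer2024!")
print(verify_password("Summer2024!", salt, stored))   # True
print(verify_password("summer2024!", salt, stored))   # False
```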
Cryptography
Cryptography involves encrypting and decrypting data. Encryption transforms data into an unreadable format using an encryption algorithm and an encryption key. Decryption transforms the data back into its original format using the same encryption algorithm and a decryption key.
Symmetric Encryption and Decryption
In symmetric encryption, the same key is used for both encryption and decryption. Several third-party libraries provide symmetric encryption for Python, including cryptography, which supports AES (Advanced Encryption Standard) encryption and decryption.
Asymmetric Encryption and Decryption
In asymmetric encryption, also known as public key cryptography, two different keys are used: a public key for encryption and a private key for decryption. The cryptography library also supports asymmetric encryption, such as RSA.
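To illustrate the two-key idea without a third-party package, here is textbook RSA with tiny primes. This is not the cryptography library's API. It is a toy that is trivially breakable and lacks padding, so it should never protect real data. The primes, exponent, and message are the classic textbook values.

```python
p, q = 61, 53
n = p * q                 # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

public_key, private_key = (e, n), (d, n)


def rsa_apply(value: int, key) -> int:
    """Encryption and decryption are the same operation with different keys."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


ciphertext = rsa_apply(65, public_key)          # anyone with the public key can encrypt
plaintext = rsa_apply(ciphertext, private_key)  # only the private key decrypts
print(d, ciphertext, plaintext)                 # 2753 2790 65
```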
Python Libraries for Penetration Testing
Scapy
Scapy is a powerful Python library for packet manipulation. It allows you to forge or decode packets of a wide number of protocols, send them over the wire, capture them, and match requests and replies.
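In Scapy, packets are built by stacking protocol layers with the / operator. An ICMP Echo request (a "ping") is written as IP(dst=target)/ICMP(), and the sr1() function sends it and returns the first reply it receives.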
Impacket
Impacket is a library for working with network protocols, which is highly effective when it comes to creating packet-level tools or working with network services. Impacket supports protocols like IP, TCP, UDP, ICMP, IGMP, ARP, and protocols used by higher-level services like SMB, MSRPC, and others.
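To connect to an SMB service, Impacket provides the SMBConnection class in the impacket.smbconnection module. You create a connection to the target host and then authenticate with its login() method.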
Requests
Requests is a library for making HTTP requests. It abstracts the complexities of making requests behind a beautiful, simple API, so that you can focus on interacting with services and consuming data in your application.
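For example, response = requests.get(url) sends a GET request to url and returns a Response object whose text and status_code attributes hold the body and the HTTP status code.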
BeautifulSoup
BeautifulSoup is a Python library for parsing HTML and XML documents. It's often used for web scraping, which is a method of extracting data from websites.
For example, after a page has been parsed into a soup object, soup.find_all('a') returns every <a> tag, and tag.get('href') reads each link's target URL.
Building a Testing Tool with Python
Basic Network Scanner
By combining the Python concepts and libraries we've discussed so far, you can create your own powerful penetration testing tools. In this section, we will develop a simple yet effective network scanner as an example.
Tool Overview
Our network scanner will perform two main tasks:
Discover all the devices connected to the same network.
Scan the open ports of a given device.
For this, we will mainly use the Scapy library for packet generation and manipulation.
Writing the Network Scanner
Post-Exploitation with Python
Post-exploitation refers to the phase where an attacker (or penetration tester) has already gained access to a system and might need to maintain that access, escalate privileges, gather more information, or cover their tracks. For this section, we will discuss how Python can be used for such tasks.
Maintaining Access
Once a penetration tester gains access to a system, maintaining that access is crucial. Python can be used to accomplish this in several ways, such as creating backdoors.
A backdoor is a script that allows an attacker to bypass normal authentication methods. Please note that creating or using a backdoor is illegal and unethical without proper authorization. The following example is for educational purposes only:
Privilege Escalation
Privilege escalation involves gaining elevated access to resources that are typically protected from an application or user. There are two types of privilege escalation: horizontal and vertical. Horizontal escalation involves taking over another user's access rights, while vertical escalation involves elevating the privileges of the current user account.
Here is a simple script that checks if the current user has root privileges:
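This minimal sketch assumes a Unix-like system, where root has effective user ID 0. os.geteuid() does not exist on Windows, so the function simply returns False there and does not check Windows administrator rights.

```python
import os


def is_root() -> bool:
    """Return True if the effective user ID is 0 (root) on Unix-like systems."""
    if not hasattr(os, "geteuid"):   # e.g. Windows: not checked by this sketch
        return False
    return os.geteuid() == 0


if is_root():
    print("Running as root")
else:
    print("Running as an unprivileged user")
```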
Information Gathering
Python is excellent for gathering more information from a compromised system. For instance, it can be used to list all directories and files, read specific files, fetch system and network information, and much more. Here's a simple script to fetch system information:
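A sketch using only the standard library follows. The choice of fields and the function name system_info are illustrative.

```python
import os
import platform
import socket


def system_info() -> dict:
    """Collect basic host details using only the standard library."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),             # e.g. 'Linux', 'Windows', 'Darwin'
        "release": platform.release(),
        "architecture": platform.machine(),
        "python": platform.python_version(),
        "cwd": os.getcwd(),
        "pid": os.getpid(),
    }


for key, value in system_info().items():
    print(f"{key}: {value}")
```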
Covering Tracks
Covering tracks is an important part of post-exploitation. It involves deleting or altering logs that can indicate a system intrusion.
Here's a simple script that deletes a log file:
Buffer Overflow Vulnerabilities with Python
A buffer overflow happens when a program or process tries to store more data in a buffer (temporary data storage area) than it was intended to hold. Python can be used to create scripts that exploit these vulnerabilities in controlled and ethical hacking scenarios. In this section, we will provide a basic understanding of how to use Python to exploit buffer overflow vulnerabilities.
Understanding Buffer Overflow
Buffers are areas of memory set aside to hold data, often while moving it from one section of a program to another, or between programs. Buffer overflows can often be triggered by malformed inputs; if one assumes all inputs will be smaller than a certain size and the buffer is created to accommodate that, an anomalous transaction that produces more data could cause it to overflow. This can cause the data to leak into other buffers, which can corrupt or overwrite the data they were holding.
Building a Buffer Overflow Exploit
Here's an example of how you might create a Python script to exploit a buffer overflow vulnerability. The script will generate a long string of 'A's and send it to the target process. If the process does not correctly handle input of this length, it may overflow its buffer, causing a crash or other unexpected behavior.
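The sketch below follows that description. It builds a string of 'A's, sends it over TCP, and grows the length until the target stops accepting connections. The function names, the "\r\n" line terminator, and the size range are assumptions about a hypothetical line-based lab service. The target address and port in the final comment are placeholders.

```python
import socket


def build_payload(length, filler=b"A"):
    """Return length copies of filler, e.g. b"AAAA...", to probe a fixed-size buffer."""
    return filler * length


def send_payload(host, port, payload, timeout=3.0):
    """Open a TCP connection to the target and send the payload once."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload + b"\r\n")   # assumed line-terminated protocol


def fuzz(host, port, start=100, stop=3000, step=100):
    """Send ever-longer payloads; return the size of the first attempt that fails."""
    for size in range(start, stop + 1, step):
        try:
            send_payload(host, port, build_payload(size))
        except OSError:
            # Refused or reset: the service may have crashed on this payload
            # or on the previous one, so both sizes are worth investigating.
            return size
    return None


# Hypothetical lab target (placeholder address and port):
# print(fuzz("192.0.2.50", 9999))
```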
Please note that all scripts are highly simplified. In a real-world situation these would involve more complex techniques and understanding of the target system and application.