🛠️ OPSE Part 2: Developing an OSINT framework in Python

Introduction

In this article, we’ll explain OPSE, a tool that we developed during our school project to automate the OSINT process.

This article is the second in a three-part series on the OPSE project. Each article presents a different part of the project:

  1. A presentation of the tool, with installation, usage and an example;
  2. A technical presentation of the tool: how it is designed and coded;
  3. A guide to developing your own OPSE plugin!

Context

During our fourth year of engineering school, we led a project on the theme of OSINT. We asked ourselves how much personal information we could obtain about someone when starting from very little data.

We started to imagine an automated tool that, given only a few inputs such as a first name and a last name, would be able to find many of this person’s connections on the internet. We also wanted it to help others in the future, so that they could reuse and modify it.

Our objectives were simple:

  • automatic;
  • fast;
  • modular;
  • open-source.

This project is now open-source and available on GitHub: https://github.com/OPSE-Developers/OPSE-Framework

Want to contribute to the OPSE project? Join the Discord server by clicking on the logo!

⚠️ Disclaimer

The OPSE project is intended for educational and awareness purposes only!

Technical development

Core

The core is the central part of the project. It links all the other parts, from the user interface to the plugins that are called.

We have made it as modular as possible so that anyone can take it as a basis for their own project.

To do so, the core follows some principles from OOP (Object-Oriented Programming):

  • abstraction;
  • inheritance;
  • polymorphism;
  • etc.

Handling configuration

We needed to handle configuration, not only for the core but also for the plugins, for example to limit what a function does or to set an input or output path.

We started to develop our own configuration handler, which manages YAML files and dictionary-like configurations.

Configuration handling is done by the Config class at utils/config/Config.py.
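
To illustrate what a dictionary-like configuration can look like, here is a minimal sketch. It is not the actual OPSE Config class: the DictConfig name, its get and merge methods and the keys used are all assumptions made for the example. A YAML file would first have to be parsed into a dictionary (for instance with PyYAML’s yaml.safe_load) before being passed in.

```python
from copy import deepcopy


class DictConfig:
    """Hypothetical dictionary-backed configuration with dotted-key lookup."""

    def __init__(self, values=None):
        self._values = deepcopy(values) if values else {}

    def get(self, dotted_key, default=None):
        """Return the value at e.g. 'core.timeout', or default if absent."""
        node = self._values
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node

    def merge(self, overrides):
        """Recursively overlay another dictionary (e.g. a plugin's settings)."""
        _deep_update(self._values, overrides)


def _deep_update(target, overrides):
    # Nested dictionaries are merged key by key; other values are replaced.
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _deep_update(target[key], value)
        else:
            target[key] = deepcopy(value)


# Core defaults, then overrides that could come from a user or plugin file.
config = DictConfig({"core": {"timeout": 10, "output_path": "./out"}})
config.merge({"core": {"timeout": 30}, "plugins": {"example": {"limit": 5}}})
```

With this kind of design, the core and every plugin can share one object while each reads only its own keys, such as config.get("plugins.example.limit").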

Parallelism with threads and asynchronous functions

We parallelized the tasks that the script launches. When the script runs, it performs tens or even hundreds of requests on the internet, which can take a long time.

To reduce this execution time as much as possible, we decided to run each tool in its own thread so that they all execute in parallel. This way, there is no need to wait for one tool to finish before getting the result of another.

To do so, we introduced the Task class. It is a wrapper around Thread and therefore inherits from it. Using this class standardizes how threads are used throughout the program. The Tool and Api classes inherit from Task.
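
The idea of a thin layer over Thread can be sketched as follows. This is only an illustration under our own assumptions: SimpleTask, its execute method, and the succeeded and result attributes are hypothetical names, not the real Task class from OPSE, and SleepTask merely simulates a slow network request.

```python
import threading
import time


class SimpleTask(threading.Thread):
    """Hypothetical thin wrapper around Thread that records the outcome."""

    def __init__(self, name):
        super().__init__(name=name, daemon=True)
        self.succeeded = False
        self.result = None

    def execute(self):
        """Subclasses override this with the actual work."""
        raise NotImplementedError

    def run(self):
        # Called in the new thread by start(); errors are stored, not raised.
        try:
            self.result = self.execute()
            self.succeeded = True
        except Exception as error:
            self.result = error


class SleepTask(SimpleTask):
    """Stand-in for a tool that waits on a slow network request."""

    def __init__(self, name, delay):
        super().__init__(name)
        self.delay = delay

    def execute(self):
        time.sleep(self.delay)
        return f"{self.name} done"


# Start every task first, then wait for all of them to finish.
tasks = [SleepTask(f"tool-{i}", 0.1) for i in range(5)]
for task in tasks:
    task.start()
for task in tasks:
    task.join()
```

Because all the tasks are started before any of them is joined, the waits overlap instead of adding up, which is the benefit described above for I/O-bound tools.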

Automatic requirements download

Because plugins make the framework modular, and to improve the user experience, we added a piece of code that runs at each launch and downloads all the requirements of the core as well as those of the plugins.

To do this, we read the requirements.txt files of the framework and of each plugin to check whether every dependency is installed.

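# utils/__init__.py
import pkg_resources
from pkg_resources import DistributionNotFound, VersionConflict

for pkg in required_packages:
    try:
        # Raises if the requirement is not satisfied by the environment
        pkg_resources.require(pkg)
    except DistributionNotFound:
        # Not installed yet: queue it for installation
        lst_missing_packages.append(pkg)
    except VersionConflict:
        # Installed, but with an incompatible version: ignored for now
        pass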
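
The required_packages list used above has to be built from the requirements.txt files first. A possible way to do it is sketched below. The function names parse_requirements and collect_requirements are hypothetical, the file contents are passed as strings to keep the example self-contained, and only the simplest requirements.txt syntax is handled.

```python
def parse_requirements(text):
    """Return requirement strings, skipping blanks, comments and pip options."""
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


def collect_requirements(texts):
    """Merge several requirements files, keeping the first occurrence of each."""
    merged = []
    for text in texts:
        for requirement in parse_requirements(text):
            if requirement not in merged:
                merged.append(requirement)
    return merged


# Example contents: one file for the framework, one for a plugin.
framework = "requests>=2.0\n# comment\n\npyyaml  # config files\n"
plugin = "-r base.txt\nrequests>=2.0\nbeautifulsoup4==4.12.0\n"
required_packages = collect_requirements([framework, plugin])
```

Here the duplicate requests>=2.0 line from the plugin is kept only once, and the -r option line is skipped rather than followed.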

Dependency version conflicts between tools are a major issue, as we have no control over their development. For now, we work around this issue by ignoring the tool causing the problem (the second one).

We also install pip itself if it is missing. We can then download all the missing packages and start using OPSE.

# utils/__init__.py
import os
from subprocess import DEVNULL, STDOUT, call

# === Install pip
if not pip_is_present:
    import urllib.request
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    urllib.request.install_opener(opener)
    # Download the official bootstrap script and run it
    urllib.request.urlretrieve("https://bootstrap.pypa.io/get-pip.py", "get-pip.py")
    call(["python3", "get-pip.py"], stdout=DEVNULL, stderr=STDOUT)
    # Clean up now...
    os.remove("get-pip.py")

# === Install missing packages
if lst_missing_packages:
    from pip._internal.cli.main import main as pipmain
    args = ['install', '-q', '--disable-pip-version-check', '--no-python-version-warning']
    args.extend(lst_missing_packages)
    pipmain(args)

Plugins

Plugins implementation

Thanks to the core structure, it is easy to implement new tools as components. The abstract class Tool keeps all tools consistent with each other while still allowing custom methods and attributes.

Every plugin is called by the main script (the core), which launches each enabled plugin. Before launching a tool, we first check that the data it requires is present.

Once the checks are done, the plugin performs its task and returns a boolean indicating whether the operation was successful.
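
The flow described above (check the required data, run, report success as a boolean) can be sketched with an abstract base class. Everything in this example is hypothetical: BaseTool, required_fields, can_run, launch and UsernameGuessTool are names invented for the illustration and do not reflect the real Tool class or its API.

```python
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Hypothetical abstract base class shared by every plugin."""

    required_fields = ()  # profile fields a plugin needs before it can run

    def can_run(self, profile):
        """Check that every required field is present and non-empty."""
        return all(profile.get(field) for field in self.required_fields)

    def launch(self, profile):
        """Run the plugin if its inputs are available; return True on success."""
        if not self.can_run(profile):
            return False
        try:
            self.execute(profile)
        except Exception:
            return False
        return True

    @abstractmethod
    def execute(self, profile):
        """Plugin-specific work, implemented by each subclass."""


class UsernameGuessTool(BaseTool):
    """Toy plugin that derives candidate usernames from a name."""

    required_fields = ("firstname", "lastname")

    def execute(self, profile):
        first = profile["firstname"].lower()
        last = profile["lastname"].lower()
        profile["usernames"] = [f"{first}.{last}", f"{first[0]}{last}"]


tool = UsernameGuessTool()
complete = {"firstname": "Ada", "lastname": "Lovelace"}
incomplete = {"firstname": "Ada"}
results = (tool.launch(complete), tool.launch(incomplete))
```

The core only ever calls launch, so it can treat every plugin the same way, while each subclass decides which data it needs and what execute does.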

Plugins structure

Plugins have been completely separated from the core, which improves the modularity of the framework.

All plugins are considered projects in their own right, as modules. The core has been improved to manage this modularity, in particular the import of tools, which happens automatically when they are detected. It is therefore possible to add as many tools as you want.

The core is also able to dynamically install the dependencies of each plugin as explained in the Automatic requirements download section.

Automatic imports

All the plugins are imported automatically.

To do that, we use the standard-library modules pkgutil and importlib to check whether a specific directory contains packages.

# utils/utils.py
import importlib
import pkgutil

package = subclasses_package_name
if isinstance(package, str):
    package = importlib.import_module(package)

for _, sub_package, is_pkg in pkgutil.walk_packages(package.__path__):
    # [...]

In our case, we check whether the tools/ directory contains subpackages with subclasses of Tool.

# opse.py

Tool.lst_available_tools = import_subclasses(Tool, "tools")

Then, for each object in our directory, if this object is also a package, we try to import each module it contains. As we want to create instances of the classes defined in those modules, we check that each module actually contains a valid subclass of our parent class (mother_class in the code) and then store it.

# utils/utils.py

for _, module_name, _ in pkgutil.walk_packages([os.path.join(package.__path__[0], sub_package)]):
    full_name = package.__name__ + '.' + sub_package + '.' + module_name

    mod_type = importlib.import_module(full_name)
    # The class is looked up by name: <module name> + <mother class name>
    module = getattr(mod_type, module_name + mother_class.__name__, None)

    if module is None:
        continue

    # Keep only real classes that inherit from the mother class
    if isinstance(module, type) and issubclass(module, mother_class):
        lst_available_subclasses[full_name] = module

Finally, we can return the dictionary of successfully loaded modules and use it easily.

for tool in Tool.get_lst_active_tools().values():
    print(tool.get_name().lower())
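
To experiment with this discovery mechanism outside of OPSE, the steps above can be condensed into one self-contained function. This sketch is a variation, not the OPSE import_subclasses function: find_subclasses is a hypothetical name, it inspects every class of each module instead of relying on the naming convention, and the standard-library json package stands in for the tools/ directory.

```python
import importlib
import inspect
import pkgutil


def find_subclasses(base_class, package_name):
    """Import every module of a package and collect subclasses of base_class."""
    package = importlib.import_module(package_name)
    found = {}
    for _, module_name, _ in pkgutil.walk_packages(package.__path__):
        full_name = f"{package.__name__}.{module_name}"
        module = importlib.import_module(full_name)
        for class_name, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base class itself and classes imported from elsewhere.
            if obj is base_class or obj.__module__ != full_name:
                continue
            if issubclass(obj, base_class):
                found[f"{full_name}.{class_name}"] = obj
    return found


# The json package defines JSONDecodeError, a ValueError subclass,
# in its decoder module.
found = find_subclasses(ValueError, "json")
```

Scanning every class is more permissive than a lookup by name: a plugin author does not have to follow a naming rule, at the cost of inspecting each module in full.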

Feedback

Looking back on mistakes made

One of the mistakes we made during the development of the project was a lack of unit testing. While we were able to deliver the first PoC on time, we encountered a number of bugs and issues that could have been avoided if we had implemented more comprehensive testing earlier in the process.

Another mistake is that we did not dedicate enough time to project planning and did not clearly define our goals and requirements at the beginning. As a result, the development process was delayed and we lacked time to develop more plugins. This could have been avoided with better planning.

Future of this project

The second version of OPSE was pushed recently! We made several improvements, and the OPSE architecture has been reviewed:

  • Review of imports;
  • Management of requirements:
    • Add a check on each requirement of OPSE, including those of loaded plugins;
    • Show the user any missing requirements (if any) and prompt them to install the missing ones.
  • Profile enrichment function has been reviewed and now works;
  • Data visibility feature has been removed.

After the release of OPSE v2.0.0, we will focus on plugin development. For the moment, we have not developed many plugins. But now that the core is fully functional and modular, it is “easy” for us to integrate new ones.

We are also open to any contributions. We created a Discord server for people who want to contribute or give feedback on the tool. You can also submit your own plugins there.

You can join the OPSE Discord server by clicking on the logo!

📖 In the next article…

In this article, we described how OPSE was designed. This was the technical part of the series, and you now know how OPSE works.

In the next article, we’ll explain how developers can create their own plugins to add to the OPSE core.

see: 🛠️ OPSE Part 3: Create your own plugin