Enterprise Intelligence: A Decision-Making Architecture with OSINT and Data Fusion
The practical architecture for turning open-source intelligence (OSINT) data into a decision-support layer through an enterprise data-fusion platform. An engineering note from eCloud Tech.
When an organisation decides on something, more than half of the information it needs is already inside its own systems. The rest sits in open external sources: public records, social media, sector news, dark web leaks, third-party vendor data. The problem is not access to information but the ability to combine the two sets to answer the same question. Over the last three years our cyber-intelligence team has observed that enterprise decisions are usually delayed not by "lack of data" but by "data fragmentation".
This article explains how OSINT and data fusion work together to change an organisation's decision-making layer. Technical architecture, legal limits and operational practice in one piece. We share seven practices our engineering team has learned while deploying Palantir-style platforms.
1. The boundary between the two disciplines and where they meet
OSINT (Open Source Intelligence) describes a type of data source — structured information collected lawfully from publicly accessible digital traces. The content includes public records (commercial registers, court decisions, tax data), open social-media posts, news sites, dark web forums (within legal limits), company websites, academic articles and external data providers.
Data fusion, by contrast, is the discipline of merging data from different sources into a single connection graph. The sources may be OSINT, your corporate CRM, your SOC logs, or external threat-intelligence feeds. Data fusion's value lies not in collecting new data but in modelling existing data so it can answer new questions.
The two disciplines meet here: you integrate the raw OSINT data you've collected with your corporate data through a data-fusion platform. OSINT alone stays as a PDF report; data fusion alone is trapped inside and cannot see out. Combined, questions like "This new prospect was associated with which lawsuit in the past six months, who on the board followed which competitor on Twitter, is the BIST share movement consistent with our latest reports?" are answered in a single click.
2. Data-source mapping — the first week
The most critical task at the start of every data-fusion project is producing a source map. This is not a "we use these systems" list; it is a detailed inventory of which entities each system holds, which fields are matchable, and how often it updates.
A typical enterprise map includes:
- Corporate systems: CRM (customer, contact, opportunity), ERP (order, invoice, stock), HR system (employee, position), helpdesk (ticket, resolution).
- Operational data: SOC logs, network telemetry, application logs, audit trail.
- OSINT feeds: commercial-register APIs, BIST / financial data providers, dark-web monitoring services, social-media monitoring (within legal limits), KVKK breach-notification page.
- Document archive: contracts, legal correspondence, audit reports, customer call notes.
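One way to keep such a map machine-readable is a small inventory structure. The sketch below is illustrative only: the `SourceEntry` type, the field names and the sample rows are assumptions, not the format used in the projects described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    system: str                     # e.g. "CRM"
    entities: tuple[str, ...]       # entity types the system holds
    match_fields: tuple[str, ...]   # fields that can be used for matching
    refresh: str                    # how often the data updates


# Hypothetical inventory rows; real values come out of the mapping work.
SOURCE_MAP = [
    SourceEntry("CRM", ("customer", "contact", "opportunity"), ("tax_no", "email", "name"), "real-time"),
    SourceEntry("ERP", ("order", "invoice", "stock"), ("tax_no", "name"), "hourly"),
    SourceEntry("Helpdesk", ("ticket", "resolution"), ("email", "name"), "real-time"),
    SourceEntry("Commercial register API", ("company",), ("tax_no", "name"), "daily"),
]


def shared_match_fields(a: SourceEntry, b: SourceEntry) -> set[str]:
    """Fields on which records from the two systems could be joined."""
    return set(a.match_fields) & set(b.match_fields)


def systems_matchable_on(field: str) -> list[str]:
    """Names of all systems that expose the given match field."""
    return [s.system for s in SOURCE_MAP if field in s.match_fields]
```

Asking `systems_matchable_on("tax_no")` early shows which systems can be joined deterministically and which will need fuzzier matching.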
The most common surprise during mapping is that the same entity is held with different IDs across different systems. The same customer appears as "Acme A.Ş." in CRM, "ACME ANONIM SIRKETI" in ERP and "acme" in the helpdesk. Resolving this is a separate engineering problem called entity resolution.
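A first step toward resolving the three spellings above is name normalisation. The following is a minimal sketch: the list of legal-form tokens is an assumption and would need extending per jurisdiction, and dropping a token such as "as" can remove a legitimate word from some names.

```python
import re
import unicodedata

# Assumed legal-form tokens (after normalisation); extend as needed.
LEGAL_FORM_TOKENS = {"as", "anonim", "sirketi", "ltd", "sti", "limited"}


def normalise_company_name(raw: str) -> str:
    # Map the Turkish dotted/dotless i to ASCII before case folding.
    text = raw.replace("İ", "I").replace("ı", "i")
    # Decompose accented letters and drop the combining marks (Ş -> S).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lower-case and join dotted abbreviations ("a.s." -> "as").
    text = text.casefold().replace(".", "")
    # Any remaining punctuation becomes a token separator.
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORM_TOKENS)
```

With these assumptions, "Acme A.Ş.", "ACME ANONIM SIRKETI" and "acme" all reduce to the same key, which can then feed the matching techniques in the next section.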
A practical example from a recent project: in the map we prepared for a financial institution, nine different systems were listed; after two weeks of deep scanning we discovered there were fourteen. The five missing systems were small departmental tools (Excel macro templates, shared SharePoint folders, an old Access database, a vendor portal, an email filter rule). They were missing from the initial list because corporate IT treated them as "shadow IT". Practical rule: mapping always starts incomplete and is finished by observing teams' daily workflows. Projects that rely only on the IT inventory from the CIO office hit a "data source we didn't know about" surprise in production in 40-60% of cases.
3. Entity resolution — the spine of the graph model
Entity resolution is the process of merging different records belonging to the same entity across different systems. Three techniques are used together:
Deterministic matching: pairing on unique identifiers such as tax number, MERSIS number, national ID number. Most reliable; 99%+ accuracy. Limit: these identifiers may not be held in every system.
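Deterministic matching amounts to grouping records by a shared identifier. A minimal sketch, assuming records are plain dictionaries and the identifier lives in a hypothetical `tax_no` field:

```python
from collections import defaultdict


def deterministic_match(records, key="tax_no"):
    """Group records from any system by a shared unique identifier.

    Records without the identifier are returned separately so they can
    be passed on to probabilistic matching.
    """
    groups = defaultdict(list)
    unmatched = []
    for rec in records:
        value = (rec.get(key) or "").strip()
        if value:
            groups[value].append(rec)
        else:
            unmatched.append(rec)
    return dict(groups), unmatched
```

The `unmatched` list makes the stated limit visible: whatever lacks the identifier has to be resolved by the other two techniques.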
Probabilistic matching: name similarity (Levenshtein distance, phonetic algorithms), phone/email normalisation, address parsing. 85-95% accuracy; matches above an upper threshold are merged automatically, borderline scores go to a human analyst, and low scores are rejected.
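For reference, Levenshtein distance and a similarity score derived from it can be sketched in a few lines. Normalising by the longer string's length is one common convention and an assumption here; production systems typically use an optimised library rather than pure Python.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Scores like these are more useful when computed on normalised names than on raw strings.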
Contextual matching: two people who attended the same meeting, two accounts transacting from the same IP, two companies that signed the same document. Derived from neighbourhood in the graph; automated by AIGENCY V4's AI agents.
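The neighbourhood idea can be illustrated without a graph database: two entities become match candidates when they share enough contexts (an IP, a meeting, a document). This is a plain-Python sketch of that idea, not the agent-based implementation; the `(entity_id, context_id)` edge format is an assumption.

```python
from collections import defaultdict
from itertools import combinations


def contextual_candidates(edges, min_shared=1):
    """Return entity pairs that share at least `min_shared` contexts.

    `edges` is an iterable of (entity_id, context_id) tuples.
    The result maps each (entity_a, entity_b) pair to its shared contexts.
    """
    by_context = defaultdict(set)
    for entity, context in edges:
        by_context[context].add(entity)

    shared = defaultdict(set)
    for context, entities in by_context.items():
        for a, b in combinations(sorted(entities), 2):
            shared[(a, b)].add(context)

    return {pair: ctxs for pair, ctxs in shared.items() if len(ctxs) >= min_shared}
```

Raising `min_shared` trades recall for precision: a single shared IP is weak evidence, while a shared IP plus a shared document is much stronger.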
The unified entity graph that emerges from combining all three is the foundation for every later query. We dedicate the first two weeks of every project to building this graph correctly; if it is wrong, every subsequent analysis works on flawed data.
4. KVKK compliance — architectural decision, not a bolt-on
OSINT and data-fusion projects cannot add KVKK compliance as a final layer. Three layers must be designed in together from day one:
Legal-basis layer: Which data category (personal, sensitive, anonymous) is being processed under which legal basis? Every data flow is documented against KVKK Article 5 (conditions for processing, including explicit consent), Article 6 (additional conditions for special categories) and Article 9 (cross-border transfer). The "we can process it because it's public" view is wrong: KVKK contains no blanket exception for publicly available data. The condition for data made public by the data subject is read narrowly, and in other cases a legitimate-interest balance must be performed.
Anonymisation layer: Where full identity is not necessary for decision-making, entity IDs are hashed; raw data remains visible only to authorised analysts, and dashboards show aggregated / anonymised data. AIGENCY V4's encrypted memory layer is used to keep processing inside Türkiye.
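Hashing entity IDs can be sketched with a keyed hash from the Python standard library. The function names and the row format are assumptions for illustration. Note that a keyed hash is usually regarded as pseudonymisation rather than full anonymisation, because whoever holds the key can re-link the IDs.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymise(entity_id: str, secret_key: bytes) -> str:
    """Stable keyed hash of an entity ID (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, entity_id.encode("utf-8"), hashlib.sha256).hexdigest()


def dashboard_counts(rows):
    """Aggregate view: number of entities per sector, with no identifiers."""
    return dict(Counter(row["sector"] for row in rows))
```

Using HMAC instead of a bare SHA-256 means an outsider cannot confirm a guessed ID by hashing it, as long as the key stays secret.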
Audit-trail layer: Every query produces a log entry showing which role accessed which entity for what reason. The log is kept append-only and unchangeable. A dump must be producible within 24 hours upon a KVKK audit request.
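One common way to make an append-only log tamper-evident is to chain entries by hash. The in-memory sketch below shows the idea under that assumption; the `AuditLog` class is hypothetical, and a real deployment would also need write-once storage and access controls.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditLog:
    """In-memory, hash-chained audit log (illustrative only)."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()

    def append(self, role, entity_id, reason):
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "entity_id": entity_id,
            "reason": reason,
            "prev_hash": self._entries[-1]["hash"] if self._entries else self.GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return entry

    def verify(self):
        """True if no entry was altered, removed or reordered."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True

    def dump(self):
        """JSON export, e.g. for an audit request."""
        return json.dumps(self._entries, indent=2)
```

Because each entry embeds the hash of its predecessor, editing any past entry breaks `verify()` for the rest of the chain.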
Designing these three layers together holds the compliance cost to 5-10% of total project effort. Adding them later means 30-50% additional development time for the same compliance posture; our enterprise AI-platform deployment service includes this architecture as standard.
5. Authorisation — who sees which node
The most frequently neglected but most critical feature of a data-fusion platform is role-based authorisation. A single database + single dashboard works for small teams; in medium-to-large organisations it creates a core security problem.
In a permissioned graph architecture, every node (entity) and every edge (relationship) carries its own authorisation metadata. Example scenarios:
- Marketing analyst: sees customer name + sector + sales history; financial-soundness score is hidden.
- Legal team: sees full contract text + risk flags; CRM communication notes are hidden.
- Executive leadership: sees synthesised reports + trend analyses; NO access to raw data.
- SOC team: sees all system logs + threat intelligence; NO access to customer personal data.
Which role sees which area is decided with the business owner, not unilaterally by engineering. Our practice: at project kickoff a "role matrix" is produced; for each column (role) × row (entity type), an access level (none / aggregate / detail) is set. No code is written before this matrix exists.
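A role matrix with the three access levels translates directly into a lookup table. The role and entity-type names below are illustrative assumptions loosely based on the example scenarios, and unknown combinations default to no access.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    AGGREGATE = 1
    DETAIL = 2


# (role, entity type) -> access level; hypothetical entries.
ROLE_MATRIX = {
    ("marketing_analyst", "customer_profile"): Access.DETAIL,
    ("marketing_analyst", "financial_score"): Access.NONE,
    ("legal", "contract"): Access.DETAIL,
    ("legal", "crm_note"): Access.NONE,
    ("executive", "customer_profile"): Access.AGGREGATE,
    ("soc", "system_log"): Access.DETAIL,
    ("soc", "customer_profile"): Access.NONE,
}


def access_level(role: str, entity_type: str) -> Access:
    # Default deny: anything not listed in the matrix is invisible.
    return ROLE_MATRIX.get((role, entity_type), Access.NONE)


def can_read(role: str, entity_type: str, needed: Access = Access.DETAIL) -> bool:
    return access_level(role, entity_type) >= needed
```

Keeping the matrix as data rather than as scattered `if` statements lets the business owner review it directly.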
6. AIGENCY V4 integration — natural-language queries
The real value of a data-fusion graph depends on enabling non-analyst users to ask questions and receive answers. In the classical approach every query needs a data analyst who knows SQL/Cypher; this is both a bottleneck and a high operational cost.
AIGENCY V4's 8-agent architecture removes that bottleneck. The user asks in natural language: "Which of our cases involved Acme A.Ş. in the past six months, and which board members are connected to the people named in those cases?"
What the system does:
- Coordinator agent decomposes the question into sub-tasks.
- Researcher agent writes Cypher queries against the graph database (on Neo4j) or calls the GraphQL API.
- Reviewer agent verifies the results pass the authorisation layer; if not, returns a partial answer.
- Coder agent transforms results into the user's preferred format (table, chart, synthesis).
- Synthesis agent writes the answer in natural language, with source-node references (evidence chain).
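The flow of the steps above can be mimicked with plain functions to show where the authorisation check and the evidence chain sit. Every name in this sketch is hypothetical; it does not reflect AIGENCY V4's actual interfaces, and the decomposition and formatting steps are reduced to trivial stand-ins.

```python
def review(rows, allowed_types):
    """Keep only rows the caller may see; report whether anything was withheld."""
    visible = [row for row in rows if row["type"] in allowed_types]
    return visible, len(visible) < len(rows)


def synthesise(question, rows, partial):
    """Build an answer string that cites its source nodes."""
    refs = ", ".join(row["node_id"] for row in rows)
    note = " (partial answer: some results were withheld)" if partial else ""
    return f"{question} -> {len(rows)} result(s){note}; sources: [{refs}]"


def answer(question, run_query, allowed_types):
    sub_tasks = [question]                                          # coordinator (stand-in)
    rows = [row for task in sub_tasks for row in run_query(task)]   # researcher
    visible, partial = review(rows, allowed_types)                  # reviewer
    return synthesise(question, visible, partial)                   # synthesis


def demo_query(task):
    """Stand-in for a graph query; returns fixed sample rows."""
    return [
        {"node_id": "case:17", "type": "case"},
        {"node_id": "person:4", "type": "person"},
    ]
```

Calling `answer("Cases involving Acme", demo_query, {"case"})` returns only the case node and marks the answer as partial, which is the behaviour described for the reviewer step.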
This architecture ships with our enterprise AI-platform deployment service. A trained analyst answers 3-5 queries per hour; an AIGENCY V4-supported system answers 30-50 — the analyst is consulted only on uncertainty cases.
Natural-language querying has two additional benefits. First, formulating a question does not require the user to learn the data model; an analyst may say "customer segment" while marketing says "customer cohort", and the architecture resolves both to the same entity. Second, user queries are captured over time as usage patterns; materialised views can be produced for the 50 most common queries, so the system learns to be performant where it is actually used. This is the fundamental difference from the static dashboard approach of classical BI tools.
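Both benefits can be illustrated with standard-library tools: a synonym table that maps user vocabulary to one canonical entity, and a frequency count over the query log to pick candidates for materialised views. The table contents and function names are assumptions for the sketch.

```python
from collections import Counter

# Hypothetical vocabulary mapping: user term -> canonical entity type.
SYNONYMS = {
    "customer segment": "CustomerSegment",
    "customer cohort": "CustomerSegment",
}


def canonical_entity(term: str):
    """Resolve a user term to its canonical entity, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def top_queries(query_log, n=50):
    """The n most frequent queries, most frequent first."""
    return [query for query, _count in Counter(query_log).most_common(n)]
```

In practice the log would hold normalised query shapes rather than raw text, so that differently worded versions of the same question count together.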
The limit of natural-language querying is this: on ambiguous or contradictory questions, the model applies its own interpretation. That is why the reviewer agent is critical — the parse tree of the query is shown to the user with a "did you mean to ask this?" confirmation. Systems that skip this step risk presenting a wrong answer as correct; in AIGENCY V4 this step is mandatory.
7. Typical setup mistakes and what we learned
The recurring mistakes from the 12 data-fusion projects we have delivered over the past three years (and the process changes we adopted in response):
Mistake 1: Deciding the data model too late. On our first two projects we refined the graph schema after the ETL; result: we wrote pipelines twice. Fix: The graph schema must be frozen at the end of Phase 1 (1-2 weeks); all subsequent pipelines write to that schema.
Mistake 2: Using a single threshold for entity resolution. A single 0.85 similarity threshold either over-merges (false positives) or under-merges (false negatives). Fix: We use two thresholds — above 0.95 auto-merge, 0.75-0.95 falls to a human analyst, below is rejected.
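The two-threshold rule can be written as a small triage function. The 0.95 and 0.75 values come from the fix above; how the exact boundary values are treated (0.95 goes to review, 0.75 goes to review) is an assumption of this sketch.

```python
def triage(score: float, auto_merge: float = 0.95, review: float = 0.75) -> str:
    """Route a similarity score to one of three outcomes."""
    if score > auto_merge:
        return "auto_merge"
    if score >= review:
        return "human_review"
    return "reject"
```

Keeping the thresholds as parameters makes it easy to tune them per entity type once analyst decisions have been collected.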
Mistake 3: Leaving KVKK audit questions to the end. On one project we noticed the audit-log dump requirement in the KVKK Data Controller guideline only at the end of the project, which cost three weeks of refactoring. Fix: Audit logging is now built in Sprint 0; it is the first implementation, not the last.
Mistake 4: Offering too many user options. On an early project we gave the analyst 20+ filters + 15+ visualisations; no one used them all, and documentation ballooned. Fix: Three to five "golden scenarios" are defined first; all UI is optimised for those.
Mistake 5: Not writing a backup + disaster-recovery plan. We realised how critical the growing graph database had become only after a disk failure two years in. Fix: Every project now includes daily snapshots + cross-geography replication (Şanlıurfa + Düsseldorf) + a 4-hour RPO/RTO guarantee.
Mistake 6: Not load-testing at real data scale. A query that worked perfectly on 10,000 nodes in the pilot environment took 90 seconds in production with 5 million nodes. Fix: Even at the pilot stage, stress tests with a synthetic data generator at production scale are mandatory. We don't go live until the p95 query time is below 2 seconds.
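The go-live gate on p95 latency is simple to compute from stress-test measurements. This sketch uses the nearest-rank definition of the percentile, which is one of several conventions and an assumption here; the function names are illustrative.

```python
def p95(latencies):
    """95th percentile by the nearest-rank method."""
    if not latencies:
        raise ValueError("no latency samples")
    ordered = sorted(latencies)
    # ceil(0.95 * n) computed in integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def ready_for_go_live(latencies, limit_seconds=2.0):
    """True when the p95 query time is below the limit."""
    return p95(latencies) < limit_seconds
```

The gate is only meaningful if the samples come from runs at production-scale data volume, which is the point of the fix above.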
Mistake 7: Leaving user training to the last week. Technical excellence alone does not drive adoption. On a previous project the system was delivered, usage stayed at 15% six weeks later, and users went back to their old Excel reports. Fix: From the pilot phase onwards, end-users are involved in weekly workshops; training begins on day one, not on handover day. This practice has lifted adoption to 85%+.
The fixes for these seven mistakes are now part of our standard process and apply to new projects from the start. In the final week of a project, one engineer runs a "what could go wrong" checklist against these seven headings as a last defensive layer before delivery.
Decision matrix: when fusion is right for your organisation and when it isn't
Data fusion is not the right solution for every organisation. Four questions give you a fast read:
| Question | Yes → fusion makes sense | No → simpler solution |
|---|---|---|
| Is information about the same entity held in 3+ different systems? | ✓ | A single-system dashboard suffices |
| Is decision time currently measured in "days" (spent searching)? | ✓ | Without urgency, fusion ROI is low |
| Do KVKK / regulatory audit requests arrive frequently? | ✓ | Simple log reporting may suffice |
| Do your data sources sit with 5+ different owner teams? | ✓ | Graph is not mandatory for single-owner data |
Three or more "Yes" answers mean data fusion makes sense. With two or fewer, evaluate simpler data-warehouse / BI options first; you can move to fusion as scale grows.
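The scoring rule is a count of "Yes" answers against a cut-off of three. A minimal sketch, with hypothetical question keys standing in for the four rows of the table:

```python
def fusion_recommended(answers: dict) -> bool:
    """True when at least three of the matrix questions are answered 'Yes'."""
    yes_count = sum(1 for value in answers.values() if value)
    return yes_count >= 3


# Hypothetical self-assessment for one organisation.
example_answers = {
    "same_entity_in_3_plus_systems": True,
    "decision_time_measured_in_days": True,
    "frequent_regulatory_audit_requests": False,
    "data_owned_by_5_plus_teams": True,
}
```

With the sample answers above the function returns `True`; flipping any one of the three "Yes" values tips the result toward the simpler data-warehouse / BI route.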
Our open-source intelligence service and data-fusion architecture service are available as two separate packages or combined. Our typical enterprise customers choose both — because OSINT's value multiplies when combined with corporate data.
Our pilot-project approach
For an organisation just starting, we recommend beginning with a 3-week pilot. The pilot scope is:
- A single decision scenario is selected (e.g. "Risk evaluation of a new prospect must drop from 1 hour to 5 minutes").
- Two OSINT sources + two internal systems are integrated.
- A mini graph (5-10 entity types) + one analyst interface is delivered.
- A demo + ROI evaluation happens at the end of three weeks.
Two paths follow: either expansion (additional sources, scenarios, enterprise scale) or pivoting to a simpler BI solution. Both directions are chosen on evidence; our consulting process makes that call with you, not for you.
If you want to evaluate your own OSINT + data-fusion need, you can request a preliminary call via our contact page. In a free 60-minute call we walk through how the seven items apply to your specific scenario and propose a pilot scope.
Feel free to share this article with peers in your sector considering a similar build; concrete practical content in this domain remains scarce and valuable. Our team will announce new posts on our blog — next up: "entity resolution with permissioned blockchain" and "multi-agent analyst workflows with AIGENCY V4". If these topics matter to you, mentioning them on your enquiry lets us share the relevant technical material before the call.