AI-powered research has seven structural weaknesses: choosing information sources, constructing search queries, selecting search engines, fetching web pages, geo-blocking, reading PDFs, and managing tokens. Here’s what I’ve noticed from conducting overseas company investigations.

I once asked ChatGPT to “list 20 food manufacturers in Thailand.” What I got back was a list of well-known companies with English-language websites.

“I could have gotten the same results from a Google search myself,” I thought.

Overseas company research is my job. Using AI as a tool in that work, I noticed that it hits the same walls every time, in a predictable pattern. Let me walk through the seven barriers I’ve encountered.


There Are Barriers at All Seven Stages

When you hand research to AI, problems surface at all seven stages – from choosing information sources to managing tokens.

# | Research Stage | Where AI Stumbles
1 | Where to look (source selection) | Only does web searches. Can’t predict “this data lives in that database”
2 | Search query construction | Doesn’t use AND/OR or phrase search. Doesn’t search in local languages
3 | Search engine selection | Doesn’t use Google. Disadvantaged for Asian languages
4 | Web page retrieval | Can’t read JavaScript-rendered pages. Can’t operate database interfaces
5 | IP-based geo-blocking | Servers are in the US and can’t access locally restricted sites
6 | Document reading | Reads every page from top to bottom. Can’t skim
7 | Token management | Poor allocation – stops right when it matters

Let me go through each one.


Barrier 1 – AI Doesn’t Know Where the Data Lives

AI starts with a web search no matter what you ask, but much of the data needed for overseas company research can’t be found through web search.

This was the first barrier I noticed.

When asked to “list food manufacturers in Thailand,” I used to start with Google myself. Then one day I discovered that the Thai Department of Industrial Works (DIW) maintains a factory registration database. Search by industry classification code, and you get the complete list of registered factories. The difference was eye-opening.

Aspect | What AI Does | What I Usually Do
First action | Web search for “Thailand food manufacturer” | Predict “Thai food factory data should be in the DIW database”
Information source | Google search results | DIW factory database, searched by industry classification code
What you get | 10-20 well-known companies with English websites | Complete list of registered factories matching the code
What you miss | Every company without an English website | Registration-based, so coverage is comprehensive

The same pattern shows up in other investigations. “If a product requires certification, the certification database has a manufacturer list.” “US customs data is searchable on ImportYeti by HS code.” This ability to predict where data lives came from conducting hundreds of investigations.
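
The prediction habit described above can be written down as a lookup from the kind of question to the place the data is likely registered. The following is a minimal Python sketch. The category labels and the `SOURCE_HINTS` table are illustrative assumptions built only from the examples in this article, not a complete catalog.

```python
# Hypothetical lookup: kind of question -> where the data is likely registered.
# Entries come from the examples in this article; the keys are made-up labels.
SOURCE_HINTS = {
    "thai_factories": "DIW factory registration database (search by industry classification code)",
    "certified_products": "Certification database (manufacturer list per certified product)",
    "us_imports": "ImportYeti (US customs data, search by HS code)",
}


def first_place_to_look(question_type: str) -> str:
    """Return the registered source for a known question type, else fall back to web search."""
    return SOURCE_HINTS.get(question_type, "General web search")


print(first_place_to_look("thai_factories"))
print(first_place_to_look("unknown_topic"))
```

The fallback to general web search is the point of the sketch: it is where a researcher ends up only when no registered source comes to mind, whereas AI tends to start there.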

Here’s what happened when I actually asked ChatGPT and Perplexity.

ChatGPT's response. Results are dominated by well-known companies found through web search.

Perplexity's response. Source links reveal all information comes from English-language websites. Not a single Thai government database is referenced.

Meanwhile, accessing the DIW database directly with an industry classification code returns data like this.

(Sample entry – Ban Bung Dairy Cooperative, 24/15 Non Chak, Ban Bung District, Chonburi Province. Manufacture of pasteurized butter, processed milk, and yogurt. Machinery: 176.50 HP, Capital: 11.5 million baht, 10 employees.)

Thai Department of Industrial Works (DIW) search results. Shows factory registration numbers, addresses, industry codes, equipment capacity, and employee counts.


Barrier 2 – Search Queries Are Sloppy

AI tends to generate simple English keyword strings as search queries. It doesn’t use AND/OR combinations or phrase search, and it doesn’t search in local languages.

When you ask AI to “research food manufacturers in Thailand,” it starts searching with simple queries like “Thailand food manufacturer” or “Thai food company list.” I would take a more deliberate approach.

Aspect | AI’s Search Queries | Queries I Build
Language | English only | Search in Thai (“โรงงานอาหาร” = food factory)
Structure | String of keywords | Combine conditions with AND/OR
Filtering | None | Filter by province, industrial estate, industry code
Phrase search | Not used | Wrap in “” for exact match

For example, searching in Thai for “โรงงานอาหาร” surfaces a large number of local small and medium manufacturers that don’t appear in English-language results. Add a province name – “โรงงานอาหาร ชลบุรี” – and you narrow it to food factories in Chonburi Province.

In my experience, the quality of your search query can change the number of companies you find by orders of magnitude. AI doesn’t seem to have learned this craft well, and defaults to generic English keywords.
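
To make the query-building habits in this section concrete, here is a small Python sketch that combines phrase search, OR groups, and local-language terms. The helper names (`phrase`, `any_of`, `build_query`) are my own, and the sketch assumes a search engine that supports quoted phrases and OR, and treats a space between groups as an implicit AND, as Google does.

```python
def phrase(term: str) -> str:
    """Wrap a term in double quotes for exact-match (phrase) search."""
    return f'"{term}"'


def any_of(terms: list[str]) -> str:
    """Join alternative terms with OR inside parentheses."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"


def build_query(subject_terms: list[str], locations: list[str]) -> str:
    """Combine a subject group and an optional location group.

    Assumption: the engine treats the space between groups as AND.
    """
    groups = [any_of(subject_terms)]
    if locations:
        groups.append(any_of(locations))
    return " ".join(groups)


# Thai and English terms side by side, narrowed to Chonburi Province.
query = build_query(["โรงงานอาหาร", "food factory"], ["ชลบุรี", "Chonburi"])
print(query)
```

Keeping the local-language term and its English equivalent in the same OR group means one query covers both the Thai-only sites and the English ones.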


Barrier 3 – The Search Engine Isn’t Google

Most AI tools use a search engine other than Google internally. For Japanese and other Asian languages, Google performs significantly better, putting AI-powered search at a disadvantage.

Even if you’ve constructed a local-language query as described in Barrier 2, the engine processing it may be the bottleneck. For English, most search engines deliver comparable results. But for Japanese, Thai, and Vietnamese, the differences become significant.

Consider searching in Thai for “โรงงานอาหาร ชลบุรี” (food factories in Chonburi Province).

Search Language | Google Search | AI’s Built-in Engine
English | Comparable results | Comparable results
Japanese | Government and industry sites rank high | Accuracy varies
Thai | DIW database and local companies rank high | English sites tend to mix in

ChatGPT, Perplexity, and Copilot use non-Google search engines (Gemini being the exception). Users have no visibility into which engine their AI tool uses. When Asian companies “somehow don’t show up,” the search engine itself is sometimes the reason.


Barrier 4 – AI Can’t Operate Databases

AI can read static web pages, but it can’t fill in forms or fetch JavaScript-rendered pages. Most information sources used in overseas company research hit this barrier.

Here are examples of sources that AI can’t read even when given the URL.

  • Thai Ministry of Commerce (DBD) – Enter company name in Thai → download financial data

  • UL Product iQ – Enter category code to find certified companies

  • National factory registration databases – Specify industry code and region to list factories

All require entering conditions into a browser form to retrieve data.

Thai Department of Industrial Works (DIW) factory search form. Requires entering industry codes and province names in Thai.

The fact that data “exists” on the web and the fact that AI can “access” it are two different things.
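
One way to see the gap between “exists” and “access” is to look at what a plain page fetch actually returns. The Python sketch below uses only the standard library to check whether a piece of HTML is a search form with no data in it yet. The two HTML snippets are made-up stand-ins, not markup from DIW or any real site, and the rule “inputs present, no table rows” is a deliberately rough heuristic.

```python
from html.parser import HTMLParser


class FormDetector(HTMLParser):
    """Counts form controls and table rows in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.input_count = 0
        self.row_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            self.input_count += 1
        elif tag == "tr":
            self.row_count += 1


def looks_form_gated(html: str) -> bool:
    """True when the page offers form controls but no table rows of data."""
    detector = FormDetector()
    detector.feed(html)
    return detector.input_count > 0 and detector.row_count == 0


# Made-up HTML standing in for a search page before any query is submitted.
search_page = '<form><select name="province"></select><input name="code"></form>'
# Made-up HTML standing in for a results page after the form is submitted.
results_page = "<table><tr><td>Factory A</td></tr></table>"

print(looks_form_gated(search_page))   # True
print(looks_form_gated(results_page))  # False
```

A tool that only fetches the URL sees the first page. The second page appears only after someone fills in the conditions and submits them.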


Barrier 5 – Some Sites Can’t Be Opened from US Servers

ChatGPT, Claude, Perplexity, and other AI tools most likely access the web from servers in the United States. Sites that are only available within the local country are simply unreachable.

Some Thai and Indonesian government websites apply IP-based geo-blocking, allowing access only from domestic IP addresses.

  • Thai Department of Industrial Works (DIW) – Some search functions are restricted to Thai IPs

  • Indonesian company registration databases – Some pages are domestic-access only

  • Chinese corporate credit information sites – CAPTCHAs and connection limits are imposed on overseas access

I live in Thailand, so Thai government sites work fine for me. But as long as AI servers are located in the US, data from these sites is out of reach. AI tools don’t have VPN capabilities to work around these restrictions.
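
For readers unfamiliar with how IP-based geo-blocking works on the server side, the idea can be sketched in a few lines of Python with the standard `ipaddress` module. This is an illustration of the general technique, not how any of the sites above actually implement it. The “domestic” ranges are reserved documentation networks used purely as placeholders.

```python
import ipaddress

# Hypothetical "domestic" ranges. These are reserved documentation networks
# (RFC 5737), used here only as stand-ins for a country's real IP allocations.
DOMESTIC_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True when the client's address falls inside a domestic range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in DOMESTIC_NETWORKS)


print(is_allowed("203.0.113.42"))  # True: treated as a domestic visitor
print(is_allowed("192.0.2.10"))    # False: treated as an overseas server
```

The check depends only on where the request comes from, so a better prompt or a smarter model makes no difference to the outcome.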


Barrier 6 – AI Can’t Skim a PDF

When you give AI a PDF, it starts reading from page 1. It can’t scan the table of contents and jump to the relevant chapter.

The PDFs I encounter in overseas research commonly exceed 100 pages. Compare how I read versus how AI reads.

  • How I read – Check the table of contents → open only the relevant chapters → pull out just the tables and charts

  • How AI reads – Process every page from the beginning → hit the token limit around page 20-30

The data tables in the latter half are never reached. Reading every page is something even I almost never do.
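
The way I skim can be sketched as a small Python function: take the table of contents, keep only the chapters whose titles match what I’m looking for, and work out the page ranges. The table of contents, page numbers, and keywords below are invented for illustration, and the sketch assumes the table of contents has already been extracted as (title, start page) pairs.

```python
def pages_to_read(toc, total_pages, keywords):
    """Return (title, first_page, last_page) for TOC entries matching any keyword.

    toc is a list of (title, start_page) tuples sorted by start_page.
    """
    selected = []
    for i, (title, start) in enumerate(toc):
        # A chapter ends one page before the next one starts.
        end = toc[i + 1][1] - 1 if i + 1 < len(toc) else total_pages
        if any(k.lower() in title.lower() for k in keywords):
            selected.append((title, start, end))
    return selected


# Made-up table of contents for a 120-page report.
toc = [
    ("Executive Summary", 1),
    ("Market Overview", 5),
    ("Regulatory Environment", 40),
    ("Producer Statistics by Province", 85),
    ("Appendix: Data Tables", 105),
]

chosen = pages_to_read(toc, total_pages=120, keywords=["statistics", "data tables"])
page_count = sum(end - start + 1 for _, start, end in chosen)
print(chosen)
print(page_count)
```

In this invented example the selection covers pages 85 to 120, which is 36 pages out of 120, and all of them sit in the latter half that a front-to-back read never reaches.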


Barrier 7 – Poor Token Allocation Stops Research Mid-Stream

AI has a hard limit on how much information it can process at once (the token limit). The problem isn’t the limit itself – it’s that AI is bad at allocating tokens.

I can glance at 50 search results and decide “only these 5 are worth reading.” AI struggles with this triage, reading results one by one in order. Around the 10th to 20th result, it hits the token limit.

Deciding where to concentrate tokens is something AI still finds difficult. Right when it’s getting close to the core information, it stops and asks “shall I continue?” You asked it to investigate, but it stopped halfway. This is one root cause of the feeling that “AI research is shallow.”
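
The triage I do by eye can be approximated as a budgeted selection: rank results by how promising they look, then read only what fits. Below is a minimal Python sketch of that idea. The relevance scores, token estimates, and result names are all made up, and the greedy rule is one simple choice among many, not a description of how any AI product allocates tokens.

```python
def triage(results, token_budget):
    """Pick the highest-scoring results that fit within the token budget.

    results is a list of dicts with 'url', 'score' (higher is better),
    and 'tokens' (estimated cost of reading the page).
    """
    chosen, used = [], 0
    for item in sorted(results, key=lambda r: r["score"], reverse=True):
        if used + item["tokens"] <= token_budget:
            chosen.append(item["url"])
            used += item["tokens"]
    return chosen, used


# Made-up search results, listed in the order a search engine returned them.
results = [
    {"url": "news-article", "score": 0.2, "tokens": 3000},
    {"url": "company-blog", "score": 0.3, "tokens": 2500},
    {"url": "government-registry", "score": 0.9, "tokens": 4000},
    {"url": "industry-association-list", "score": 0.8, "tokens": 3500},
]

print(triage(results, token_budget=8000))
```

With an 8,000-token budget, the sketch reads the registry and the association list for 7,500 tokens. Reading in listed order would spend 5,500 tokens on the news article and the blog, leaving too little room for the 4,000-token registry page.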


The Core Issue Behind All Seven Barriers Is Research Design

What AI lacks isn’t processing power – it’s the judgment to decide “where to look next.”

The “deep research” features being marketed by AI companies are evolving toward more searches and more pages read. But I think this is just spreading wider, not going deeper.

True depth means following one lead to the next, drilling down through multiple layers.

  • Discover a company name

  • → Check certification DB for model numbers

  • → Identify the OEM parent

  • → Verify factory operations in local language

This judgment of “where to dig next” is what determines research depth. I leverage AI’s processing speed and multilingual comprehension while keeping a human in charge of source selection, search engine choice, and database access. A tool is only as good as the person using it. That’s what trial and error have taught me.


What Gets Missed When You Treat AI Search Results as “Research Complete”

Asking AI to “look this up” is natural. But treating the results as a finished investigation means gaps will remain.

# | Barrier | What Happens as a Result
1 | Source selection | Every company not on the open web is missed
2 | Search query construction | Information in local languages is entirely missed
3 | Search engine | Search accuracy drops for Asian languages
4 | Page retrieval constraints | Government and certification database data can’t be extracted
5 | Geo-blocking | Locally restricted sites are inaccessible
6 | Document reading | 100-page reports can’t be read through
7 | Token allocation | Research stops right when it matters

AI has the power to gather information broadly. Add “which databases to use,” “which languages to search in,” and “how to interpret the data” – and the depth of your research changes.


About the Author

Takashi Kinoshita – CEO, Taitonmai Co., Ltd.

  • Graduate degree from a national university

  • 8 years in procurement at a major Japanese electronics manufacturer

  • Including 2 years stationed at the company’s Thailand factory as Procurement Section Manager, managing local staff as the sole Japanese manager

  • Experienced procurement operations in English, Chinese, and Thai

  • After founding Taitonmai, has conducted corporate investigations spanning 80+ countries and 10,000+ companies


https://taitonmai.co.jp/en/column/20260216_02/