Know What Claude Sees in Your File

After this, you'll be able to tell in advance whether a file will give Claude useful text or just a picture of text, and fix it before you upload.

Before you start

Complete Share a Document with Claude first, so you have real upload experience to apply this lesson's format checks to.

The idea

You uploaded a spreadsheet. Claude gave you a confident-sounding answer that was completely wrong. This is one of the most common file-sharing failures, and it almost always comes from a format problem, not a Claude problem.

There are two kinds of PDFs, and they look identical in your file browser.

A text-extractable PDF was created by exporting from Word, Google Docs, or a design tool. When you open it, the words are actual selectable text, and Claude can read them directly.

An image-only PDF was created by scanning a paper document without running text recognition. When you open it, each page is effectively a photo: Claude sees a picture of words, not words it can read. The fix is to run OCR (optical character recognition) before uploading.
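Whether a PDF has a text layer can also be checked with a script, which helps when there are many files to sort. The sketch below is one possible approach, not a description of how Claude processes uploads. It assumes the third-party pypdf library is installed; the helper names (classify_pages, check_pdf) and the 20-character threshold are illustrative choices. A page that yields almost no extractable text is treated as likely image-only.

```python
from typing import List


def classify_pages(page_texts: List[str], min_chars: int = 20) -> List[str]:
    """Label each page 'text' or 'image-only' from its extracted text.

    min_chars is an arbitrary illustrative threshold, not a standard value.
    """
    labels = []
    for text in page_texts:
        visible = "".join((text or "").split())  # drop all whitespace
        labels.append("text" if len(visible) >= min_chars else "image-only")
    return labels


def check_pdf(path: str) -> List[str]:
    """Extract text page by page with pypdf (assumed installed) and classify it."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    return classify_pages([page.extract_text() or "" for page in reader.pages])


if __name__ == "__main__":
    # Hypothetical sample: two pages with real text, one scanned page with none.
    sample = [
        "Quarterly expense report for the third quarter",
        "Total: 4,200 USD across all teams",
        "",
    ]
    print(classify_pages(sample))
```

Any page reported as image-only is a candidate for OCR before uploading. The threshold is deliberately crude: a page holding only a short caption would also be flagged, so treat the result as a prompt to look at the page, not a verdict.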

Spreadsheets have their own failure mode. A plain Excel or CSV file usually works fine. But if your spreadsheet has merged cells, color-coded categories with no text labels, or data organized in columns that depend on visual alignment, Claude loses that structure when it reads the file. What looks like a clean table to you becomes a scrambled list of numbers with no headers. The fix: before uploading a complex spreadsheet, add explicit text labels to every column and unmerge any merged cells.
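For a sheet that has already been exported to CSV, the unmerge-and-label step can be sketched in code. This is a minimal illustration under stated assumptions: the export leaves a merged header's text in its first column and blanks in the rest, the sheet has exactly two header rows, and the function names and sample data are hypothetical.

```python
import csv
import io
from typing import List


def fill_merged_headers(header: List[str]) -> List[str]:
    """Copy each non-empty header rightward over the blanks a merged cell leaves."""
    filled, current = [], ""
    for cell in header:
        if cell.strip():
            current = cell.strip()
        filled.append(current)
    return filled


def label_columns(group_row: List[str], sub_row: List[str]) -> List[str]:
    """Combine a merged group row and a sub-header row into one label per column."""
    labels = []
    for group, sub in zip(fill_merged_headers(group_row), sub_row):
        sub = sub.strip()
        labels.append(f"{group} - {sub}" if group and sub else (group or sub))
    return labels


if __name__ == "__main__":
    # Hypothetical export: 'Q3 Summary' was merged across the last three columns.
    raw = "Team,Q3 Summary,,\n,Budget,Actual,Difference\nDesign,1000,1200,-200\n"
    rows = list(csv.reader(io.StringIO(raw)))
    header = label_columns(rows[0], rows[1])
    out = io.StringIO()
    csv.writer(out).writerows([header] + rows[2:])
    print(out.getvalue())
```

One limitation to keep in mind: this fill cannot tell a merged cell from a column that was simply left without a header, so review the generated labels before uploading.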

Knowing which format to use before you upload saves the back-and-forth of getting a wrong answer and wondering why.

Try it (12 min)

Watch out for

Scanned PDFs without OCR: the most common silent failure. The file uploads fine, Claude responds confidently, but it is reading nothing. Always check that you can select text in a PDF before uploading it.
Excel files with merged cells: Claude flattens the table structure and loses the relationship between headers and data. Unmerge before uploading.
Color-coded spreadsheets where color is the only label: Claude cannot see colors. If red means 'over budget,' add the word 'over-budget' as a text label in that row.
Password-protected PDFs: Claude cannot read these at all. Remove the password protection first using your PDF viewer's security settings.
Very long documents where your question is about a small section: Claude handles long docs well, but a specific question with a page reference ('see page 14') is faster and more accurate than asking about a whole document.
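The color-label fix above can be applied in bulk once each row's color has been written down as data. The sketch below assumes that has already happened (a hypothetical fill_color field noted by hand or by another tool) and only shows the last step: turning the color into a text label. The legend, field names, and function name are illustrative.

```python
from typing import Dict, List

# Hypothetical legend: what each fill color means in this particular sheet.
COLOR_MEANINGS: Dict[str, str] = {"red": "over-budget", "green": "under-budget"}


def add_status_labels(rows: List[dict], legend: Dict[str, str] = COLOR_MEANINGS) -> List[dict]:
    """Return copies of rows with a text 'status' field derived from 'fill_color'.

    Rows whose color is missing from the legend get 'unlabeled' so gaps stay visible.
    """
    labeled = []
    for row in rows:
        color = str(row.get("fill_color", "")).strip().lower()
        labeled.append({**row, "status": legend.get(color, "unlabeled")})
    return labeled


if __name__ == "__main__":
    sheet = [
        {"item": "Travel", "fill_color": "Red"},
        {"item": "Software", "fill_color": "green"},
        {"item": "Misc"},
    ]
    for row in add_status_labels(sheet):
        print(row["item"], row["status"])
```

Marking unknown colors as 'unlabeled' rather than guessing keeps any row that still depends on color alone easy to spot.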

Paste this into Claude

I am going to upload [describe your file: its type, where it came from, and roughly what it contains]. Before I do, I want to know: based on how this file was likely created, will Claude be able to read the text in it? What should I check or fix before uploading? If the file format might cause problems, tell me the best way to convert or prepare it so I get a reliable answer.

What a good response looks like

Based on your description, here are a few things to check before uploading your quarterly expense report.

If it was exported from Excel or Google Sheets as a PDF: it is almost certainly text-extractable and will work well. Test: try selecting text in the PDF with your cursor. If you can highlight individual words, Claude can read them.

If it was scanned from a printed report: it is likely image-only. Claude will see the page as a picture. To fix this, upload the file to Google Drive, right-click it, and choose 'Open with,' then 'Google Docs.' Google will run text recognition and convert it to a readable document in about 30 seconds.

For the spreadsheet version: the merged 'Q3 Summary' header row spanning columns A through D will likely lose its structure when Claude reads it. Before uploading, unmerge those cells and add the column label to each cell individually. Also, if your color coding (red for over-budget, green for under) carries meaning, add a text note to those rows since Claude cannot see colors.

Make those two changes and re-upload. You should get clean answers about your budget numbers.

What good looks like

You identified whether your file is text-extractable or image-only before uploading it
You uploaded the file and asked Claude what it can and cannot see in it
You found at least one format issue (merged cells, image-only scan, missing column headers) that would have caused a wrong answer
You fixed or worked around the issue and got a more reliable answer on the second attempt
You can explain in one sentence the difference between a text PDF and a scanned PDF

Go deeper (12 min)

Paste this into Claude

I am uploading a file that I suspect has a format problem: [describe the file and what you tried to do with it that gave a bad or confusing result]. Before you answer any questions about this file, tell me: what text can you actually read in it? Are there any sections that appear as images rather than text? Are there any tables or data structures that look garbled or incomplete? I will use your answer to decide whether to fix the file or ask my question differently.

What a good response looks like

Here is what I can see in the file you uploaded:

Pages 1-4: clean text, fully readable. I can see the contract terms, party names, and dates clearly.

Page 5 (the signature page): this appears to be a scanned image. I can see there are signatures and a date stamp, but I cannot read the handwritten date or confirm the exact text near the signature lines. If you need to verify the execution date, you will need to check page 5 directly in the original document.

The table on page 3 (Exhibit A, pricing schedule): the columns came through but the merged header row that spans all columns did not. I can see the individual numbers but I am not certain which column label applies to which set of figures. If you can tell me the column headers (e.g. 'Unit Price / Quantity / Total'), I can answer pricing questions accurately.

Everything else looks clean. What would you like to know about the contract?

What good looks like

You asked Claude to describe what it can see before asking your actual question
Claude identified at least one section it could not read or that looked garbled
You used that information to either fix the file or reframe your question
Your follow-up question got a more accurate answer than your first attempt

When this breaks

Breaks when meaning lives in visual structure (merged cells, color codes, alignment), because the model receives flattened text and the relationships between header and data disappear before any question is asked.
Breaks when the file is a picture of text rather than text itself, because no amount of question-tuning can recover words the model never received in the first place.

AI can help with this

After uploading any file, ask Claude before your real question: 'What can you actually see in this file? Flag anything that looks like an image, a garbled table, or missing headers.' That one check prevents most file-reading failures.

You can now

Diagnose at least one format issue before uploading (image-only PDF, merged cells, color-only labels), apply a fix, and get a reliable answer where the unfixed file would have produced a wrong one.

Key takeaways

Before you upload, ask: can I select text in this file? If not, Claude is reading a picture. Fix the format first, then ask your question.

Before you upload, check whether you can select text; if not, Claude is reading a picture
Merged cells, color-only labels, and visual alignment all flatten when the model reads the file
Run OCR on scanned PDFs and add explicit text labels to spreadsheets before uploading
When in doubt, ask Claude what it can see before you ask your real question
