Supported Document Formats
This page lists the document formats supported by Indexed I/O. Below the overview outline is more detailed information about each format.
Please note that Indexed I/O can detect a much wider range of formats than those listed below. This page documents only the formats from which Indexed I/O can extract metadata and/or textual content.
- Supported Document Formats
- HyperText Markup Language
- XML and derived formats
- Microsoft Office document formats
- OpenDocument Format
- iWork document formats
- Portable Document Format
- Electronic Publication Format
- Rich Text Format
- Compression and packaging formats
- Text formats
- Feed and Syndication formats
- Help formats
- Audio formats
- Image formats
- Video formats
- Java class files and archives
- Source code
- Mail formats
- CAD formats
- Font formats
- Scientific formats
- Executable programs and libraries
- Crypto formats
- Database formats
HyperText Markup Language
The HyperText Markup Language (HTML) is the lingua franca of the web. The output is guaranteed to be well-formed and valid XHTML, and various heuristics are used to prevent things like inline scripts from cluttering the extracted text content.
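As a rough illustration of the kind of heuristic described above, the following Python sketch collects visible text while skipping inline scripts and styles. It is not Indexed I/O's actual implementation; the names `TextOnlyParser` and `extract_text` are hypothetical, and only the standard library's `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Hypothetical helper: collects text nodes outside <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `extract_text` on a page that contains a `<script>` block returns the paragraph text only, without the script source.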
XML and derived formats
The Extensible Markup Language (XML) format is a generic format that can be used for all kinds of content. Our default processing simply extracts the text content of the document and ignores any XML structure.
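To make "extracts the text content and ignores any XML structure" concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The function name `xml_text` is hypothetical, and the sketch only approximates the default behaviour described above.

```python
import xml.etree.ElementTree as ET


def xml_text(xml_source):
    """Return all text nodes of an XML document, ignoring tags and attributes."""
    root = ET.fromstring(xml_source)
    # itertext() walks every element in document order and yields its text.
    parts = (chunk.strip() for chunk in root.itertext())
    return " ".join(part for part in parts if part)
```

Element names and attribute values do not appear in the result; only the character data between tags is kept.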
Microsoft Office document formats
Microsoft Office and some related applications produce documents in the generic OLE 2 Compound Document and Office Open XML (OOXML) formats. The older OLE 2 format was introduced in Microsoft Office 97 and remained the default until Office 2007 introduced the XML-based OOXML format. Indexed I/O supports text and metadata extraction from both OLE 2 and OOXML documents.
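The two container formats can be told apart from their first bytes. The sketch below is an illustration only, not how Indexed I/O performs detection: OLE 2 files begin with a fixed 8-byte signature, while OOXML files are ZIP packages that contain a `[Content_Types].xml` entry. The function name `classify_office_file` is hypothetical.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE 2 compound file signature
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header signature


def classify_office_file(data):
    """Return 'ole2', 'ooxml', 'zip' or 'unknown' for the given file bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as package:
                # OOXML packages carry a content-types part at the root.
                if "[Content_Types].xml" in package.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            pass
        return "zip"
    return "unknown"
```

A ZIP signature alone is not proof of OOXML, which is why the sketch also looks inside the package before answering `ooxml`.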
OpenDocument Format
The OpenDocument format (ODF) is used most notably as the default format of the OpenOffice.org office suite.
iWork document formats
Indexed I/O supports the various iWork document formats (Numbers, Pages, Keynote).
Currently supported formats:
- Keynote format version 2.x. Currently only tested with Keynote version 5.x
- Pages format version 1.x. Currently only tested with Pages version 4.0.x
- Numbers format version 1.x. Currently only tested with Numbers version 2.0.x
Portable Document Format
Indexed I/O processes Portable Document Format (PDF) documents. When a PDF yields no extractable text, Indexed I/O applies optical character recognition (OCR) to it in real time during the processing step of our default eDiscovery workflow.
Electronic Publication Format
Indexed I/O supports the Electronic Publication Format (EPUB), used for many digital books, and the XML-based FictionBook publishing format.
Rich Text Format
Rich Text Format (RTF) files are processed, including text extraction.
Compression and packaging formats
Supported formats include Tar, RAR, AR, CPIO, Zip, 7Zip, Gzip, BZip2, XZ and Pack200. Segmented archives are not supported.
Text formats
Extracting text content from plain text files seems like a simple task until you start thinking of all the possible character encodings. Indexed I/O automatically detects the character encoding of a text document.
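The sketch below shows one simple way encoding detection can work. It is an assumption-laden illustration, not the detector Indexed I/O uses: it checks for a byte order mark, then tries UTF-8, and finally falls back to Latin-1. The names `guess_encoding` and `read_text` are hypothetical, and real detectors use statistical analysis that this sketch omits.

```python
import codecs

# UTF-32 marks are checked before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data):
    """Guess a codec name for raw bytes (BOM, then UTF-8, then Latin-1)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: Latin-1 as a last resort, since it never fails to decode.
        return "latin-1"


def read_text(data):
    """Decode raw bytes using the guessed encoding."""
    return data.decode(guess_encoding(data))
```

The Latin-1 fallback always succeeds but can produce wrong characters for other legacy encodings, which is the reason production detectors go further than this.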
Feed and Syndication formats
Indexed I/O supports the RSS and Atom syndication formats as well as the IPTC ANPA news wire feed format.
Help formats
Indexed I/O supports the Compiled HTML Help (CHM) format.
Audio formats
Indexed I/O can detect several common audio formats and extract metadata from them. Text extraction is also supported for some audio files that contain lyrics or other textual content. Extracted metadata includes sampling rates, channels, format information, artists, titles, etc. Indexed I/O also supports transcription of audio to searchable text for some supported audio file formats.
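For a sense of what technical audio metadata looks like, the following sketch reads the sampling rate and channel count from a WAV file with Python's standard `wave` module. The function name `wav_metadata` and the dictionary keys are hypothetical and do not reflect the field names Indexed I/O reports.

```python
import io
import wave


def wav_metadata(data):
    """Return basic technical metadata for WAV file bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Descriptive tags such as artist and title live in format-specific structures (for example ID3 in MP3 files) and are outside the scope of this sketch.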
Image formats
Processing extracts simple metadata from image formats such as PNG, GIF and BMP. More complex image metadata is available from JPEG and TIFF images. Metadata extraction from PSD, BPG (Better Portable Graphics) and WebP image formats is also supported.
When extracting from images, it is also possible to have OCR performed on the contents of the image.
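As an example of the "simple metadata" available from a format such as PNG, the sketch below reads the pixel dimensions from the IHDR chunk that follows the 8-byte PNG signature. The function name `png_dimensions` is hypothetical, and the sketch is an illustration rather than Indexed I/O's own extractor.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from the IHDR chunk of PNG file bytes."""
    if (
        len(data) < 24
        or not data.startswith(PNG_SIGNATURE)
        or data[12:16] != b"IHDR"
    ):
        raise ValueError("not a PNG image")
    # Bytes 16-23 hold width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Only the first 24 bytes of the file are needed, so this kind of check stays cheap even for very large images.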
Video formats
Indexed I/O supports Flash video, the MP4 family of video formats (MP4, QuickTime, 3GPP, etc.) and the Ogg family of video formats. Only a limited amount of metadata is extracted from Ogg video formats.
Java class files and archives
Indexed I/O can extract class names and method signatures from Java class files and JAR archives. However, this functionality is disabled by default.
Source code
Indexed I/O can handle a number of source code formats, including Java, C, C++ and Groovy. However, this functionality is disabled by default.
Mail formats
Indexed I/O can extract email messages from the mbox format, single email messages in the RFC 822 format used by many email clients in their archives and exports, email messages from the Microsoft Outlook PST and OST formats (OST support is limited to versions prior to 2013) and the Microsoft Outlook MSG format, and email attachments from the Microsoft TNEF format (Transport Neutral Encapsulation Format, also known as winmail.dat) used by some Microsoft email clients.
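To show what extraction from a single RFC 822 message involves, here is a minimal sketch using Python's standard `email` package. The function name `summarize_message` and the returned keys are hypothetical; they are not the fields Indexed I/O produces.

```python
from email import message_from_string, policy


def summarize_message(raw):
    """Return headers, attachment names and plain-text body of one message."""
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        # iter_attachments() yields nothing for non-multipart messages.
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
        "text": body.get_content().strip() if body is not None else "",
    }
```

Container formats such as mbox and PST hold many messages of this kind and need an extra unpacking step before each message can be parsed.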
CAD formats
Indexed I/O can extract simple metadata from the DWG CAD format.
Font formats
Indexed I/O can extract simple metadata from the TrueType font format as well as Adobe Font Metrics files.
Scientific formats
Indexed I/O can extract attribute metadata from the following scientific file formats: GCMD Directory Interchange Format (DIF), GDA, ISO 19139 geographic information, GRIB, HDF, the ISA-Tab (ISA Tools) family, NetCDF and MATLAB.
Executable programs and libraries
Indexed I/O can extract metadata information on platforms, architectures and types from a range of executable formats and libraries, such as Windows Executables and Linux / BSD programs and libraries.
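The platform, architecture and type information mentioned above is stored in the first bytes of an executable. The sketch below decodes those fields from an ELF header, the format used by Linux and BSD programs and libraries. It is an illustration only: the function name `elf_summary` is hypothetical, and the lookup tables cover just a few common values.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_BYTE_ORDERS = {1: "little", 2: "big"}
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core dump"}
# Partial table: only a handful of common machine codes are listed.
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_summary(header):
    """Decode class, byte order, type and machine from ELF header bytes."""
    if len(header) < 20 or not header.startswith(ELF_MAGIC):
        return None
    byte_order = ELF_BYTE_ORDERS.get(header[5])
    if byte_order is None:
        return None
    # e_type and e_machine are 16-bit fields right after the 16-byte ident.
    e_type = int.from_bytes(header[16:18], byte_order)
    e_machine = int.from_bytes(header[18:20], byte_order)
    return {
        "class": ELF_CLASSES.get(header[4], "unknown"),
        "byte_order": byte_order + "-endian",
        "type": ELF_TYPES.get(e_type, "unknown"),
        "machine": ELF_MACHINES.get(e_machine, "unknown"),
    }
```

Windows executables use a different layout (the PE format), so a similar sketch for them would read other offsets.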
Crypto formats
Indexed I/O can parse the contents of PKCS7 signed messages, but does not include any information from the outer PKCS7 wrapper.
Forensic Image Formats
Indexed I/O can ingest E01 and L01 forensic images, but they must be uploaded to a project's IIO Drive and mounted by our support staff before they can be processed. Reach out to Support@indexed.io for more information or for assistance loading these images.
Indexed I/O also supports XML reports from Cellebrite, Axiom, Oxygen, and Paraben.
Zendesk Data
Indexed I/O can ingest archives from the Zendesk support tool. Please reach out to Support@indexed.io for more information or assistance in ingesting this data type.
Full list of Supported Formats
|
Apple/MAC |
|
|
application/vnd.apple.iwork |
|
|
application/vnd.apple.numbers |
|
|
application/vnd.apple.keynote |
|
|
application/vnd.apple.pages |
|
|
Archive |
|
|
application/x-isatab |
|
|
application/zlib |
|
|
application/x-compress |
|
|
application/x-bzip |
|
|
application/x-java-pack200 |
|
|
application/x-bzip2 |
|
|
application/gzip |
|
|
application/x-gzip |
|
|
application/x-xz |
|
|
application/x-tar |
|
|
application/x-tika-unix-dump |
|
|
application/java-archive |
|
|
application/x-7z-compressed |
|
|
application/x-archive |
|
|
application/x-cpio |
|
|
application/zip |
|
|
application/x-rar-compressed |
|
|
Audio |
|
|
audio/x-wav |
|
|
audio/x-aiff |
|
|
audio/basic |
|
|
application/x-midi |
|
|
audio/midi |
|
|
audio/mpeg |
|
|
audio/x-oggflac |
|
|
audio/x-flac |
|
|
audio/x-oggpcm |
|
|
audio/ogg |
|
|
audio/opus |
|
|
audio/ogg; codecs=opus |
|
|
audio/speex |
|
|
audio/ogg; codecs=speex |
|
|
audio/vorbis |
|
|
Compiled HTML |
|
|
application/vnd.ms-htmlhelp |
|
|
application/chm |
|
|
application/x-chm |
|
|
Crypto Formats |
|
|
application/pkcs7-signature |
|
|
application/pkcs7-mime |
|
|
Database |
|
|
application/x-msaccess |
|
|
DWG |
|
|
image/vnd.dwg |
|
|
Email |
|
|
message/rfc822 |
|
|
application/mbox |
|
|
application/vnd.ms-outlook-pst |
|
|
application/x-tnef |
|
|
application/ms-tnef |
|
|
application/vnd.ms-tnef |
|
|
EPUB/FictionBook |
|
|
application/x-ibooks+zip |
|
|
application/epub+zip |
|
|
application/x-fictionbook+xml |
|
|
Executable |
|
|
application/x-elf |
|
|
application/x-sharedlib |
|
|
application/x-executable |
|
|
application/x-msdownload |
|
|
application/x-coredump |
|
|
application/x-object |
|
|
Feeds |
|
|
application/atom+xml |
|
|
application/rss+xml |
|
|
Fonts |
|
|
application/x-font-adobe-metric |
|
|
application/x-font-ttf |
|
|
Geo |
|
|
text/iso19139+xml |
|
|
application/x-grib2 |
|
|
HTML/WEB |
|
|
application/x-asp |
|
|
application/xhtml+xml |
|
|
application/vnd.wap.xhtml+xml |
|
|
text/html |
|
|
image/webp |
|
|
text/vnd.iptc.anpa |
|
|
Image |
|
|
image/x-ozi |
|
|
application/x-snodas |
|
|
application/x-ecrg-toc |
|
|
image/envisat |
|
|
application/x-doq2 |
|
|
application/x-rs2 |
|
|
application/x-gsag |
|
|
application/x-ers |
|
|
application/fits |
|
|
application/x-pnm |
|
|
image/adrg |
|
|
image/gif |
|
|
application/x-generic-bin |
|
|
application/x-bt |
|
|
application/x-zmap |
|
|
application/x-hdf |
|
|
image/eir |
|
|
application/x-ace2 |
|
|
application/grass-ascii-grid |
|
|
application/x-l1b |
|
|
application/x-gsc |
|
|
image/jp2 |
|
|
image/hfa |
|
|
image/fits |
|
|
image/raster |
|
|
application/x-epsilon |
|
|
image/x-srp |
|
|
application/x-envi-hdr |
|
|
application/x-ctable2 |
|
|
application/x-srtmhgt |
|
|
application/jaxa-pal-sar |
|
|
application/x-ndf |
|
|
application/sdts-raster |
|
|
application/x-gtx |
|
|
application/x-rst |
|
|
application/x-xyz |
|
|
application/terragen |
|
|
application/x-gs7bg |
|
|
image/arg |
|
|
application/elas |
|
|
image/big-gif |
|
|
application/x-geo-pdf |
|
|
application/x-ctg |
|
|
application/aaigrid |
|
|
application/x-lcp |
|
|
application/x-nwt-grc |
|
|
application/x-fast |
|
|
application/x-usgs-dem |
|
|
application/x-nwt-grd |
|
|
application/x-ingr |
|
|
application/x-envi |
|
|
application/x-rik |
|
|
application/x-blx |
|
|
application/x-wcs |
|
|
image/ceos |
|
|
application/x-ngs-geoid |
|
|
application/x-r |
|
|
image/bmp |
|
|
application/x-http |
|
|
application/x-til |
|
|
application/x-pds |
|
|
application/x-rasterlite |
|
|
application/x-gmt |
|
|
application/x-msgn |
|
|
image/ilwis |
|
|
application/aig |
|
|
application/x-rmf |
|
|
image/x-hdf5-image |
|
|
image/sar-ceos |
|
|
application/x-kro |
|
|
application/vrt |
|
|
application/x-netcdf |
|
|
image/nitf |
|
|
image/png |
|
|
image/geotiff |
|
|
image/x-mff2 |
|
|
application/x-webp |
|
|
image/ida |
|
|
application/x-gsbg |
|
|
application/x-ntv2 |
|
|
application/x-coasp |
|
|
application/x-los-las |
|
|
application/x-tsx |
|
|
application/x-bag |
|
|
image/fit |
|
|
application/x-lan |
|
|
application/x-map |
|
|
image/jpeg |
|
|
application/x-dods |
|
|
application/jdem |
|
|
application/gff |
|
|
application/x-isis2 |
|
|
application/x-isis3 |
|
|
application/xpm |
|
|
application/x-pcidsk |
|
|
application/x-gxf |
|
|
application/x-wms |
|
|
application/x-cosar |
|
|
image/bsb |
|
|
application/x-grib |
|
|
application/x-mbtiles |
|
|
application/x-cappi |
|
|
application/x-rpf-toc |
|
|
image/x-mff |
|
|
image/x-dimap |
|
|
image/x-pcraster |
|
|
application/x-ppi |
|
|
application/x-sdat |
|
|
application/pcisdk |
|
|
application/x-cpg |
|
|
application/leveller |
|
|
image/sgi |
|
|
image/x-fujibas |
|
|
image/x-airsar |
|
|
application/x-e00-grid |
|
|
application/x-kml |
|
|
application/x-p-aux |
|
|
application/x-doq1 |
|
|
application/dted |
|
|
application/x-dipex |
|
|
image/bpg |
|
|
image/x-bpg |
|
|
image/x-ms-bmp |
|
|
image/png |
|
|
image/x-icon |
|
|
image/vnd.wap.wbmp |
|
|
image/gif |
|
|
image/bmp |
|
|
image/x-xcf |
|
|
image/vnd.adobe.photoshop |
|
|
application/pdf |
|
|
Microsoft Office/Open Office |
|
|
application/x-mspublisher |
|
|
application/x-tika-msoffice |
|
|
application/vnd.ms-excel |
|
|
application/sldworks |
|
|
application/x-tika-msworks-spreadsheet |
|
|
application/vnd.ms-powerpoint |
|
|
application/x-tika-msoffice-embedded; format=ole10_native |
|
|
application/vnd.ms-project |
|
|
application/x-tika-ooxml-protected |
|
|
application/msword |
|
|
application/vnd.ms-outlook |
|
|
application/vnd.visio |
|
|
application/vnd.ms-excel.sheet.3 |
|
|
application/vnd.ms-excel.sheet.2 |
|
|
application/vnd.ms-excel.workspace.3 |
|
|
application/vnd.ms-excel.workspace.4 |
|
|
application/vnd.ms-excel.sheet.4 |
|
|
application/vnd.ms-excel.sheet.macroenabled.12 |
|
|
application/vnd.ms-powerpoint.presentation.macroenabled.12 |
|
|
application/vnd.openxmlformats-officedocument.spreadsheetml.template |
|
|
application/vnd.openxmlformats-officedocument.wordprocessingml.document |
|
|
application/vnd.openxmlformats-officedocument.presentationml.template |
|
|
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
|
|
application/vnd.openxmlformats-officedocument.presentationml.presentation |
|
|
application/vnd.ms-excel.addin.macroenabled.12 |
|
|
application/vnd.ms-word.document.macroenabled.12 |
|
|
application/vnd.ms-excel.template.macroenabled.12 |
|
|
application/vnd.openxmlformats-officedocument.wordprocessingml.template |
|
|
application/vnd.ms-powerpoint.slideshow.macroenabled.12 |
|
|
application/vnd.ms-powerpoint.addin.macroenabled.12 |
|
|
application/vnd.ms-word.template.macroenabled.12 |
|
|
application/x-tika-ooxml |
|
|
application/vnd.openxmlformats-officedocument.presentationml.slideshow |
|
|
application/x-vnd.oasis.opendocument.graphics-template |
|
|
application/vnd.sun.xml.writer |
|
|
application/x-vnd.oasis.opendocument.text |
|
|
application/x-vnd.oasis.opendocument.text-web |
|
|
application/x-vnd.oasis.opendocument.spreadsheet-template |
|
|
application/vnd.oasis.opendocument.formula-template |
|
|
application/vnd.oasis.opendocument.presentation |
|
|
application/vnd.oasis.opendocument.image-template |
|
|
application/x-vnd.oasis.opendocument.graphics |
|
|
application/vnd.oasis.opendocument.chart-template |
|
|
application/vnd.oasis.opendocument.presentation-template |
|
|
application/x-vnd.oasis.opendocument.image-template |
|
|
application/vnd.oasis.opendocument.formula |
|
|
application/x-vnd.oasis.opendocument.image |
|
|
application/vnd.oasis.opendocument.spreadsheet-template |
|
|
application/x-vnd.oasis.opendocument.chart-template |
|
|
application/x-vnd.oasis.opendocument.formula |
|
|
application/vnd.oasis.opendocument.spreadsheet |
|
|
application/vnd.oasis.opendocument.text-web |
|
|
application/vnd.oasis.opendocument.text-template |
|
|
application/vnd.oasis.opendocument.text |
|
|
application/x-vnd.oasis.opendocument.formula-template |
|
|
application/x-vnd.oasis.opendocument.spreadsheet |
|
|
application/x-vnd.oasis.opendocument.chart |
|
|
application/vnd.oasis.opendocument.text-master |
|
|
application/x-vnd.oasis.opendocument.text-master |
|
|
application/x-vnd.oasis.opendocument.text-template |
|
|
application/vnd.oasis.opendocument.graphics |
|
|
application/vnd.oasis.opendocument.graphics-template |
|
|
application/x-vnd.oasis.opendocument.presentation |
|
|
application/vnd.oasis.opendocument.image |
|
|
application/x-vnd.oasis.opendocument.presentation-template |
|
|
application/vnd.oasis.opendocument.chart |
|
|
application/rtf |
|
|
text/plain |
|
|
Scientific Data Formats |
|
|
application/x-matlab-data |
|
|
application/x-hdf |
|
|
Source Code/Software |
|
|
text/x-java-source |
|
|
text/x-c++src |
|
|
text/x-groovy |
|
|
application/x-netcdf |
|
|
application/java-vm |
|
|
Video |
|
|
video/mp4 |
|
|
video/avi |
|
|
video/mpeg |
|
|
video/x-msvideo |
|
|
video/3gpp2 |
|
|
video/mp4 |
|
|
video/quicktime |
|
|
audio/mp4 |
|
|
application/mp4 |
|
|
video/x-m4v |
|
|
video/3gpp |
|
|
video/x-flv |
|
|
video/x-oggyuv |
|
|
video/x-dirac |
|
|
video/x-ogm |
|
|
video/x-ogguvs |
|
|
video/theora |
|
|
video/x-oggrgb |
|
|
video/ogg |
|
|
XML |
|
|
application/xml |
|
|
image/svg+xml |
|
|
application/dif+xml |