Tools - Scan Raw Binary / Text File

Raster GIS data is sometimes provided in "raw" file formats where the importing application, in our case Manifold, must organize the data into usable form based on guidance provided in some accessory file.   The data is said to be raw because the file itself provides no organizing structure, only a sequence of bytes that the importing application must interpret.   If the accessory file or other accompanying documentation is lost, it can be very difficult to guess how the raw file should be interpreted.

 

Getting Data from a Raw File

 

 

Raw Files are Binary or Text

Raw data files are usually one of two generic types:  Raw binary files contain bytes that are to be interpreted as some numeric data type, such as integers or floating point types.   Raw text files contain bytes that are to be interpreted as some form of text, where the text characters are then in turn interpreted as numbers or other information.    

 

Manifold provides two tools, Scan Raw Binary File for raw binary files and Scan Raw Text File for raw text files, which together provide generic import capability from a wide variety of different arrangements used in raw binary or raw text files.

 

Two Step Process

Manifold Scan Raw tools are used in a two step process:  

 

 

First, we scan the raw file to verify our intended interpretation is correct, and then, second, we use a configuration file created by that scan to import the data.  The configuration file is in human-readable JSON format.

 

Data within raw files can be arranged in various ways, which usually can be described by a relatively small set of options such as the data type, horizontal and vertical size of the grid of pixels and the number of data channels.  We learn what options we must use for a particular file by reading documentation that accompanies the raw file.    If that documentation provides clear information using sensible terminology, we usually can load a raw file on the first try.

 

Unfortunately, documentation describing the data within the raw file can use idiosyncratic and nonstandard terms to describe the organization of the raw file, or it might fail to provide important information such as the data type.   In such cases we must try out various possibilities to see what works.   Because raw files can be very large, importing the entire file for each trial can be inconvenient, so Manifold provides Scan Raw tools that quickly scan a raw file using a given set of options to see if they work.  If the result is obviously wrong, we can adjust the options and try again.

 

If the result is OK, we can command the tool to create a configuration file in rwb format that captures the options needed to interpret data from that particular file.  We then use that configuration file to link the data into Manifold.   The two step process lets us try out various settings before committing to what might be a lengthy import.

Lines / Rows / Height  and Columns / Length / Width

Rasters are always rectangular arrangements of pixels, within which pixels are arranged in a sequence of rows, with all the rows having the same number of pixels.   Rows might be referred to as lines in some metadata documents, with the number of rows being called the height.   The number of pixels in a row might be called the length of each row, or the number of columns in that row, or the width of the raster.   Data in a raw file is just one long sequence of bytes from beginning to end of the file.  

 

If the raster image is 800 pixels wide by 600 pixels high, the data might be organized so the first 800 bytes are the data for the first row, the next 800 bytes are the data for the second row, and so on.    Raw files do not normally contain information within the file on the width and height of the raster image.  Instead, we must find that information from any accompanying documentation.   If we do not know that the file contains data for an image that is 800 pixels wide by 600 pixels high with one byte per pixel, we will not be able to tell Manifold to use the first 800 bytes for the first row, the next 800 bytes for the next row and so on.   
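The row arithmetic described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not Manifold code: the function name row_bytes and the synthetic data are assumptions made for the example, and the file is assumed to be tightly packed with a single channel.

```python
def row_bytes(data: bytes, row: int, width: int, bytes_per_pixel: int = 1) -> bytes:
    """Return the bytes for one row of a tightly packed, single-channel raster.

    Hypothetical helper for illustration: rows are numbered from 0.
    """
    row_length = width * bytes_per_pixel
    start = row * row_length
    return data[start:start + row_length]


# Synthetic 800 x 600 raster, one byte per pixel.
# Every pixel in row r holds the value r % 256, so rows are easy to tell apart.
width, height = 800, 600
data = b"".join(bytes([r % 256]) * width for r in range(height))

first = row_bytes(data, 0, width)
second = row_bytes(data, 1, width)
print(len(data))                          # 480000
print(len(first), first[0], second[0])    # 800 0 1
```

If we guess the wrong width, each slice starts at the wrong place and the rows drift sideways relative to each other, which is why a wrong width usually shows up as a sheared or striped picture.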

 

[Image: the Scan Raw Binary File dialog, dlg_scan_raw_binary.png]

 

Scan Raw Binary File

Data File

Name of the raw binary file.  Press the [...] browse button to navigate to the desired folder and to choose the desired file.

Scan File

Name automatically constructed by appending .rwb to the name of the raw binary file.   Specify a different name if desired.

Skip bytes

The number of bytes in the beginning of the file to ignore.  Use to skip over header and other non-data information sometimes found in raw files.

Padding bytes

The number of bytes to skip after each line.
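As a rough sketch of how skip bytes and padding bytes change where each line starts, consider the following Python.  The names row_offset and read_row are hypothetical, and the sketch assumes a single channel with padding following every line; it does not describe how Manifold reads the file internally.

```python
def row_offset(row, width, bytes_per_pixel, skip_bytes=0, padding_bytes=0):
    """Byte offset of the first pixel of a line (lines numbered from 0)."""
    # Each line occupies its pixel data plus the padding that follows it.
    stride = width * bytes_per_pixel + padding_bytes
    return skip_bytes + row * stride


def read_row(data, row, width, bytes_per_pixel, skip_bytes=0, padding_bytes=0):
    """Return only the pixel bytes of one line, without header or padding."""
    start = row_offset(row, width, bytes_per_pixel, skip_bytes, padding_bytes)
    return data[start:start + width * bytes_per_pixel]


# A tiny made-up file: 4-byte header, then two lines of 3 one-byte pixels,
# each line followed by 1 padding byte (0xFF).
data = b"HDR!" + b"\x01\x02\x03\xff" + b"\x04\x05\x06\xff"

print(row_offset(1, width=3, bytes_per_pixel=1, skip_bytes=4, padding_bytes=1))  # 8
print(read_row(data, 1, 3, 1, skip_bytes=4, padding_bytes=1))                    # b'\x04\x05\x06'
```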

Null value

The numeric value used to represent "no data" in the file for that pixel. For example, the number -9999 is often used to indicate no data for a pixel.
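To see why the null value matters, here is a small Python sketch that compares a naive average with one that ignores the "no data" marker.  The function valid_mean and the sample values are invented for illustration.

```python
def valid_mean(values, null_value):
    """Average of the values, ignoring pixels that equal the null value."""
    valid = [v for v in values if v != null_value]
    return sum(valid) / len(valid) if valid else None


# One made-up line of elevation pixels with two "no data" entries.
row = [120, -9999, 118, 122, -9999]

print(sum(row) / len(row))       # naive mean, badly skewed by the -9999 markers
print(valid_mean(row, -9999))    # 120.0
```

If the wrong null value is specified, the markers are treated as real measurements and distort statistics and display ranges in the same way as the naive mean above.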

Type

Choose the data type represented by the binary data in the file.   The accompanying format box allows choice of Intel (little-endian) or Motorola (big-endian) style encodings.
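The difference between the two byte orders is easy to demonstrate with Python's standard struct module, where "<" means little-endian and ">" means big-endian.  The byte values below are arbitrary examples chosen for the illustration.

```python
import struct

# The same two bytes read as an unsigned 16-bit integer in both byte orders.
raw = bytes([0x01, 0x02])

little = struct.unpack("<H", raw)[0]   # Intel order: low byte first, 0x0201
big = struct.unpack(">H", raw)[0]      # Motorola order: high byte first, 0x0102
print(little, big)                     # 513 258

# The value 1.0 written as a little-endian 32-bit float...
one = struct.pack("<f", 1.0)
right = struct.unpack("<f", one)[0]    # read back with the matching byte order
wrong = struct.unpack(">f", one)[0]    # read with the wrong byte order
print(right)                           # 1.0
print(wrong)                           # a tiny, meaningless number
```

A wrong choice of byte order rarely causes an error.  It simply produces values that are wildly off, which is the kind of result a quick scan makes obvious.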

Size

The horizontal (East/West) and vertical (North/South) dimensions of the image in pixels.   Specified as [width, height].   Some people prefer to think of this as [x, y] or as [ (number of columns), (number of rows) ], all of which are the same numbers.
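One quick plausibility check before scanning is to compare the actual size of the file with the size implied by the options we intend to use.  The Python below is a sketch of that arithmetic under stated assumptions: the function name expected_file_size is hypothetical, and padding is assumed to occur once after each raster line.

```python
def expected_file_size(width, height, bytes_per_pixel,
                       channels=1, skip_bytes=0, padding_bytes=0):
    """File size in bytes implied by a set of raw binary options.

    Assumes padding_bytes follow each raster line exactly once.
    """
    line_bytes = width * channels * bytes_per_pixel + padding_bytes
    return skip_bytes + height * line_bytes


# 800 x 600 pixels, one byte per pixel, one channel, no header.
print(expected_file_size(800, 600, 1))                              # 480000

# 800 x 600 pixels, 16-bit values, three channels, 128-byte header.
print(expected_file_size(800, 600, 2, channels=3, skip_bytes=128))  # 2880128
```

If the number does not match the size reported by the operating system, at least one of the options (size, data type, channels, skip or padding) is probably not what the file actually uses.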

Channels

The number of channels in the file.   The accompanying box  is enabled when the number of channels is greater than 1, and specifies the interleaving, that is, channel order, within the file as follows:

 

  • Interleaved by band - The file contains all samples for channel 1 followed by all for channel 2 and so on.

  • Interleaved by line - The file  contains all samples for line 1 of channel 1 followed by all samples for line 1 of channel 2 and so on.

  • Interleaved by pixel - The file  contains samples from all channels for pixel 1 of line 1 followed by samples from all channels for pixel 2 of line 1 and so on.
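The three interleaving orders above differ only in how a (channel, line, pixel) position maps to a position in the file.  The following Python sketch spells out that mapping.  The function sample_index is a hypothetical helper; it counts samples rather than bytes, numbers everything from 0, and ignores skip and padding bytes.

```python
def sample_index(interleave, channel, row, col, width, height, channels):
    """Position of one sample in the file, counted in samples from 0."""
    if interleave == "band":
        # All of channel 0, then all of channel 1, and so on.
        return (channel * height + row) * width + col
    if interleave == "line":
        # Line 0 of every channel, then line 1 of every channel, and so on.
        return (row * channels + channel) * width + col
    if interleave == "pixel":
        # All channels of pixel 0, then all channels of pixel 1, and so on.
        return (row * width + col) * channels + channel
    raise ValueError("interleave must be 'band', 'line' or 'pixel'")


# Where is the first pixel of channel 1 in a 2 x 2 raster with 3 channels?
for mode in ("band", "line", "pixel"):
    print(mode, sample_index(mode, channel=1, row=0, col=0,
                             width=2, height=2, channels=3))
# band 4
# line 2
# pixel 1
```

Multiplying the result by the number of bytes per sample gives a byte position within the data portion of the file.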

Scan

Scan the data file using specified options.

Save

Enabled after a scan.   Save the named rwb configuration file based on the specified options.

 

Scan Raw Text File

 

Data File

Name of the raw text file.  Press the [...] browse button to navigate to the desired folder and to choose the desired file.

Scan File

Name automatically constructed by appending .rwb to the name of the raw text file.   Specify a different name if desired.

Skip lines

The number of lines in the beginning of the file to ignore.  Use to skip over titles, comments, header and other non-data information sometimes found in raw files.

Delimiter

Enter characters (more than one is allowed) to be interpreted as delimiters. White space characters such as space, tab, and end-of-line, are always considered delimiters.
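The delimiter rule can be illustrated with a short Python sketch that treats white space plus any extra characters we name as delimiters.  This is an illustration of the rule as stated, not Manifold's parser, and the function name split_values is an assumption made for the example.

```python
import re


def split_values(text, extra_delimiters=""):
    """Split raw text into tokens.

    White space always separates tokens; every character in
    extra_delimiters is treated as a delimiter as well.
    """
    pattern = "[\\s" + re.escape(extra_delimiters) + "]+"
    return [token for token in re.split(pattern, text) if token]


sample = "1, 2;3\n4\t5"
print(split_values(sample, ",;"))   # ['1', '2', '3', '4', '5']
print(split_values("10 20\n30"))    # ['10', '20', '30']
```

Runs of consecutive delimiters collapse into a single separator in this sketch, so "1, 2" and "1,2" produce the same tokens.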

Null value

The text string used to represent "no data" in the file for that pixel. For example, the text -9999 is often used to indicate no data for a pixel.

Type

Choose the data type represented by numbers in the file.  Text representations of numbers have no "Intel" or "Motorola" format, since they are read by Manifold as text.

Size

The horizontal (East/West) and vertical (North/South) dimensions of the image in pixels.   Specified as [width, height].   Some people prefer to think of this as [x, y] or as [ (number of columns), (number of rows) ], all of which are the same numbers.

Channels

The number of channels in the file.   The accompanying box  is enabled when the number of channels is greater than 1, and specifies the interleaving, that is, channel order, within the file as follows:

 

  • Interleaved by band - The file contains all samples for channel 1 followed by all for channel 2 and so on.

  • Interleaved by line - The file  contains all samples for line 1 of channel 1 followed by all samples for line 1 of channel 2 and so on.

  • Interleaved by pixel - The file  contains samples from all channels for pixel 1 of line 1 followed by samples from all channels for pixel 2 of line 1 and so on.

Scan

Scan the data file using specified options.

Save

Enabled after a scan.   Save the named rwb configuration file based on the specified options.

 

 

For a step-by-step example of use, see the Example: Link NLCD using Scan Raw Binary File topic.

Notes

Encodings: In addition to Intel (called IEEE Intel) and Motorola (called IBM MVS), the raw binary importer in Release 8 offered eight more encodings for floating point values.  Those additional encodings served machines so old, such as Gould or Data General minicomputers, that sample data for them can no longer be found.  Manifold therefore now offers a choice of either Intel or Motorola style floating point formats.

See Also

File - Import

 

File - Link

 

File - Create - New Data Source

 

Assign Initial Coordinate System

 

Example: Link NLCD using Scan Raw Binary File - Use the Scan Raw Binary File tool to scan and to prepare a configuration file, which we use to link an NLCD raw binary file providing land cover data for Delaware as a raster image.   We use a standard palette to color the land cover data and then we assign a projection to the newly imported image so it can be used as a correctly georegistered layer in maps.