M3 File Formats

File Content – Representing the information on a computer

Information is stored in circuits which are in one of two states. In RAM the two states are a low voltage and a higher voltage, with a cutoff in between. Disk drives typically store information in terms of magnetic polarity. One state may be thought of as 0, the other as 1. Binary notation, a numbering system consisting of 0’s and 1’s, is a good fit to describe this information.

Binary and Hex

We will become familiar with two commonly used numbering systems in data processing: Binary (base 2) and Hex (base 16).

Here’s what you need to know:

  • Why would you want to use another numbering system?
  • How to convert a number from one base to another.
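As a rough illustration of converting between bases, here is a small Python sketch. The function name to_base and the sample value 77 are made up for this example; the method shown is repeated division, which is one common way to do the conversion by hand.

```python
# Digits used for bases up to 16 (hex uses A-F for the values 10-15).
DIGITS = "0123456789ABCDEF"

def to_base(number, base):
    """Return a non-negative whole number written in the given base (2 to 16)."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        # Divide by the base; the remainder is the next digit.
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])
    # Remainders come out lowest digit first, so reverse them.
    return "".join(reversed(digits))

value = 77
print(value, "in binary is", to_base(value, 2))
print(value, "in hex is", to_base(value, 16))

# Python's built-in int() converts the other way, back to decimal.
print(int("1001101", 2), int("4D", 16))
```

Notice that each hex digit stands for exactly four binary digits (4 is 0100, D is 1101), which is one reason hex is a convenient shorthand for binary.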

A good video on this is here:

http://www.youtube.com/watch?v=5sS7w-CMHkU&feature=related

A text version is here:

http://www.codeproject.com/Articles/4069/Learning-Binary-and-Hexadecimal

The programmer’s editor I used in class is free and is available here: (you don’t need this for the class)

http://www.pspad.com/

Analog vs. Digital

When you make an analogy, you say that one linkage, say hands and gloves, is like another linkage, feet and socks. Gloves go on hands to keep them warm, socks go on feet to keep them warm. Gloves are to hands as socks are to feet. The analogy holds.

An analog signal is a signal which is similar to the thing that it is measuring. Take a simple analog device like a mercury thermometer. There’s a bulb of mercury on the bottom of a tube. As the mercury is heated, it expands, filling up the tube. We measure the temperature by looking at how far the mercury has gone up the tube as a result of being heated. The amount that the mercury expands is analogous to the heat, and we measure that expansion, so the thermometer is an analog device. Voice signals through telephone lines were commonly analog up to a few years ago. The louder you spoke, the larger the signal on the wire. The modulations in your voice corresponded to electric characteristics of the signal. That is an analog signal.

A digital signal is a signal that’s in code. The relationship between the signal and its meaning is artificial. If you know the code, you can get the meaning; if you don’t, you can’t. So, a piece of music might be encoded as a series of numbers. You need to understand the relationship of the numbers to the sounds in order to reproduce the music.
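To make the idea of music as a series of numbers concrete, here is a minimal Python sketch. The sample rate, tone, and number range are tiny made-up values chosen for illustration; real audio formats use far more samples per second and a wider range of numbers.

```python
import math

SAMPLE_RATE = 8    # samples per second (assumed, tiny on purpose)
FREQUENCY = 1      # a 1 Hz tone, one full wave per second
MAX_LEVEL = 127    # each sample is stored as a whole number from -127 to 127

def sample_tone(count):
    """Measure the height of a sine wave at regular moments and round each one."""
    samples = []
    for n in range(count):
        t = n / SAMPLE_RATE                        # time of this sample, in seconds
        height = math.sin(2 * math.pi * FREQUENCY * t)
        samples.append(round(height * MAX_LEVEL))  # the code number for this moment
    return samples

codes = sample_tone(8)
print(codes)
```

The printed list is the digital signal. It only turns back into sound if the player knows the code: how many numbers there are per second and what range they cover.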

Why is this important? Digital data, that is, data stored on computers, is only analogous to binary notation. Digital data doesn’t “look like” letters, numbers, music or images. Yet we use digital data to store images, music, text, and numbers on our computers. We do this by using codes. A certain collection of digital data corresponds to the letter “A”. Another collection, to the letter “B”.

Since digital data doesn’t look like what it’s describing, we use codes to interpret the 1’s and 0’s into things we can understand.

Using 1’s and 0’s to represent text.

Here’s some text:

My name is Ishmael

The binary equivalent looks like this:

M        y        (space)  n         
01001101 01111001 00100000 01101110
a        m        e        (space)
01100001 01101101 01100101 00100000
i        s        (space)  I
01101001 01110011 00100000 01001001
s        h        m        a
01110011 01101000 01101101 01100001
e        l
01100101 01101100

The text is coded in ASCII. ASCII is a code for turning letters into 1’s and 0’s, which are then stored as voltages in circuits. In this way, we can store documents on computers. It works like this.

A single value, a 1 or a 0, is called a bit.

8 bits are called a byte.

We separate our computer storage into bytes. Each byte can have one of 256 possible values (0 through 255). ASCII assigns a letter, a number, or a specialized character to 128 of these values, each one a combination of 0’s and 1’s.

So, if we know ASCII we can encode all kinds of text and store it on a computer.  Other, more complicated codes exist for storing music or images, or foreign languages with different character sets. They all reduce to 1’s and 0’s though.
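The ASCII encoding of the sample sentence can be reproduced with a few lines of Python. This is only a sketch; the helper name to_bits is made up for this example, while encode, decode, and format are standard Python.

```python
text = "My name is Ishmael"

# encode() turns the text into bytes; in ASCII each character takes one byte.
data = text.encode("ascii")

def to_bits(byte_value):
    """Write one byte as eight 1's and 0's."""
    return format(byte_value, "08b")

# Show each character, its numeric code, and its bits.
for character, byte_value in zip(text, data):
    print(repr(character), byte_value, to_bits(byte_value))

# The whole sentence as one string of bits, a space between bytes.
bit_string = " ".join(to_bits(b) for b in data)

# Knowing the code lets us go back from bits to text.
decoded = bytes(int(bits, 2) for bits in bit_string.split()).decode("ascii")
print(decoded)

# A byte has 8 bits, so it can hold 2 to the 8th power different values.
print(2 ** 8)
```

The first line of output should match the table above: M is 01001101.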

What is a file?

A file is a delineated collection of data on a storage medium. The delimiters used depend on the file system. A file system can be independent of the operating system. File types, on a Windows system, are indicated by the suffix of the file name. Some common Windows file types and suffixes are:

Suffix  Type
doc     word processing
exe     program
dll     program library
jpg     image
avi     video
flv     video
mp3     music
mp4     video (can also hold audio only)


File Formats

There are thousands of different file formats, most of which you will never need to know. We will cover the more common file formats in this class.

doc, docx, txt, csv, xml, mp3, wav, flv, mp4, jpg, tif, pdf, exe, dll

Operating systems match application programs to files, e.g. an xxx.doc file would be opened by Microsoft Word, and yyyy.mp3 would be opened by a media player.
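The matching of files to programs can be pictured as a simple lookup table keyed on the suffix. The sketch below is an illustration only: the table DEFAULT_APPS and the function app_for are hypothetical, and a real operating system keeps its associations in its own settings.

```python
from pathlib import PurePath

# Hypothetical association table: suffix -> kind of program that opens it.
DEFAULT_APPS = {
    ".doc": "word processor",
    ".mp3": "media player",
    ".jpg": "image viewer",
}

def app_for(filename):
    """Pick a program by looking only at the suffix, not at the file's contents."""
    suffix = PurePath(filename).suffix.lower()   # "xxx.DOC" -> ".doc"
    return DEFAULT_APPS.get(suffix, "unknown, ask the user")

print(app_for("xxx.doc"))
print(app_for("yyyy.mp3"))
print(app_for("notes.xyz"))
```

Because the lookup uses only the name, renaming a file with the wrong suffix sends it to the wrong program even though the data inside has not changed.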

Compression

You need to know this. Read the article below, paying attention to lossy and lossless compression.

http://www.howstuffworks.com/file-compression.htm

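You should know what lossy compression and lossless compression are. Lossless compression gives back exactly the original data; lossy compression throws away some detail to make the file smaller.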
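Here is a small Python sketch of the difference. The lossless half uses zlib, which is part of the Python standard library. The lossy half is a deliberately crude stand-in (rounding numbers down to multiples of 16), not the method any real image or music format uses; the sample data is made up.

```python
import zlib

# Lossless: repetitive data shrinks a lot and comes back exactly.
original = b"AAAAAAAAAABBBBBBBBBB" * 50
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical after decompression:", restored == original)

# Lossy (crude illustration): keep only the nearest lower multiple of 16.
# Fewer distinct values are easier to store, but the detail is gone for good.
samples = [3, 18, 37, 52, 200, 255]
lossy = [(s // 16) * 16 for s in samples]

print(samples)
print(lossy)
```

Nothing can turn the second list back into the first, which is what makes the compression lossy.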

Storing images on a computer is something most of us do all of the time when we take pictures on our cell phones. Here’s how images are encoded and stored.

http://web.stanford.edu/class/cs101/image-1-introduction.html

This too, is part of the class.

http://web.stanford.edu/class/cs101/analog-digital-3.html

Metadata – data about data

e.g. a jpg image might contain data about the type of a camera that created the image

An mp3 music file might contain the name of the musician

Picasa will show you metadata from photos it displays.

iTunes shows you metadata from music files.
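As one concrete example of metadata living inside a file, the older ID3v1 tag used by many MP3 files is a fixed 128-byte block at the very end of the file. The Python sketch below builds a fake file in memory and reads the tag back. The function names and the sample song details are made up, and many MP3 files carry the newer ID3v2 tag instead, which this sketch does not handle.

```python
import struct

# ID3v1 layout: "TAG", title (30), artist (30), album (30),
# year (4), comment (30), genre (1 byte). Total: 128 bytes.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")

def make_tag(title, artist, album, year):
    """Build a 128-byte tag; struct pads short text fields with zero bytes."""
    def field(text):
        return text.encode("latin-1")
    # Comment left empty; genre byte left at 255 in this sketch.
    return ID3V1.pack(b"TAG", field(title), field(artist),
                      field(album), field(year), b"", 255)

def read_tag(file_bytes):
    """Return the metadata from the last 128 bytes, or None if there is no tag."""
    tail = file_bytes[-ID3V1.size:]
    if len(tail) < ID3V1.size or not tail.startswith(b"TAG"):
        return None
    _, title, artist, album, year, _, _ = ID3V1.unpack(tail)
    def clean(raw):
        return raw.rstrip(b"\x00 ").decode("latin-1")
    return {"title": clean(title), "artist": clean(artist),
            "album": clean(album), "year": clean(year)}

# A stand-in for a music file: some placeholder bytes followed by the tag.
fake_file = b"\x00" * 100 + make_tag("Example Song", "Example Artist",
                                     "Example Album", "2012")
print(read_tag(fake_file))
```

The music player never plays these 128 bytes as sound. It reads them as data about the data.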

Video formats

Compression is often separate from the file format standard. The compression algorithm “plugs in” to the file format. So we may speak of an AVI file using a DX32 codec. (Codec is short for coder-decoder, or compressor-decompressor.)

Licensing is a big deal – modern codecs are commercial property and are licensed by people who make video.

Uses – eliminate film distribution to movie theaters and move to digital distribution over the internet.

Making 4000 prints of a movie (4 reels) and getting them all over the country is expensive. Each print costs $1500 or more, so 4000 prints cost $6 million or more.

Digital distribution is much easier and cheaper.

Movies are still made with film, but this is changing.

Video has greatly reduced cost of production and distribution.

Codecs and licensing are separate for players.

Codecs are programs which compress and decompress files. Codecs are often marketed the way Adobe PDF is: you give away the player, and charge for the program which produces the file.

Sound

CD Standard – CDA (Compact Disc Audio)

First used to record CD’s – development of the standard was driven by the recording industry. The recording industry demanded that CD’s sound better than records, justifying a price increase for CD’s over records even though manufacturing costs for CD’s are lower than for records.

MP3 – variable fidelity, computer-based standard

Standard driven by hobbyists and computer experts

MP3 – variable fidelity standard

Metadata included in the standard

Uses lossy compression, which CD audio does not use at all, so fidelity is usually lower than a CD’s but files are much smaller.
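A quick calculation shows how much smaller. The CD figures below (44,100 samples per second, 16 bits per sample, 2 channels) are the standard CD audio numbers, which these notes do not state elsewhere. The MP3 rate of 128,000 bits per second and the 3-minute song length are assumptions picked for the example, since MP3 fidelity is variable.

```python
# Standard CD audio: uncompressed samples.
CD_SAMPLE_RATE = 44_100       # samples per second, per channel
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2               # stereo

# Assumed MP3 setting; MP3 lets you choose higher or lower rates.
MP3_BITS_PER_SECOND = 128_000

def bytes_for(bits_per_second, seconds):
    """Storage needed for a recording of the given length (8 bits per byte)."""
    return bits_per_second * seconds // 8

cd_bits_per_second = CD_SAMPLE_RATE * CD_BITS_PER_SAMPLE * CD_CHANNELS
song_seconds = 180            # an assumed 3-minute song

cd_bytes = bytes_for(cd_bits_per_second, song_seconds)
mp3_bytes = bytes_for(MP3_BITS_PER_SECOND, song_seconds)

print("CD audio:", cd_bytes, "bytes")
print("MP3:     ", mp3_bytes, "bytes")
print("CD is about", round(cd_bytes / mp3_bytes, 1), "times larger")
```

Choosing a higher MP3 rate gives better fidelity and a bigger file, which is what variable fidelity means in practice.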

Steve Jobs

Developed the iPod.
Added licensing to the standard with iTunes.

 

Imaging Tools

Irfanview

  • All-purpose image editing and managing – copy, delete, resize
  • Free, useful program; search for the name to find it.

Picasa

  • More sophisticated – includes free internet hosting for photos
  • Duplicates some functions of Facebook, but more private.

 

With what you’ve learned here, you should be able to read and understand this article about redigitizing the movie “Lawrence of Arabia”.

http://www.nytimes.com/2012/09/30/movies/lawrence-of-arabia-mended-returns-to-screen-and-blu-ray.html?pagewanted=all&_r=0

The article talks about the difficulties and tradeoffs made in digitizing a 65mm print.