Convert Media

Sometimes one simply needs to convert a video, audio file or document to another format.

Text encoding

Text encoding can go badly wrong, especially when the language requires special characters like àäç. The command iconv can convert from one encoding to another.
# iconv -f <from_encoding> -t <to_encoding> <input_file>
# iconv -f ISO8859-1 -t UTF-8 file.input > file_utf8
# iconv -l                           # List known coded character sets
Without the -f option, iconv assumes the input is in the current locale's character set, which is usually fine if the document displays well.
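To check that a conversion did what was intended, it helps to look at the bytes before and after. The following is a minimal sketch, assuming iconv and od are installed; the helper name latin1_to_utf8 is made up for this example.

```shell
# Hypothetical helper: convert ISO-8859-1 text on stdin to UTF-8 on stdout
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# "é" is the single byte 0xE9 (octal 351) in ISO-8859-1.
# After conversion, od should show it as the two bytes c3 a9.
printf 'caf\351\n' | latin1_to_utf8 | od -An -tx1
```

If the output still shows a lone e9, the text was not converted.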
Convert file names (not file content) from one encoding to another. This also works if only some of the files are already UTF-8.
# convmv -r -f utf8 --nfd -t utf8 --nfc /dir/* --notest
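The --nfd and --nfc options refer to Unicode normalization forms: the same visible character can be stored precomposed (NFC) or as a base letter plus a combining mark (NFD). A small sketch of why two names that look identical may not match, using only printf and od; the variable names are chosen for this example.

```shell
nfc=$(printf '\303\251')      # U+00E9, precomposed "é" (NFC), 2 bytes
nfd=$(printf 'e\314\201')     # "e" + U+0301 combining acute (NFD), 3 bytes

# Show the raw bytes of each form
printf '%s' "$nfc" | od -An -tx1
printf '%s' "$nfd" | od -An -tx1

# The shell and the file system compare bytes, not glyphs
[ "$nfc" = "$nfd" ] || echo "names differ byte-for-byte"
```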

Unix - DOS newlines

Convert DOS (CR/LF) to Unix (LF) newlines and back within a Unix shell. See also dos2unix and unix2dos if you have them.
# sed 's/.$//' dosfile.txt > unixfile.txt                  # DOS to UNIX (assumes every line ends in CR/LF)
# awk '{sub(/\r$/,"");print}' dosfile.txt > unixfile.txt   # DOS to UNIX
# awk '{sub(/$/,"\r");print}' unixfile.txt > dosfile.txt   # UNIX to DOS
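The awk one-liners above can be checked without any real files by piping sample text through them and dumping the bytes. This is a sketch only; the function names dos2unix_awk and unix2dos_awk are invented here and are not the dos2unix and unix2dos programs.

```shell
# Hypothetical helpers wrapping the awk one-liners above
dos2unix_awk() { awk '{ sub(/\r$/, ""); print }'; }
unix2dos_awk() { awk '{ sub(/$/, "\r"); print }'; }

# Unix to DOS: od -c should show \r \n at the end of each line
printf 'one\ntwo\n' | unix2dos_awk | od -c

# DOS to Unix: only \n should remain
printf 'one\r\ntwo\r\n' | dos2unix_awk | od -c
```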
Convert Unix to DOS newlines within a Windows environment. Use sed or awk from mingw or cygwin.
# sed -n p unixfile.txt > dosfile.txt
# awk 1 unixfile.txt > dosfile.txt   # UNIX to DOS (with a cygwin shell)
Replace classic Mac newlines (CR, displayed as ^M) with Unix newlines. tr understands the escape \r; to type a literal ^M instead, use CTL-V then CTL-M.
# tr '\r' '\n' < macfile.txt > unixfile.txt

PDF to Jpeg and concatenate PDF files

Convert each page of a PDF document to a JPEG (or PNG) image with gs (Ghostscript). The same job is much shorter with convert and mogrify (from ImageMagick or GraphicsMagick).
# gs -dBATCH -dNOPAUSE -sDEVICE=jpeg -r150 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
 -dMaxStripSize=8192 -sOutputFile=unixtoolbox_%d.jpg unixtoolbox.pdf
# convert unixtoolbox.pdf unixtoolbox-%03d.png
# convert *.jpeg images.pdf          # Create a simple PDF with all pictures
# convert image000* -resample 120x120 -compress JPEG -quality 80 images.pdf
# mogrify -format png *.ppm          # convert all ppm images to png format
Ghostscript can also concatenate multiple pdf files into a single one. This only works well if the PDF files are "well behaved".
# gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=all.pdf \
file1.pdf file2.pdf ...              # On Windows use '#' instead of '='
Extract the images from a PDF document using pdfimages from poppler or xpdf.
# pdfimages document.pdf dst/        # extract all images and put in dst
# yum install poppler-utils          # install poppler-utils if needed. or:
# apt-get install poppler-utils

Convert video

Compress the Canon digicam video with an mpeg4 codec and repair the crappy sound.
# mencoder -o videoout.avi -oac mp3lame -ovc lavc -srate 11025 \
-channels 1 -af-adv force=1 -lameopts preset=medium -lavcopts \
vcodec=msmpeg4v2:vbitrate=600 -mc 0 videoin.AVI
See sox for sound processing.

Copy an audio cd

The program cdparanoia can save the audio tracks (FreeBSD port in audio/cdparanoia/), oggenc can encode in Ogg Vorbis format, lame converts to mp3.
# cdparanoia -B                      # Copy the tracks to wav files in current dir
# lame -b 256 in.wav out.mp3         # Encode in mp3 256 kb/s
# for i in *.wav; do lame -b 256 "$i" "${i%.wav}.mp3"; done
# oggenc -b 256 -o out.ogg in.wav    # Encode in Ogg Vorbis 256 kb/s
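The batch loop above can be written out in a form that copes with spaces in file names and does nothing when there are no .wav files. A minimal sketch: mp3_name is a made-up helper, and the loop only prints the command if lame is not installed.

```shell
# Hypothetical helper: derive the .mp3 name from a .wav name
mp3_name() {
    printf '%s\n' "${1%.wav}.mp3"
}

# Encode every .wav in the current directory at 256 kb/s
for i in *.wav; do
    [ -e "$i" ] || continue          # no match: the glob stays literal, skip it
    if command -v lame >/dev/null 2>&1; then
        lame -b 256 "$i" "$(mp3_name "$i")"
    else
        echo "would run: lame -b 256 \"$i\" \"$(mp3_name "$i")\""
    fi
done
```

The ${1%.wav} expansion strips the suffix in the shell itself, so no call to basename is needed.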