Unix and Linux: Shellscripts and Regular Expressions
INFO1-CE9545 / Y12.1005
Syllabus

Introduction

This course is named “Unix and Linux: Shellscripts and Regular Expressions”. It is offered for no credit as INFO1-CE9545 and for credit as Y12.1005.

The catalog description is reproduced below. The course home page is
http://i5.nyu.edu/~mm64/INFO1-CE9545/
A harder way to get there is via NYU Blackboard, to which you can log in with your NYU NetID at
http://www.home.nyu.edu/

The course is offered by
New York University
School of Continuing and Professional Studies
Department of Information Technologies

Dates

See the instructor’s home page for the current semester’s dates and hours. Starting in Fall 2008, this course is 35 hours of classroom time. Starting in Fall 2012, the regular expression course INFO1-CE9960 no longer meets with this course.

This document

This document contains only an outline of the topics covered. The entire content of the course is online at
http://i5.nyu.edu/~mm64/handout/
The current version of this document (September 16, 2013) is online at
http://i5.nyu.edu/~mm64/INFO1-CE9545/syllabus.html
If you are reading a copy from any other source, it may already be out of date.

Instructor

Mark Meretzky has no office, and hence no office hours, but answers questions via email. He has taught C, C++, Java, Javascript, Ruby on Rails, Unix, Networking, iPhone, Android, and Calculus at NYU.
mark.meretzky@nyu.edu
http://i5.nyu.edu/~mm64/

Catalog description

I wrote the following description for the catalog.

Acquire the foundational skills to go anywhere in the Unix world: networking, programming, or system administration. You will learn to create files and directories; copy, move, rename, search, archive, compress, and remove them; read- and write-protect them; and connect them with hard and symbolic links. Run programs and join them to files and to other programs with i/o redirection. Compose Unix command files (shellscripts) using command line arguments, pipelines, loops, conditional statements, file descriptors, and exit status. Personalize your Unix account with environment variables, aliases, startup scripts, and your own versions of existing Unix commands. Control a running process with the Korn shell, and learn to schedule, start, stop, and kill it. Use regular expressions to search, edit, and transform data with the utilities grep, sed, awk, and the vi text editor; protect source code with RCS. Host HTML forms and CGI scripts on a Sun Solaris box. Prerequisites: none.

Prerequisites

The student must understand English and come to class. The course requires no knowledge of a programming language. The language taught in this course (the Unix shell language) will therefore be the first language for some students. I envy them.

Be methodical. You can learn Unix if you try every example and see if it works. It’s also educational to do something wrong deliberately and observe what happens.

How this course fits into the certificate

INFO1-CE9545 is a prerequisite for, and covers all of the material in, Unix III: Advanced Shellscripting (INFO1-CE9560).

INFO1-CE9545 is self-contained and is independent of the rest of the program. To learn about the course, email the instructor at mark.meretzky@nyu.edu.

What to purchase for this course

The textbook is free, and is online at
http://i5.nyu.edu/~mm64/handout/
You don’t have to buy it, but you’ll have to print it out or bring it on a laptop as we cover it, chapter by chapter, in class.

Get one additional Unix book. If you don’t already have one, the book I recommend is The Unix Programming Environment by Brian W. Kernighan and Rob Pike; Prentice-Hall, 1984; ISBN: 0-13-937681-X. As usual, buy it online at Amazon.

The lectures are given in a room with no computers except the instructor’s laptop, but every student gets a free Unix account on the Unix machine i5.nyu.edu. It happens to be a Solaris box, but at this elementary level there are few differences between the versions of Unix. You can use your account by going to one of the computer labs on campus, but you probably don’t want to. You would rather log into your NYU Unix account from your PC or Mac at home or at work. The communications software for this (PuTTY or ssh respectively) is free. Of course, you would need a PC or Mac to do this.

Students are welcome to try the examples on their own Unix, if they have one. There is no need to get your own Unix for this course, however.

Examinations and major assignments

There are no exams. I can serve you better by using all the hours of class time to tell you about Unix and to entertain questions.

There will be many individual homeworks, most of them Unix shellscripts. For each shellscript, hand in the shellscript itself and the output thereof on paper, one week after it was assigned. Do not email me your homework—I don’t want to have to print it for you. I will return each homework to you one week after you hand it in, with every error corrected. (Eager beavers can have their homework corrected and returned to them on the night they hand it in, during the intermission.) I may hand out some of the answers on paper in class. If so, you get no credit for an assignment that you hand in after I hand out the answer. (Assignments handed in after I give out the answers are invariably copies of my answers.) You get only one chance to hand in each problem. Write your real name on each page you hand in, or staple them together and write your name on the first page.

I post each week’s homework on the course home page the morning after each lecture. Whether you are present or absent, you should check the home page each week to find out what the homework was, and to see if any errors were discovered in the course materials. To show you in advance how much work to expect, the previous semester’s assignments are kept online on the home page.

Grading

Your grade will be based on the homeworks you hand in. Hand them in on paper unless the instructor says otherwise. If you do the homeworks correctly, you get a good grade. If you do the homeworks incorrectly, you get a bad grade. For example, if less than 50% of your homeworks produce the correct output, you will fail the course. (I mention this only for legal reasons. In real life, no one ever comes close to having only 50% of their homeworks produce the correct output. Everyone does almost all the coursework or almost none of it.) Note that a program must run in order to produce the correct output, or indeed any output at all.

To collaborate with one or two other people, you may collectively hand in one copy of every assignment with the names of the two or three authors. You must stay with the same partner(s) throughout the semester, and you will all receive the same grade. In the real world you will program with other people, so I encourage you to do so now. Two people usually do a better job than one; it’s also less work for me to read.

You must do all your own work with no help from anyone except your partner(s), if any. If I receive multiple copies of the same work from people who are not partners, the person whose name comes first alphabetically gets full credit, and the other people get no credit. After you’re caught, it is too late to make the other person your partner. You will also fail the course if you hand in copies of my answers, or anybody else’s answers. Plagiarists are subject to the ridicule of their teacher and peers.

I reiterate my bold claim that I will catch and correct every error in every homework for you, one week after you hand it in. But I will not give a grade to each individual homework. The only grade you will receive will be for the entire course. I grade on a curve. This means that your grade depends partly on how well everyone else does during the course, so it is impossible to predict before the end of the course.

I will not tell you your grade. I mail the grades to NYU immediately after the last lecture of the course, or when I receive the grading sheet from NYU, whichever comes last. I don’t know how long it will take NYU to make your grade available to you. They will provide a printed transcript.

SCPS dean Carl F. Lebowitz says “Incomplete grades should be given only in rare circumstances where a student has been able to complete nearly all of the course assignments by the end of the semester and has submitted the Incomplete Contract in advance regarding the situation.” NYU also says that to receive a passing grade, students must attend 80% of the course.

To request an Incomplete, fill out the incomplete contract and ITS Form 775 and have me sign them before the last lecture. To complete your incomplete, you have to do the assignment on the home page.

Course Objectives

Three operating systems dominate the world: Windows, Mac, and Unix. The operating system is the thing in your computer that invites you to run programs. Most operating systems do this by showing you a desktop with icons and menus. But the Unix operating system has no desktop. To run a program in Unix, you have to type a command line in response to a prompt.

The Unix operating system (also known as the kernel) does more than invite you to type in the names of the programs you want to run. It parcels out memory to competing programs, manages the files and directories on the disks, and communicates with printers and other computers on a network.

But this course does not concern itself with these hidden tasks. It covers only the user interface of Unix: what you need to know to type command lines into a Unix shell window, and what the result will be. It does not cover the implementation of Unix (kernel internals), system administration, or networking. The material in this course is a prerequisite for these topics, however.

So what do you need to know to issue Unix commands? Unix comes with two hundred or so small programs (the utilities or tools). Individually, many of them are so rudimentary as to be laughable (e.g., uniq). The power of the system emerges from connecting them together to build larger structures such as pipelines and shellscripts (command files), in which the output of one program is fed as input into the next.
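
As a minimal sketch of this idea, the pipeline below connects four small utilities to find the most frequent word in a list. The sample words and the variable names (words, top) are made up for illustration; they do not come from the course handouts.

```shell
# Hypothetical sample data: one word per line.
words=$(printf '%s\n' apple banana apple cherry banana apple)

# sort brings identical lines together; uniq -c collapses each group
# into one line prefixed with a count; sort -rn puts the largest
# count first; head -n 1 keeps only the top line.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn | head -n 1)

echo "$top"
```

By itself, uniq only notices duplicates that are adjacent, which is why it looks rudimentary. Feeding it the output of sort is what makes it useful.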

The language in which the programs are connected together is called the shell language. It comes in several flavors; we use the Korn shell.

Many Unix programs process their input selectively: you tell them what parts of their input they should respond to, or conversely, what parts they should ignore. The notation in which you describe this is called regular expressions. We use it in searching and editing.
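
Here is a small sketch of both uses, searching with grep and editing with sed. The three sample lines imitate the format of /etc/passwd but are invented for this example, as are the variable names.

```shell
# Hypothetical sample lines in the format of /etc/passwd.
sample='root:x:0:0:Super-User:/:/sbin/sh
abc1234:x:1001:10:A Student:/home/abc1234:/bin/ksh
daemon:x:1:1::/:/bin/false'

# Respond only to the line that begins with "root:".
# The ^ anchors the pattern to the start of the line.
root_line=$(printf '%s\n' "$sample" | grep '^root:')

# Conversely, -v ignores the lines that match and keeps the rest;
# -c counts them. The $ anchors the pattern to the end of the line.
not_ksh=$(printf '%s\n' "$sample" | grep -vc '/bin/ksh$')

# Editing: \( \) tags part of the match, and \1 and \2 paste the
# tagged parts into the replacement.
edited=$(printf '%s\n' "$root_line" |
    sed 's/^\([^:]*\):.*:\([^:]*\)$/\1 uses \2/')

echo "$root_line"
echo "$not_ksh"
echo "$edited"
```

The same notation works in both programs: the pattern that grep searches for is the pattern that sed substitutes.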

Of course, there’s much more to Unix than this introductory course. But the basic topics we teach here (files and directories, shellscripts, and regular expressions) will be needed everywhere you go in the Unix world.

Overview of Sessions

The Handout for each session is already online, containing every example we will work through and its references. The topics are spread out over 10 weekly lectures of 3.5 hours each.

  1. Preview of the four major topics:
    1. the Unix utilities or tools;
    2. connecting the Unix utilities by means of the shell language;
    3. using an editor (such as vi) to write a shellscript;
    4. searching text by means of regular expressions.

  2. Install Linux on a PC in VirtualBox. Log into Unix on a Solaris server (i5.nyu.edu) via ssh or PuTTY.

  3. Files and directories. Command lines vs. a desktop GUI. Run and dismiss the clunky CDE desktop for Solaris. Navigate around the tree of directories. Three special directories: home directory, current directory, root directory. Create, copy, move, rename, and remove files and directories. Print a file, list a directory. Read, write, and execute permission for files and directories. Hard links and symbolic links.

  4. System files (shared by all users): /etc/passwd, /etc/group, /etc/motd, /usr/dict/words, /var/apache/logs/access_log. The online manual and its nine sections.

  5. Logging in. The kernel is the operating system itself; the shell is the command interpreter that surrounds it. The /etc/passwd file determines your choice of startup shell. The .profile file contains commands that execute when you log in to personalize your environment. It creates aliases and environment variables, formats your prompt, enables keyboard shortcuts, and determines the default nine permission bits of the files you create.

  6. Input and output redirection (i/o redirection). The three file descriptors for standard input, standard output, and standard error output. Input from various sources, output to various destinations: the terminal keyboard and screen, files on the hard disk, pipes, /dev/null, and hardware devices. The connections are made with the following operators: | (the pipe symbol), >, >>, <, <<, and `back quotes`. Carbon copying with tee connections; multiple copying with wall; transcribe a terminal session with script.

  7. Unix utilities and filters. Sort, search, collate, merge, format, print, count, compare, identify duplicates, filter with awk, etc. The simplest utilities are the most useful, especially when connected in sequence to form pipelines.

  8. Shellscripts and shell programming. Unix comes with several shells: Bourne shell, Korn shell, C shell, Bourne Again shell, etc. We use the Korn shell. Shellscripts and their command line arguments. Shell language control structure: for and while loops, break and continue, if statements. Local and environment variables, and the interaction between variables, 'single quotes', and "double quotes". Exit status: how to produce it, detect it, and use it for communication between programs.

  9. Process control. The ps program shows the tree of processes. Parent and child, process ID numbers. Run two or more processes at the same time. Kill, stop, and restart a process by sending it a signal. Foreground and background processes: fg, bg, jobs, &. Schedule a program to run at a future time with at: one time, a finite number of times, or an infinite number of times.

  10. Searching with regular expressions. Total coverage of regular expressions (including tagged regular expressions) with grep and egrep. Search text files such as /etc/passwd and /var/apache/logs/access_log, or the output of a program.
    1. Anchors: ^$
    2. Wildcards: . [^-]
    3. Repetition: *+? \{,\}
    4. Alternation: (|)
    5. Tagging: \(\)\1

  11. Editing with regular expressions. Interactive editing with the vi text editor and its variants vim and view; non-interactive (batch) editing with the sed stream editor. Substitute commands using regular expressions, including tagged regular expressions.

  12. Miscellaneous. Archive files and directories with tar and the Concurrent Versions System CVS. Compress files with compress and uncompress. Search for a file in the tree of directories with find. Use awk for arithmetic, searching for a pattern that straddles more than one line, remembering previous input. World Wide Web home pages with HTML, including Unicode characters, hyperlinks, touch-sensitive image maps, CGI gateways, and forms. Download files from Windows with Samba.

  13. Summary. Design the format of data files. Is Unix limited to processing data only in the form of text? Automate the paper flow for quality assurance and system administration.
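
To give a feel for how several of the session topics above fit together, here is a rough sketch that combines command line arguments, a for loop, an if statement, redirection to /dev/null, and exit status. The name count_matches is hypothetical and not taken from the handouts. It is written as a shell function so that it can be pasted at a prompt; in a shellscript file the same body would say exit where the function says return.

```shell
# count_matches pattern word ...
# Print how many of the words match the regular expression.
count_matches() {
    # $# is the number of command line arguments.
    if [ $# -lt 2 ]; then
        echo "usage: count_matches pattern word ..." 1>&2
        return 2
    fi

    pattern=$1
    shift        # discard the pattern; "$@" is now just the words
    n=0

    for word in "$@"; do
        # Throw away grep's output. Only its exit status matters:
        # 0 if the word matched, nonzero otherwise.
        if echo "$word" | grep "$pattern" > /dev/null; then
            n=$((n + 1))
        fi
    done

    echo "$n"
    # The exit status of the function is that of its last command:
    # 0 (success) if anything matched, 1 (failure) if nothing did.
    [ "$n" -gt 0 ]
}

count=$(count_matches '^a.*e$' apple banana avenue)
rc=$?
echo "matches: $count, exit status: $rc"
```

Another program can test the exit status without reading the output at all, for example with if count_matches '^a' "$@" > /dev/null; then ... fi.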