Bash process substitution (tldp.org)
45 points by geoka9 on Feb 8, 2015 | 22 comments


I use this a lot in a couple of different contexts.

First, it's a convenient way to restructure pipelines inside scripts in situations where you need to feed a command's or pipeline's output into a read loop, but the loop needs to be part of the original script process so it can manipulate variables, etc. Meaningless example of optionally incrementing a counter in a loop based on the output of a pipeline of commands:

    COUNTER=0
    while read -r LINE; do
        if grep -q xyz <<< "$LINE"; then
            COUNTER=$(( COUNTER + 1 ))
        fi
    done < <(awk '{print $4,$7}' infile.txt |cut -d, -f3)
If you did it in the "awk ...|cut ...| while read LINE..." order, COUNTER would be getting incremented in a subshell, and the value wouldn't stick around after the loop.
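For comparison, here's a minimal sketch of that pipeline-ordered version (same hypothetical infile.txt) that loses the count:

    COUNTER=0
    # every stage of a pipeline runs in a subshell -- including the while loop
    awk '{print $4,$7}' infile.txt |cut -d, -f3 | while read -r LINE; do
        grep -q xyz <<< "$LINE" && COUNTER=$(( COUNTER + 1 ))
    done
    echo "$COUNTER"   # still 0; the subshell's increments are lost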

The other most common way I use it is to diff a couple of sets of pipelines or even just files that need sorting before comparison:

    diff <(sort -u file1.txt) <(sort -u file2.txt)
Saves a ton of steps. As pointed out by jkbyc, the >() construction for output process substitution becomes really powerful when combined with tee.


Another example of using diff on two process substitutions I did this morning. I've been quarantining some files from various paths that all have unique basenames into a single directory. I have a CSV file listing the original path in the first field followed by some other metadata. I wanted to compare the files in the CSV file ("quarantined-files.csv") to the files I had moved into the "quarantine" subdirectory. I came up with this:

    diff <(cut -d, -f1 quarantined-files.csv |xargs -L1 basename |sort) <(ls ./quarantine/ |sort)
(Yes, the listing from "ls" comes out sorted, but it's always possible it'll sort certain characters a bit differently than "sort" does depending on locale settings, aliases, etc. Piping both outputs through "sort" ensures the same collation in both cases.)

It took me a while to really grok this sort of thing, but once it clicked, it's often the first thing I think of. Basically any time you might be tempted to dump something to a temporary file, you should ask yourself "can I use process substitution?"


I sometimes use it to do parallel greps on large files:

  zcat large_file.gz | tee >(grep -F pattern1 > pattern1_results.txt) >(grep -F pattern2 > pattern2_results.txt) > /dev/null
If you stick _pv_ between zcat and tee you can even get nice info about the progress.
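A minimal sketch of that, assuming pv is installed:

    # pv copies data through unchanged while reporting throughput on stderr
    zcat large_file.gz | pv | tee >(grep -F pattern1 > pattern1_results.txt) \
                                  >(grep -F pattern2 > pattern2_results.txt) > /dev/null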


That's a powerful idea, thanks!


One neat trick I've found is using `exec` in conjunction with process substitution to redirect and gather the output of the current shell environment from within a script itself (usually not as useful with a pty attached).

Example: how can you copy a command's output to syslog and a specified log file while still printing it to stdout? Easy, with tee, right?

    $my_command | tee -a >(logger -t "$0[my_command]" -p "local0.INFO") "/var/log/my_logs/$(date +%s).log"

Okay, that's cool. But, I have to run it for every command in my script! How inconvenient. Is there a way we can make all stdout for the script do this?

Yes! Use `exec`!

    exec &> >(tee -a >(logger -t "$0" -p "local0.INFO") "/var/log/my_logs/$(date +%s).log")

    $my_command


Forgive my botched formatting. Long time lurker here :)


I find it useful to read the Bash manual every couple of years.

It's not too long and very well written, so it's an enjoyable read.

You almost always learn something new which you can integrate into your workflow.

http://www.gnu.org/software/bash/manual/bashref.html


I use this all the time. It's at its most useful when you want to use a process as a source or sink, avoiding a temporary file, with a tool that doesn't operate on standard input or output. Often, these tools need to work with multiple files that have different semantic intent.

For example, `grep -f <pattern-file>` - grep works on standard input, but if you also want to provide patterns via process output, you need to use temporary files, fifos, or process substitution.

Here's an example:

    find | fgrep -f <(cut -f 2 tab-delimited-text.txt)


  $ echo "Process substitution is fun" | tee >(rev)
  Process substitution is fun
  nuf si noitutitsbus ssecorP


I've never understood the point of tee without process substitution. I guess you could do the same thing manually by opening another file descriptor first and forking some process, but is it of any use in a normal simple pipe?


A good use case is redirecting output into a file you can only modify with sudo. Trying to do it like this doesn't work:

  sudo echo "fs.inotify.max_user_watches=1000000" > /etc/sysctl.conf
as the redirection happens within the shell, thus without root permissions. However, the following does work:

  echo "fs.inotify.max_user_watches=1000000" | sudo tee /etc/sysctl.conf
as the writing is done by the elevated tee process.


That seems like an anti-unixism (operating with multiple EUIDs in a single process group), though I'm not too sure what the equivalent unixism would be—maybe something with a daemon and a spool directory? Or a user-writable FIFO, redirected to the correct file by the elevated user in a separate command?
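A rough sketch of that FIFO variant (assuming sudo credentials are already cached, so the background job doesn't prompt):

    # unprivileged writer; a separate elevated command drains the FIFO into the file
    mkfifo /tmp/sysctl.fifo
    sudo sh -c 'cat /tmp/sysctl.fifo > /etc/sysctl.conf' &
    echo "fs.inotify.max_user_watches=1000000" > /tmp/sysctl.fifo
    wait
    rm /tmp/sysctl.fifo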


I use tee to save log files from long running programs. The chatter goes to stdout, and informs me about progress. But the tee also saves the chatter, so I can review it if something goes wrong.

You could get a similar effect in other ways (e.g., tail -f, or terminal scrollback), but I find tee is convenient.


Often I'll do something like

    long_running_thing | tee long_running_thing_stdout
and want to see the first few lines of output before going to do something else. The pipe causes buffering on stdout, though, so I have to sit there until 4k of output has come out on stdout.

As seen on https://unix.stackexchange.com/questions/25372/turn-off-buff... , either `unbuffer long_running_thing` or `stdbuf -oL long_running_thing` causes it to output linewise instead of 4k at a time.
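Concretely, with the hypothetical long_running_thing from above:

    # force line buffering so tee shows output as it's produced
    stdbuf -oL long_running_thing | tee long_running_thing_stdout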


The fish[1] shell also has process substitution (without requiring additional syntax) in the form of psub (which is implemented simply as a fish function; see `type psub` from a fish shell):

  bash$ wc <(grep script /usr/share/dict/linux.words)
  fish$ wc (grep script /usr/share/dict/linux.words | psub)
The sugar in the bash form might be nice, but it isn't required.

[1]: http://fishshell.com/


One thing it seems to be missing is the bash

    >(wc)
which is obviously only useful with a few things like tee, but still useful.

dag from GitHub came up with[0] a quick way of using psub:

    bind \es 'commandline -ij "(|psub)"; commandline -f backward-word backward-word'
[0]: https://github.com/fish-shell/fish-shell/issues/719#issuecom...


Thanks for that, process substitution is one of the things I occasionally drop down to bash for.


If you would like to become comfortable with using bash process substitution, I recommend figuring out how many of the 'moreutils' utilities can be replicated using only bash process substitution and your generic linux utilities. It makes for a fun exercise.

moreutils http://joeyh.name/code/moreutils/
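To give a taste: moreutils' pee (pipe stdin to several commands at once) can be approximated with tee plus output process substitution, something like:

    # pee-like fan-out: copy stdin to two commands simultaneously
    printf 'one\ntwo\nthree\n' | tee >(wc -l > line_count.txt) \
                                     >(tr a-z A-Z > upper.txt) > /dev/null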


Diff and diff3 support reading from /dev/fdXX ... unfortunately I don't know of a graphical diff tool (meld, kdiff3, ...) supporting this kind of input. It's a shame, 'cause I frequently want to diff the output of two programs (or the same program with different inputs).
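You can see the /dev/fd mechanism directly, since <() just expands to a path (file names here are hypothetical):

    echo <(true)                          # prints something like /dev/fd/63
    diff3 <(sort a) <(sort b) <(sort c)   # diff3 reads all three fd paths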


In zsh, you can use =() instead of <() to slurp all the data into a temporary file first to deal with this issue. Perhaps bash has this also.


Bash does not have `=()`, but with a few extra commands you could write to temporary files using `mktemp`. Or use zsh. :)
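A rough sketch of that mktemp workaround, for tools that need a real (seekable) file; meld here is just an example consumer:

    # emulate zsh's =( ) in bash: materialize the output in a temp file
    tmp=$(mktemp)
    sort -u file1.txt > "$tmp"
    meld "$tmp" file2.txt
    rm -f "$tmp"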


Although I can write a lot of my stuff in (much) higher-level languages like Python or even Perl, I prefer to do it in Bash. The thing that I hate the most is the lack of basic data structures - especially as function returns.
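For what it's worth, one common workaround for the function-return complaint is a bash 4.3+ nameref; a minimal sketch:

    # "return" an array by letting the caller pass a variable name
    fill_array() {
        local -n out=$1    # nameref: out aliases the caller's variable
        out=(one two three)
    }
    fill_array result
    echo "${result[1]}"    # prints: two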



