2

I am looking for a command or bash script to delete all folders except those that have a specific file type (*.pdf) in the first level subfolder.

folder01
  a.txt
  y.txt

folder02
  b.pdf
  z.txt

folder03
  h.txt
  folder03.1
    c.pdf

In the example above, folder01 and folder03 need to be deleted.

My attempt:

#!/bin/bash

shopt -s globstar

# Loop through every subdirectory.
for d in **/; do
    f=("$d"/*)
    if [[ -f "$f" && ! "${f##*/}" =~ ^*.pdf$ ]]; then
        # `echo` to ensure a test run; remove when verified.
        echo rm -r -- "$d"
    fi
done

Joel
  • I don't have much experience with Bash scripting, but I posted the closest attempt I could manage. – Joel Sep 07 '22 at 20:06
  • When you say "specific file type (*.pdf)", do you want to test against the filename? (as opposed to `file -b --mime-type …` that may print `application/pdf` regardless of the filename). If so, what if `foo.pdf` happens to be a directory? or a symlink? (In other words: do [files](https://superuser.com/a/1467109/432690) of types other than regular file count?) – Kamil Maciorowski Sep 07 '22 at 20:27
  • @Joel No problem! And thank you for doing this. Seeing what effort you have made helps the community better help you solve this. – Giacomo1968 Sep 07 '22 at 20:37
  • I don't have this situation that has a folder or symbolic link called `xxx.pdf`. I can assure you, it's safe to use *.pdf as the main rule. – Joel Sep 07 '22 at 20:41
  • Is "to delete all folders except if they have a specific file type (\*.pdf) in the first level subfolder" really what you want to do? In your example you don't want to delete `folder02`. When considering `folder02` as the folder, what is the "first level subfolder" that would trigger the exception? OTOH in `folder03` `folder03.1` is a "first level subfolder" and there is a matching file in it; so why "`folder03` needs to be deleted"? Do you mean "to delete all folders except if a folder has a specific file type (\*.pdf) directly in it (i.e. not in a subfolder)"? – Kamil Maciorowski Sep 07 '22 at 21:22
  • Should the current working directory (in general: starting directory) be treated as any other directory in it? and deleted conditionally? I suppose it should never be deleted, but please clarify. – Kamil Maciorowski Sep 07 '22 at 21:25
  • The main directory (where the script is executed) will not be deleted, just the subdirectories inside it. `Folder03` will be deleted because it doesn't have any pdf files directly inside it at level 1. Yes, it has a subdirectory that has `c.pdf`, but I don't care about any other levels, and `Folder03` needs to be deleted. – Joel Sep 08 '22 at 11:30

3 Answers

1

The following command prints pathnames of directories about to be deleted:

# cd to the right directory first

find . -type d ! -name . \( -exec [ -r {} ] \; -o ! -prune \) \
-exec sh -c '
   set -- "$1"/*.pdf
   ! [ -e "$1" ]
' find-sh {} \; -prune -print

If the result looks right, append -exec rm -r {} + after -print. Even if your find supports -delete, do not use it, as it cannot delete non-empty directories.

The code works by running a shell for each directory under consideration. The shell uses globbing to detect files matching *.pdf in the directory. A few remarks:

  • -prune near the end prevents find from descending into directories that will be deleted anyway. E.g. there is no point in checking ./folder03/folder03.1 after we qualify ./folder03 for deletion. And to be clear: deleting ./folder03 with rm -r implies deleting ./folder03/folder03.1, even if there's a file matching *.pdf in folder03.1.

  • ! -name . is the POSIX equivalent of the (non-portable) -mindepth 1 of GNU find, provided the starting path is the single dot (.). Excluding the starting directory portably is easy when the starting path is . and not so easy otherwise. Therefore I designed the solution so that you need to cd to the right directory beforehand.

  • *.pdf does not match hidden files (dot files). Your attempt also uses globbing, so I guess it's fine for you.

  • *.pdf is case-sensitive. A case-insensitive pattern is *.[pP][dD][fF].

  • *.pdf matches files of any type, not necessarily regular files. It's by name only. In one of your comments you wrote "it's safe to use *.pdf as the main rule". So be it.

  • If there is no matching file, *.pdf remains in its literal form in the POSIX shell; so there is at least one "match" and we don't really know if it's a match. In a shell with more features (e.g. in Bash) you can do something about it, but I wanted my code to be portable. This is why I test if the first "match" does not exist in the filesystem (! [ -e "$1" ]) instead of relying on the number of matches.

  • You don't need to be able to cd to every directory being tested.

  • If you are not allowed to read a directory, then the shell code won't find any *.pdf in it (even if such a file is really there). An attempt to rm -r the directory will fail (unless the directory is already empty) and error messages will be generated. -exec [ -r {} ] \; -o ! -prune prevents find from trying to read the content of such a directory, and from trying to test it or delete it. You may want to adjust this part of the solution to your needs if directories you are not allowed to read are an issue.

  • find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
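
The globbing test described in the remarks can be tried in isolation. The sketch below is only an illustration: the function name has_pdf and the scratch tree created with mktemp -d (widely available, but not required by POSIX) are assumptions and not part of the command above.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the directory "$1" directly contains
# at least one name matching *.pdf (any file type, non-hidden names only).
has_pdf() {
    # Replace the positional parameters with the glob expansion.
    # If nothing matches, the pattern stays literal: "$1" becomes "dir/*.pdf".
    set -- "$1"/*.pdf
    # A literal, unmatched pattern does not exist in the filesystem.
    [ -e "$1" ]
}

# Scratch tree mirroring the example from the question.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/folder01" "$tmp/folder02" "$tmp/folder03/folder03.1"
: > "$tmp/folder01/a.txt"
: > "$tmp/folder02/b.pdf"
: > "$tmp/folder03/folder03.1/c.pdf"

result=""
for d in folder01 folder02 folder03; do
    if has_pdf "$tmp/$d"; then
        result="$result$d:keep "
    else
        result="$result$d:delete "
    fi
done
printf '%s\n' "$result"
rm -r "$tmp"
```

Only folder02 is reported as "keep", because the c.pdf in folder03/folder03.1 is one level too deep for the pattern. One caveat of testing with -e: if the first match happens to be a broken symlink, the directory is treated as having no match.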

Kamil Maciorowski
  • I tried your script but it's returning `find: '[': No such file or directory`. I just copied the code into `clean.sh`, saved it in the main root folder and executed it as root. Under the main folder I have all these `folder01, folder02, folder03` subfolders. – Joel Sep 08 '22 at 11:23
  • @Joel What OS is this? AFAIK `[` is required by POSIX, it should be a separate executable. Try to convert every `[ … ]` to `test …`, i.e. replace `[` with `test` and delete the "closing" `]`. – Kamil Maciorowski Sep 08 '22 at 11:37
1

This seems to work well (EDIT: only if no path contains spaces and each directory holds at most one PDF file):

for d in */; do
  if ! [ -f $d/*.pdf ]; then 
    echo "Will remove $d"
  fi
done

(-f tests for a regular file at the specified path; -e more generally tests whether anything exists at that path.)

EDIT: to account for paths with spaces and multiple PDF files in a single directory, you will probably need to use find, for example:

for d in */; do
  if [[ -z $(find "$d" -maxdepth 1 -name "*.pdf" -type f) ]]; then 
    echo "Will remove $d"
  fi
done

I changed it from **/ to */ because, for your use case, I believe you do not want globstar and **/. These make the loop descend into nested subdirectories as well, for example:

> for d in **/; do echo $d; done
folder01/
folder02/
folder03/
folder03/folder03.1/

In the test case this doesn't seem to change the final result, but if you're only interested in a .pdf in first level subdirectories, you don't need the loop to visit any nested subdirectories.

If you wanted to delete directories that had no PDFs at any level, you could change the if statement to:

if ! [ -f $d/**/*.pdf ]; then

EDIT: or remove the -maxdepth 1 from the find command.
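
For reference, a Bash-only variant can avoid both find and the single-match limitation by collecting the matches in an array. This is a sketch under assumptions, not the fix that was accepted: it relies on Bash's nullglob option, the function name list_dirs_without_pdf is made up for illustration, and it only prints candidates instead of deleting them.

```bash
#!/bin/bash
# Hypothetical helper: print each immediate subdirectory of "$1" that has
# no name matching *.pdf directly inside it. Nothing is deleted.
list_dirs_without_pdf() (
    # The function body is a subshell, so cd and shopt do not leak out.
    cd -- "$1" || exit 1
    shopt -s nullglob          # unmatched globs expand to nothing
    for d in */; do
        pdfs=("$d"*.pdf)       # "$d" already ends with a slash
        if (( ${#pdfs[@]} == 0 )); then
            printf '%s\n' "$d"
        fi
    done
)

# Scratch tree with a space in a name and several PDFs in one directory.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/folder01" "$tmp/folder 02" "$tmp/folder03/folder03.1"
: > "$tmp/folder01/a.txt"
: > "$tmp/folder 02/b.pdf"
: > "$tmp/folder 02/b2.pdf"
: > "$tmp/folder03/folder03.1/c.pdf"

candidates=$(list_dirs_without_pdf "$tmp")
printf '%s\n' "$candidates"
rm -r "$tmp"
```

Like the loops above, */ also matches symbolic links to directories and skips hidden directories. Once the printed list looks right, the printf line could be swapped for rm -r -- "$d".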

  • Try it where `$d` expands to `a name with spaces`. Try it where the `*.pdf` part matches two or more files. – Kamil Maciorowski Sep 07 '22 at 23:00
  • It’s a very clear and simple solution, but unfortunately it doesn’t work when I have more than 1 .pdf file inside the folder. – Joel Sep 08 '22 at 11:24
  • Good observations, I was hoping to avoid using a long `find` command but looks like that's the way to go. I'll edit the answer with some fixes. – Stephen Kendall Sep 08 '22 at 16:45
  • Thank you very much for the help. I marked your answer as accepted. It works fine now. – Joel Sep 09 '22 at 14:56
0

Let us say the specific file type is *.pdf:

  1. List the directories containing *.pdf (the ones you don't want to delete) in a file remove.txt

    find . -name '*.pdf' -exec dirname {} ';' > temp && sed 's!^\./!!' temp | sed 's![^/]$!&/!' > remove.txt

  2. List all the directories in the current path in a file current.txt

    ls -d */>current.txt

  3. Compare current.txt and remove.txt, and delete the directories listed in current.txt that are not in remove.txt

    comm -23 <(sort current.txt) <(sort remove.txt)|sed 's/^/"/g' | sed 's/$/"/g' | xargs rm -r

Note: you can append && rm current.txt remove.txt to clean up the list files. Alternatively, if you want to keep only the directories with *.pdf and delete all other files and directories in the current path, use ls > current.txt in the second step instead. This also removes the files that already existed and those created during the process.
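
The same comparison can be sketched without writing any list files, using Bash process substitution. This is an illustration under assumptions: it needs GNU find (for -mindepth and -maxdepth), assumes no pathname contains a newline, and the function name dirs_to_delete is hypothetical. It only prints the candidates. Unlike ls -d */, it also considers hidden directories.

```bash
#!/bin/bash
# Hypothetical helper: print immediate subdirectories of "$1" that do not
# directly contain a name matching *.pdf. No list files are written.
dirs_to_delete() (
    cd -- "$1" || exit 1
    export LC_ALL=C            # same collation for sort and comm
    # First list: every immediate subdirectory.
    # Second list: parents of *.pdf names found exactly one level down.
    comm -23 \
        <(find . -mindepth 1 -maxdepth 1 -type d | sed 's!^\./!!' | sort) \
        <(find . -mindepth 2 -maxdepth 2 -name '*.pdf' -exec dirname {} \; |
              sed 's!^\./!!' | sort -u)
)

# Scratch tree based on the example from the question.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/folder01" "$tmp/folder 02" "$tmp/folder03/folder03.1"
: > "$tmp/folder01/a.txt"
: > "$tmp/folder 02/b.pdf"
: > "$tmp/folder03/folder03.1/c.pdf"

to_delete=$(dirs_to_delete "$tmp")
printf '%s\n' "$to_delete"
rm -r "$tmp"
```

comm -23 keeps the lines that appear only in the first list, that is, the directories with no PDF directly inside them.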

Broly LSSJ
  • Thanks for reply, but unfortunately I can’t create any additional temp file (current.txt) during the process. I’m looking for a simple approach to solve this. – Joel Sep 07 '22 at 19:45
  • @joel You can append `&& rm current.txt remove.txt`. Or, if you only want to keep the directories with *.pdf and delete all other files and directories in the current path, then use `ls > current.txt` in the second step. This will also remove the files that already existed and those created during the process. – Broly LSSJ Sep 08 '22 at 11:22