1. Overview
When we work with the command line under Linux, we often need to process text files. In this tutorial, we’ll address different ways to remove the last n lines from an input file.
Also, we’ll discuss the performance of those approaches.
2. Introduction to the Example
First of all, let’s create an input file to understand the problem:
$ cat input.txt
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
05 is my line number. Keep me please!
06 is my line number. Keep me please!
07 is my line number. Keep me please!
08 is my line number. Delete me please!
09 is my line number. Delete me please!
10 is my line number. Delete me please!
As the output above shows, our input.txt contains ten lines.
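To follow along, the sample file can be recreated with a short loop. This is a minimal sketch that assumes a POSIX-compatible shell and writes input.txt into the current directory:

```shell
# Build the ten-line sample file: lines 1-7 say "Keep", lines 8-10 say "Delete".
for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -le 7 ]; then
        action="Keep"
    else
        action="Delete"
    fi
    # %02d pads the line number with a leading zero (01, 02, ...).
    printf '%02d is my line number. %s me please!\n' "$i" "$action"
done > input.txt
```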
Now, suppose we want to remove the last three (n=3) lines from the input.txt file.
In this tutorial, we’ll address solutions to the problem using four techniques:
- Using the head command
- Using the wc and the sed commands
- Using the tac and the sed commands
- Using the awk command
After that, we’ll discuss the performance of the solutions and find out which is the most efficient approach to the problem.
3. Using the head Command
Using the head command from GNU coreutils, we can print all lines but the last x lines of a file by passing the -n option a number prefixed with a hyphen (-), for instance, -n -x.
Therefore, we can use this option to solve our problem in a straightforward way:
$ head -n -3 input.txt
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
05 is my line number. Keep me please!
06 is my line number. Keep me please!
07 is my line number. Keep me please!
However, the head command prints the result to stdout. We can save the result back to input.txt via a temp file:
$ head -n -3 input.txt > tmp.txt && mv tmp.txt input.txt
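A fixed name such as tmp.txt can collide with a file that already exists. The following sketch, which assumes GNU head and the mktemp utility, uses a uniquely named temporary file instead. The file name demo.txt is only an illustration:

```shell
# Hypothetical sample file with five lines.
printf '%s\n' one two three four five > demo.txt

# mktemp creates a uniquely named temporary file and prints its path.
tmp=$(mktemp)

# Keep everything except the last 3 lines, then copy the result back.
# Writing with cat (rather than mv) keeps the permissions of demo.txt.
head -n -3 demo.txt > "$tmp" && cat "$tmp" > demo.txt
rm -f "$tmp"

cat demo.txt
```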
4. Using the wc and sed Commands
Using the sed command and its address range, we can quickly delete lines from a file starting at a given line number until the last line:
sed 'GIVEN_LINE_NO, $ d' input_file
For example, let’s delete from line 5 until the end of our input.txt:
$ sed '5,$ d' input.txt
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
However, our problem is to delete the last three lines from the input file. Since our input file has ten lines, the command sed '8,$ d' input.txt solves the problem.
Thus, the problem turns into how to calculate the line number 8, which is the first line number to be deleted.
Now, it’s time to introduce the wc command. Using the wc command with the -l option, we can easily get the total number of lines (TOTAL) in a file:
$ wc -l input.txt
10 input.txt
Further, we can get the first line number to delete by calculating TOTAL - n + 1. In our example, we have n=3:
$ echo $(( $(wc -l <input.txt)-3+1 ))
8
Let’s take a closer look at the command above:
- wc -l <input.txt: Here we redirect the input.txt file to stdin so that wc doesn’t print the filename in its output
- $(wc -l <input.txt): We used a command substitution to capture the TOTAL result
- $(( TOTAL - 3 + 1 )): The arithmetic expansion evaluates the math expression
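The same calculation reads more easily with named variables. In this sketch, demo.txt, n, total, and first are illustrative names rather than anything the commands require:

```shell
# Hypothetical ten-line sample file (one number per line).
seq 10 > demo.txt

n=3
total=$(wc -l < demo.txt)      # number of lines, without the filename
first=$(( total - n + 1 ))     # first line number to delete

echo "$first"
```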
Now, let’s assemble the two parts together and try to solve our problem:
$ sed '$(( $(wc -l <input.txt)-3+1 )),$ d' input.txt
sed: -e expression #1, char 2: unknown command: `('
Oops! Why does the sed command complain about the “(”?
This is because Bash doesn’t perform arithmetic expansion or command substitution inside single quotes, so sed receives the text literally.
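The difference between the two quoting styles can be seen in isolation. A small sketch, where the variable names single and double are purely illustrative:

```shell
# Single quotes: the text is kept literally, nothing is expanded.
single='$(( 10 - 3 + 1 ))'

# Double quotes: the arithmetic expansion is evaluated by the shell.
double="$(( 10 - 3 + 1 ))"

echo "$single"
echo "$double"
```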
Let’s change the single quotes in our sed command into double quotes and test again:
$ sed "$(( $(wc -l <input.txt)-3+1 )),$ d" input.txt
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
05 is my line number. Keep me please!
06 is my line number. Keep me please!
07 is my line number. Keep me please!
Great! Now the problem is solved.
If we are using the popular GNU sed, we can use the -i option to write the change back to the input file:
$ sed -i "$(( $(wc -l <input.txt)-3+1 )),$ d" input.txt
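One edge case is worth guarding against: when n is at least as large as the number of lines, TOTAL - n + 1 drops below 1 and is no longer a valid line number. The sketch below wraps the approach in a hypothetical helper named remove_last_n that clamps the start of the range to line 1. It prints to stdout and leaves the file untouched:

```shell
# remove_last_n FILE N  (hypothetical helper, not a standard command)
remove_last_n() {
    total=$(wc -l < "$1")
    first=$(( total - $2 + 1 ))
    # If N >= TOTAL, delete from line 1, i.e. print nothing.
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    # \$ keeps the dollar sign literal so that sed sees "first,$ d".
    sed "${first},\$ d" "$1"
}

seq 10 > demo.txt            # hypothetical ten-line sample file
remove_last_n demo.txt 3     # prints lines 1 to 7
remove_last_n demo.txt 20    # prints nothing
```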
5. Using the tac and the sed Commands
In this section, we’ll still solve the problem using the sed command, but from a different perspective.
We have learned that the difficulty of solving the problem using sed is to calculate the first line number to delete.
However, if we can reverse the order of lines in the input file, the problem will turn into “remove the first n lines from a file.” A straightforward sed one-liner, sed '1,n d', can remove the top n lines. After that, if we reverse the lines again, our problem gets solved.
The tac command can reverse the order of lines in a file. That is, we can try to solve our problem through the command tac INPUT_FILE | sed '1,n d' | tac.
Finally, let’s test if it will work for our example:
$ tac input.txt | sed '1,3 d' | tac
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
05 is my line number. Keep me please!
06 is my line number. Keep me please!
07 is my line number. Keep me please!
Yes! It works. We get the expected result.
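To make the number of lines configurable, the count can come from a shell variable. This sketch assumes GNU tac and sed, and that n is at least 1 (with GNU sed, a range such as 1,0 still matches the first line). The names demo.txt and n are illustrative:

```shell
seq 10 > demo.txt    # hypothetical ten-line sample file
n=3                  # assumed to be >= 1

# Double quotes let the shell substitute ${n} before sed runs.
tac demo.txt | sed "1,${n} d" | tac
```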
6. Using the awk Command
The awk command is a powerful text-processing utility. We can let awk go through the input file twice to solve the problem.
In the first pass, it’ll find out the total number of lines in the file, and in the second pass, we print those lines we want to keep:
$ awk -v n=3 'NR==FNR{total=NR;next} FNR==total-n+1{exit} 1' input.txt input.txt
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
05 is my line number. Keep me please!
06 is my line number. Keep me please!
07 is my line number. Keep me please!
As the output above shows, the awk command solved our problem.
Finally, let’s understand how the one-liner works:
- -v n=3: We declared an awk variable n=3
- NR==FNR{total=NR;next}: This is the first pass. In this pass, the awk command saves the current line number to a variable called total. After the first pass, the total variable holds the total number of lines in the input file
- FNR==total-n+1{exit} 1: This is the second pass. If FNR==total-n+1, we have reached the first line that needs to be removed, so we exit. Otherwise, we just print the line. Here, the non-zero number 1 is evaluated as true and triggers the default action of awk: print
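The same two-pass idea can be written out over several lines with comments. This sketch makes one small change that is not in the one-liner above: it tests FNR > total - n instead of FNR == total - n + 1, so that a value of n larger than the file still stops at the first line. The file name demo.txt is illustrative:

```shell
seq 10 > demo.txt    # hypothetical ten-line sample file

awk -v n=3 '
    # First pass: NR equals FNR only while reading the first file argument.
    NR == FNR { total = NR; next }

    # Second pass: stop as soon as we reach a line that must be removed.
    FNR > total - n { exit }

    # Otherwise print the current line.
    { print }
' demo.txt demo.txt
```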
7. Performance
So far, we’ve learned different ways to solve the problem. Now, let’s discuss their performance.
We’ll create a big input file with 100 million lines and test each solution on it to remove the last 1 million lines:
$ wc -l big.txt
100000000 big.txt
To benchmark their performance, we’ll use the time command:
- The head solution: time head -n -1000000 big.txt > /dev/null
- The wc and sed solution: time sed "$(( $(wc -l <big.txt)-1000000+1 )),$ d" big.txt > /dev/null
- The tac and sed solution: time tac big.txt | sed '1,1000000 d' | tac > /dev/null
- The awk solution: time awk -v n=1000000 'NR==FNR{total=NR;next} FNR==total-n+1{exit} 1' big.txt big.txt > /dev/null
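For a quick local comparison, the four commands can be run on a much smaller file. The sketch below assumes Bash (for the time keyword) and GNU tools. The variable names total_lines and n and the sizes are illustrative defaults, not the values used in the article’s test:

```shell
# Scaled-down benchmark sketch: 100,000 lines, drop the last 1,000.
total_lines=100000
n=1000

# seq writes one number per line, which is enough for timing purposes.
seq "$total_lines" > big.txt

time head -n "-$n" big.txt > /dev/null
time sed "$(( $(wc -l < big.txt) - n + 1 )),\$ d" big.txt > /dev/null
time tac big.txt | sed "1,${n} d" | tac > /dev/null
time awk -v n="$n" 'NR==FNR{total=NR;next} FNR==total-n+1{exit} 1' big.txt big.txt > /dev/null
```

Timings on a file this small are dominated by process startup, so the numbers are only meaningful once total_lines is raised substantially.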
Now, let’s have a look at the test result:
Solutions | time output |
---|---|
The head solution | |
The wc and sed solution | |
The tac and sed solution | |
The awk solution | |
As the table shows, the head solution is the fastest. It’s about 30 times faster than the sed solutions and 150 times faster than the awk command.
This is because the head command only looks for newline characters, without doing any pre-processing or holding the text of a line. It seeks until it finds the target line number and dumps the contents into the output.
On the other hand, the sed and the awk commands read every line of the input file and do some pre-processing. For example, the awk command initializes some internal attributes depending on the given FS and RS, such as fields, NF, records, and so on. Therefore, it adds a lot of overhead that isn’t needed for our problem.
Even though the sed and the awk solutions are much slower than the head solution for this problem, it’s still worthwhile to learn them and understand how they work. That’s because they are much more extendable than the head command.
For instance, let’s say we face a new problem: changing all “foo”s into “bar”s in the last n lines of an input file. Now, the head command cannot solve the problem. However, we can extend the sed or the awk command to solve it.
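As a rough illustration of that extension, the sed address range from Section 4 can drive a substitution instead of a deletion. This is a sketch under the assumption that n is smaller than the number of lines. The names demo.txt, n, and total are illustrative:

```shell
# Hypothetical five-line sample file: "foo 1" ... "foo 5".
printf 'foo %s\n' 1 2 3 4 5 > demo.txt

n=2
total=$(wc -l < demo.txt)

# Same address range as before, but with s/foo/bar/g instead of d.
sed "$(( total - n + 1 )),\$ s/foo/bar/g" demo.txt
```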
8. Conclusion
In this article, we have addressed different ways to remove the last n lines from an input file. After that, we discussed the performance of the solutions.
If we need to solve this problem on a large input file, the head solution will give us the best performance.