
How does someone fix an HDFS filesystem that's corrupt? I looked on the Apache Hadoop website and it says to use the fsck command, which doesn't actually fix anything. Hopefully someone who has run into this problem before can tell me how to fix it.

Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures.

When I ran bin/hadoop fsck / -delete, it listed the files that were corrupt or had missing blocks. How do I make the filesystem not corrupt? This is on a practice machine so I COULD blow everything away, but when we go live I won't be able to "fix" things by blowing everything away, so I'm trying to figure it out now.


4 Answers

You can use

  hdfs fsck /

to determine which files have problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so I normally get down to the meaningful output with

  hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

which ignores lines with nothing but dots and lines talking about replication.
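On a sample of fsck-style output, the filter behaves like this (the lines below are illustrative, not captured from a real cluster; the exact fsck format varies by Hadoop version):

```shell
# Illustrative fsck-style output; real "hdfs fsck /" lines vary by version.
sample='/user/app/part-00000: CORRUPT blockpool BP-1 block blk_1073741825
..........
/user/app/part-00001: MISSING 1 blocks of total size 134217728 B
/user/app/ok-file:  Under replicated BP-1:blk_1073741826_1002. Target Replicas is 3 but found 2 replica(s).'

# Drop the dot-progress lines and anything about replication; only the
# CORRUPT and MISSING lines survive.
echo "$sample" | egrep -v '^\.+$' | grep -v eplica
```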

Once you find a file that is corrupt

  hdfs fsck /path/to/corrupt/file -locations -blocks -files

Use that output to determine where blocks might live. If the file is larger than your block size it might have multiple blocks.
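If you just need the bare block IDs to search logs with, you can pull them out of that output; the sample line below mimics the -blocks output format, which varies across Hadoop versions:

```shell
# A line mimicking "hdfs fsck <file> -files -blocks -locations" output (illustrative).
fsck_line='0. BP-929597290-127.0.0.1-1381033612997:blk_1073741825_1001 len=134217728 repl=1 [127.0.0.1:50010]'

# Extract the block ID (the part before the generation-stamp suffix),
# which is what you would grep the namenode/datanode logs for.
echo "$fsck_line" | grep -o 'blk_[0-9]*' | head -1
# prints: blk_1073741825
```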

You can use the reported block numbers to go around to the datanode and namenode logs, searching for the machine or machines on which the blocks lived. Try looking for filesystem errors on those machines: missing mount points, a datanode not running, a filesystem reformatted or reprovisioned. If you can find a problem that way and bring the block back online, the file will be healthy again.
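To check whether a block replica still exists on a datanode's local disks, a find over the data directories works. The sketch below mocks up the directory layout because the real path comes from dfs.datanode.data.dir in hdfs-site.xml; on a real datanode you would point find at that configured directory instead of a temp dir:

```shell
# DATA_DIR stands in for dfs.datanode.data.dir (e.g. /hadoop/hdfs/data);
# here we build a mock layout imitating the real datanode directory tree.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/current/BP-929597290-127.0.0.1-1381033612997/current/finalized"
touch "$DATA_DIR/current/BP-929597290-127.0.0.1-1381033612997/current/finalized/blk_1073741825"

# The actual search: if this prints a path, the replica is still on disk.
find "$DATA_DIR" -name 'blk_1073741825*'
```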

Lather, rinse, and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.

Once you determine what happened and you cannot recover any more blocks, just use the

  hdfs dfs -rm /path/to/file/with/permanently/missing/blocks

command to get your HDFS filesystem back to healthy so you can start tracking new errors as they occur.

  • Thx for your reply. I'll try your suggestion the next time HDFS has issues. Somehow it fixed itself when I ran bin/hadoop fsck / -delete. After that, HDFS was no longer corrupted and some files ended up in /lost+found. It didn't do that before, when I stopped HDFS and restarted several times. I upvoted and accepted your answer =) Thx again.
    – Classified
    Commented Oct 14, 2013 at 20:19
  • But if a file is replicated 3 times in the cluster, can't I just get it back from another node? I know I had some data loss on one machine, but isn't the whole point of HDFS that this shouldn't matter?
    Commented Aug 5, 2014 at 19:44
  • Having had a problem with only one node (it crashed and lost some of its files), the easiest solution was the one suggested by @Classified: simply execute hadoop fsck / -delete
    – sofia
    Commented Jul 21, 2016 at 8:43
  • Wouldn't deleting the missing blocks cause data loss? hdfs fs -rm /path/to/file/with/permanently/missing/blocks @mobileAgent
    Commented Apr 18, 2018 at 22:13
  • Applications often write intermediate data that is temporary, can easily be re-generated on failure, and is therefore stored with a replication factor of 1. If these applications crash for any reason and do not clean up, they will leave this data behind. If at some point in the future the DataNode holding the single replica crashes, you will see corrupt blocks. This happens every so often and isn't a big deal; the data can safely be removed to restore the health of the cluster.
    – davidemm
    Commented Jul 2, 2020 at 20:25

If you just want to get your HDFS back to a normal state and don't worry much about the data:

This will list the files with corrupt HDFS blocks:

hdfs fsck / -list-corruptfileblocks

This will delete the corrupted files:

hdfs fsck / -delete

Note that you might have to run these commands as the HDFS superuser (typically by prefixing them with sudo -u hdfs) if the account you are logged in as doesn't have HDFS superuser privileges.


The solution here worked for me: https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html

su - <$hdfs_user>

hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files

for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :"; hadoop fs -setrep 3 $hdfsfile; done
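The grep/awk extraction step can be sanity-checked offline on a sample line (the fsck output format below is illustrative; the hadoop fs -setrep call itself of course needs a live cluster):

```shell
# An under-replicated line, mimicking "hdfs fsck /" output (illustrative).
line='/user/app/part-00000:  Under replicated BP-1:blk_1073741826_1002. Target Replicas is 3 but found 2 replica(s).'

# Keep only the HDFS path before the first colon; this is the value that
# would then be fed to "hadoop fs -setrep 3 <path>".
echo "$line" | grep 'Under replicated' | awk -F':' '{print $1}'
# prints: /user/app/part-00000
```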
  • I also had to fail over my primary namenode before I ran the above commands, because it had entered safe mode. The failover made the standby node become active, and I could then run the above commands and get rid of the corrupt blocks :)
    – abc123
    Commented Jul 19, 2018 at 21:42

Start all daemons and run the command "hadoop namenode -recover -force", then stop the daemons and start them again. Wait some time for the data to be recovered.
