Hi Folks,
Hope everyone is doing well!
Thanks to everyone in ProxMox for their amazing work and effort, huge fan, and thanks to anyone willing to help, I really appreciate it, means a lot to me!
---
I am trying to use rclone to first sync and then check several source and destination directories.
The destinations are different media types:
- HDD 7200RPM
- HDD 5400RPM
- SSD
Initially I was using rclone to sync around 2TB of data to the HDD 7200RPM.
This worked ok, albeit a little slow - took around 24 hours.
Then I tried to sync the same data (around 2TB) to the HDD 5400RPM, and this was ridiculously slow.
The sync command did not finish even after 6 days, at which point I stopped it manually.
I tried to speed up the rclone sync command and ended up writing the following script to tweak the parameters for the different types of media.
However, before I could try any speed optimizations, I realized I hadn't tried syncing to the SSD.
This is where the problem manifested.
When I tried to run the rclone sync command to the SSD destination, the Ubuntu machine froze after some time and became completely unresponsive. The only way out was a hard shutdown followed by a restart.
I lowered the transfers from 8 to 4 (`--transfers=4`) and this seemed to work for the sync command.
Then the same problem manifested for the rclone check command.
However, lowering the checkers even to 1 did not help.
I played with all the parameters/flags I could find in the rclone documentation that seemed relevant, but could not solve the problem.
It looks like when it's running against a faster medium such as an SSD, it goes over some kind of limit and crashes the Ubuntu machine.
I thought it could be an out-of-memory issue, but all indicators on the Ubuntu Server machine seem fine.
I decided to concentrate first on the more important issue, the run not completing and crashing the Ubuntu Server, and to look at improving speeds afterwards. So lately I have been running the sync and check functions against the SSD only, nothing else.
Code:
The script I use to run the rclone sync and rclone check commands is in pastebin below:
---
Rclone version:
Bash:
```
rclone --version
rclone v1.67.0
- os/version: ubuntu 24.04 (64 bit)
- os/kernel: 6.8.0-40-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.22.4
- go/linking: static
- go/tags: none
```
---
Rclone config:
Bash:
```
### Source directories
[localsrc1]
type = local

[localsrc2]
type = local

[localsrc3]
type = local

[localsrc4]
type = local

################################
# Destination directories for single storage units
[localdest1]
type = local
encoding = Asterisk,BackQuote,BackSlash,Colon,CrLf,Ctl,Del,Dollar,Dot,DoubleQuote,Hash,InvalidUtf8,LeftCrLfHtVt,LeftPeriod,LeftSpace,LeftTilde,LtGt,Percent,Pipe,Question,RightCrLfHtVt,RightPeriod,RightSpace,Semicolon,SingleQuote,Slash,SquareBracket

[localdest4]
type = local
encoding = Asterisk,BackQuote,BackSlash,Colon,CrLf,Ctl,Del,Dollar,Dot,DoubleQuote,Hash,InvalidUtf8,LeftCrLfHtVt,LeftPeriod,LeftSpace,LeftTilde,LtGt,Percent,Pipe,Question,RightCrLfHtVt,RightPeriod,RightSpace,Semicolon,SingleQuote,Slash,SquareBracket

################################
# Destination directories for multiple storage units
[localdest2]
type = local
encoding = Asterisk,BackQuote,BackSlash,Colon,CrLf,Ctl,Del,Dollar,Dot,DoubleQuote,Hash,InvalidUtf8,LeftCrLfHtVt,LeftPeriod,LeftSpace,LeftTilde,LtGt,Percent,Pipe,Question,RightCrLfHtVt,RightPeriod,RightSpace,Semicolon,SingleQuote,Slash,SquareBracket

[localdest3]
type = local
encoding = Asterisk,BackQuote,BackSlash,Colon,CrLf,Ctl,Del,Dollar,Dot,DoubleQuote,Hash,InvalidUtf8,LeftCrLfHtVt,LeftPeriod,LeftSpace,LeftTilde,LtGt,Percent,Pipe,Question,RightCrLfHtVt,RightPeriod,RightSpace,Semicolon,SingleQuote,Slash,SquareBracket

################################
```
---
The rclone command which is failing:
Bash:
```
rclone check "$src" "$dest" \
    --checkers="$checkers" \
    --fast-list \
    --multi-thread-streams=0 \
    --buffer-size=0 \
    --one-way \
    --checksum \
    --log-file="$LOG_FILE" \
    --log-level=DEBUG \
    --retries 3 \
    --retries-sleep 3s \
    --progress
```
---
Here are some logs in pastebin:
The log file was more than 90MB, so I had to cut out the mundane output from the sync and check to fit the log within pastebin's limits.
It just cuts off while the check is running. The last part of the log is genuine and has not been trimmed to fit in pastebin.
This log is a bit different, as it has some weird symbols at the end:
This is the latest log from today. For some reason it does not show in full in pastebin unless you click on raw to view it in its entirety:
The log files from the rclone sync and rclone check commands don't contain anything that seems to point to an issue.
When the crash happens, the log file just stops at the file that was being processed at that moment.
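Since the rclone log simply stops, the kernel log from the boot that froze may hold more information. Below is a minimal sketch of a filter for it, assuming the systemd journal is persistent on the Ubuntu VM. The function name `suspicious_kernel_lines` and the pattern list are illustrative choices, not an exhaustive set of kernel messages.

```shell
#!/usr/bin/env bash
# Sketch: filter kernel log text on stdin for lines that often accompany
# a freeze: OOM kills, hung-task warnings and block-layer I/O errors.
# The function name and the pattern list are assumptions for illustration.
suspicious_kernel_lines() {
    grep -E -i 'out of memory|oom-kill|blocked for more than [0-9]+ seconds|I/O error'
}
```

After the hard reset, the previous boot's kernel messages could be piped through it, for example `journalctl -k -b -1 --no-pager | suspicious_kernel_lines`. An empty result would suggest the guest froze before the kernel could log anything, which points more towards the hypervisor or the passed-through controller than towards rclone.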
---
I did a SMART test on the SSD and all seems good:
Bash:
```
sudo smartctl -H /dev/sdc
[sudo] password for usertemp:
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-40-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
```
Bash:
```
sudo smartctl -l selftest /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-40-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       121         -
```
---
The setup:
ProxMox PVE 8.2.4
All sources are locally mounted TrueNas Scale SMB shares.
TrueNas Scale is also another VM running on the same ProxMox PVE, as is the Ubuntu Server running Rclone.
The TrueNas Scale VM exhibits no sign of any load before, during or after the crash. It just seems to be chugging along without feeling stressed.
The HDDs and SSD are all directly connected to the computer via a SAS controller. The SAS controller is exclusively given to the VM running Ubuntu Server and I see no errors on this side.
Some of the shares have normal or larger files, others have loads of small files.
Code:
Ubuntu Server VM:
- 8 Cores
- 8GB RAM

TrueNas Scale VM:
- 4 Cores
- 22GB RAM
- 2 x 6TB NAS HDDs 5400RPM in RAID1
- 1 x 1TB SSD for read cache
---
I only added the `--bwlimit` flag to see whether the default bandwidth was too high and causing some kind of load, but I will remove it now and leave it at the default.
I am currently only trying to run rclone sync followed by rclone check against the SSD, but it fails every time.
In the logs I can see that the rclone sync completes OK, but the rclone check is where it freezes up, even with just 1 checker.
It seems like it's not rclone itself that is freezing the OS, but something related to the I/O of reading/writing large amounts of data. I don't know how to debug it.
I even tried using `ionice -c2 -n7 nice -n 10 rclone check` to reduce the priority of the rclone command and throttle the read/write ops, but no joy.
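One way to gather evidence for or against the I/O theory would be to sample the guest's memory and writeback counters while the check runs. Each sample is flushed to disk so that the last one survives a hard freeze. This is only a sketch: the helper names `meminfo_kb` and `log_sample` and the log path in the usage comment are hypothetical, and the per-file `sync` call assumes coreutils 8.24 or newer.

```shell
#!/usr/bin/env bash
# Sketch: sample memory and writeback counters while rclone check runs.
# Helper names and the log path are assumptions for illustration.

# Print the value (in kB) of one /proc/meminfo-style field read from stdin.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Append one timestamped sample line to the given log file and flush it.
log_sample() {
    local out="$1" src="${2:-/proc/meminfo}"
    printf '%s MemAvailable=%s Dirty=%s Writeback=%s\n' \
        "$(date +%T)" \
        "$(meminfo_kb MemAvailable < "$src")" \
        "$(meminfo_kb Dirty < "$src")" \
        "$(meminfo_kb Writeback < "$src")" >> "$out"
    sync "$out"   # flush this file only, so the sample survives a freeze
}

# Usage (hypothetical log path), started before the rclone check:
#   while sleep 5; do log_sample /var/log/rclone-meminfo.log; done &
```

If `MemAvailable` drops towards zero or `Dirty` keeps growing in the last samples before the freeze, that would point to memory or writeback pressure inside the guest. Flat values up to the last sample would make a problem below the guest more likely.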
When the Ubuntu Server VM crashes, it becomes completely unresponsive: the guest agent stops running and I cannot SSH into the machine. I also cannot shut the machine down. The only things that work are issuing `STOP` to the VM, or shutting down the whole ProxMox PVE node in an attempt at a more graceful shutdown of the VM itself.
I have attached some screenshots of the VMs monitoring.
I just want it to complete successfully.
I am tearing out my hair trying to figure out what I am doing wrong.
Any help in debugging and fixing this is appreciated!