We should have a CSV file validation tool or API to validate data before it gets loaded

Related products: None

I think we should have CSV file validation before data is loaded through S3, the data load API, or COM. We are seeing a lot of issues, and they are tough to debug once the data is loaded or when the file is very large. For instance, consider a CSV whose character encoding is ANSI while the S3 job configuration specifies UTF-8. In this scenario no specific error is thrown; the data load succeeds occasionally but fails most of the time. If the CSV were validated beforehand, we could avoid this and show a proper error message.
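To illustrate the encoding scenario, here is a minimal Python sketch of a pre-load check. The function name `check_encoding` and its parameters are hypothetical and not part of any existing product API. It also shows one possible explanation for the intermittent behavior (an assumption, not something confirmed here): an ANSI file that happens to contain only ASCII characters is also valid UTF-8, so it loads fine, while a file with any accented character is not.

```python
def check_encoding(raw: bytes, declared: str = "utf-8"):
    """Return None if the bytes decode under the declared encoding,
    otherwise a message pointing at the first offending byte.
    (Hypothetical helper, for illustration only.)"""
    try:
        raw.decode(declared)
    except UnicodeDecodeError as exc:
        return f"byte offset {exc.start}: not valid {declared} ({exc.reason})"
    return None


# An ANSI (Windows-1252) file with an accented character is not valid UTF-8.
ansi_with_accent = "café,10\n".encode("cp1252")
# An ANSI file with only ASCII characters is byte-for-byte valid UTF-8.
ansi_ascii_only = "cafe,10\n".encode("cp1252")

print(check_encoding(ansi_with_accent))  # an error message
print(check_encoding(ansi_ascii_only))   # None
```

A check like this only reads the raw bytes, so it could run before the job starts and report the mismatch up front instead of failing partway through the load.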





Please find below the use cases for which we need validation:


1) Data type mismatch


2) Field name mismatch


3) Char encoding mismatch


4) Blank rows


5) If possible, unpaired double quotes (a lone " that is never closed) between lines
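As a rough sketch of what such a validator could look like, the Python example below covers use cases 1, 2, 4 and 5 in a single pass and collects every problem instead of stopping at the first one. Everything here is an assumption for illustration: `validate_csv`, `expected_fields` and `field_types` are hypothetical names, and the quote check is a simple heuristic (an odd number of `"` characters), not a full parser.

```python
import csv
import io


def validate_csv(text, expected_fields, field_types):
    """Return a list of human-readable problems found in CSV text.
    field_types maps a field name to a callable such as int or float."""
    errors = []

    # 5) Heuristic: an odd number of double quotes means one was never closed.
    if text.count('"') % 2 != 0:
        errors.append("unbalanced double quote in file")

    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]

    # 2) Field name mismatch, reported by name.
    missing = [f for f in expected_fields if f not in header]
    unknown = [f for f in header if f not in expected_fields]
    if missing:
        errors.append("missing field(s): " + ", ".join(missing))
    if unknown:
        errors.append("unknown field(s): " + ", ".join(unknown))

    for row in reader:
        line = reader.line_num
        # 4) Blank rows (empty, or only whitespace cells).
        if not any(cell.strip() for cell in row):
            errors.append(f"line {line}: blank row")
            continue
        if len(row) != len(header):
            errors.append(f"line {line}: expected {len(header)} columns, got {len(row)}")
            continue
        # 1) Data type mismatch: try to convert each non-empty value.
        for name, value in zip(header, row):
            caster = field_types.get(name)
            if caster is None or value == "":
                continue
            try:
                caster(value)
            except ValueError:
                errors.append(f"line {line}: {name}={value!r} is not a valid {caster.__name__}")
    return errors


sample = "id,amount\n1,10.5\n\n2,abc\n"
for problem in validate_csv(sample, ["id", "amount"], {"id": int, "amount": float}):
    print(problem)
```

Returning a list of messages with line numbers, rather than raising on the first failure, is a deliberate choice in this sketch: it lets the user fix a large file in one round instead of reloading it repeatedly.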





Kindly let me know if you need any further information.





Thanks,


Phanindra
Phanindra - I agree on the charset level. Are you saying that there is NO validation for data type and field name mismatches?
For data type mismatch, yes, we validate and return the failed records with an error message saying that we are unable to parse the data. For field name mismatch, we throw "Unknown exception".